Abstract
In previous studies, full-text analyses and mining techniques have not been combined to identify and trace changes in the knowledge trends of library science over the past 20 years (1997–2016). Thus, to grasp the knowledge trends of library science at a fine-grained level, this study analyzes full-text journal articles from six top-ranked library science journals by applying text-mining techniques such as co-word analysis, text summarization, and topic modeling. Visualization tools were used to map the knowledge structure of library science. The findings indicate that, during the past 20 years, library science has developed into an interdisciplinary knowledge structure that integrates librarianship topics with a range of other fields, generating major topics that include the academic library, the digital library, research methodology, library marketing, information retrieval, digital information, document citation, and so on. In the past ten years, the library science discipline has focused increasingly on research methodology and evaluation and become more concerned with digital information management.
Keywords
Introduction
Scholars assess the progress of knowledge in their fields and formally publish their findings as explicit knowledge in the literature of their respective academic domains. Analyses of specific fields’ published literature are crucial because they shed light on the topic trends or dynamics thereof. Journals play important roles in scholarly communication by identifying emerging concepts in disciplines and exposing complications and research frameworks; hence, an assessment of periodical literature may provide a representation of the discipline and the profession (Davarpanah and Aslekia, 2008).
Library science (LS) is an academic field that applies the theory and practice of information technology, education, and management to libraries. It studies the collecting, cataloging, and preserving of library resources. In recent years, this discipline has undergone a dramatic evolution due to the rapid development of information technology and changes in human behavior, law management, and science (Aharony, 2012). The growth of cross-disciplinary fields under the umbrella of library science and the development of new knowledge areas reflect this evolution.
Research trends in LS have been studied from various perspectives, including public library trends (Clifford, 2003), social bookmarking in academic libraries (Redden, 2010), research data management services in libraries (Corrall et al., 2013), library sector functions (Macevičiūtė, 2014), and trends in library networks (Zverevich, 2014). However, scholars have not studied the knowledge structure of LS.
On the other hand, some previous library and information science (LIS) and information science (IS) studies have analyzed library science as a distinct sub-field. Hjørland (2000) conducted a content analysis to present different characteristics of LIS. Åström (2002) adopted a co-citation analysis between keywords and citations to visualize LIS. Janssens et al. (2006) analyzed full texts using latent semantic techniques to map LIS. Zins (2007) employed the Critical Delphi qualitative research methodology to moderate a discussion of the classification schemes of IS. Milojević et al. (2011) analyzed article titles to understand the knowledge structure of LIS. Singh and Chander (2014) conducted a bibliometric analysis to examine the trends of the scholarly journal Library Management. Finally, Tuomaala et al. (2014) classified the topics of publications to investigate the evolution of LIS from 1965 to 2005.
Problem statement
Scientific knowledge has evolved over time through scholarly communication, and LS is among the disciplines that have progressively evolved. An understanding of the state of the art in LS is of paramount importance because LS interacts closely with IS and other disciplines, and because such an understanding reveals how the field has recently been shaped by those interactions. Thus, investigating the LS knowledge structure as well as LS trends will expand scholarly understanding of societal changes and LS’s interaction with other disciplines. Although previous studies have addressed LS topics as a sub-field of LIS and/or IS, they have not covered all aspects of LS, which focuses on the principles and practices of libraries, whereas LIS focuses on the processes of information storage and retrieval. Moreover, while some studies have examined subjects in LIS journals, others have focused on the geographical or national characteristics of LIS-related research more than they have emphasized its knowledge structure.
Most have adopted bibliometric and content analysis methods to analyze article titles, keywords, abstracts, and authors in the hope of describing and defining the fields of LIS, IS, and LS. Previous studies have also been limited to analyzing document surrogates like keywords and abstracts; because they condense sentences from the source text, such extracts may fail to fully reflect article content.
Therefore, to get a complete result of the knowledge structure of LS, this study explores a unique approach to examine full-text articles by using text-mining techniques. An automatic text summarization technique was employed to extract interesting sentences from source documents and then consolidate them to produce summaries. Furthermore, a topic modeling technique was applied to help determine the knowledge structure of the field of library science.
This study’s goals were twofold: first, to identify the knowledge structure of LS; and second, to map the changes in knowledge trends over the past 20 years (1997–2016) by analyzing full-text journal articles published in online databases during this time period. Because this study intended to construct a picture of the LS knowledge structure of the last 20 years, the data were derived from a collection of full-text articles drawn from various databases and LS journals. The text-mining techniques of automatic text summarization and topic modeling were applied to analyze this large dataset, and a visualization tool was used to map the LS knowledge structure. Even though several previous studies have utilized text-mining techniques, none of them has applied summarization techniques, which are useful for condensing large amounts of data while preserving source content and overall meaning. Milojević et al. (2011) used word frequency to conduct a content analysis that aimed to identify LIS research topics. Van Den Besselaar and Heimeriks (2006) applied word-reference co-occurrences to title words of articles to determine the research topics in LIS journals.
Literature review
Most previous studies have provided a diversity of approaches to investigate IS structures and/or LIS content, whereas the knowledge structure of international library science has been studied only via its sub-fields. For example, Haiqi (1995) presented a bibliometric survey of research articles published in three periodicals of medical librarianship—Bulletin of the Medical Library Association (BMLA), Journal of the Japan Medical Library Association (JJMLA), and Medical Information Services (MIS)—from 1990 to 1992. The study analyzed a total of 410 articles and adapted subjects using a subject classification method described in both Feehan et al. (1987) and Dimitroff (1992). It found that information dissemination and information retrieval were the most common areas of inquiry.
Blessinger and Frasier (2007) analyzed publication and citation trends in LIS journals from 1994 to 2004, examining 2220 articles from 10 journals. The top five categories included library operations, research among library and information science/users, library/information science professions, technology, and publishing studies, respectively. Chang and Huang (2012) used bibliometric methods consisting of a direct citation, bibliographic, and co-authorship analysis to consider interdisciplinary changes in LIS from 1978 to 2007. The study analyzed 1536 articles from 10 LIS journals and found that LIS articles were widely distributed across 30 subject areas, which include computer science, education, sociology, management, and general science. Jabeen et al. (2015) studied 18,371 research articles published between 2003 and 2012 in 40 LIS core journals on the Web of Science database, using bibliometric analyses to determine the evolution rates and trends in global publications of LIS. The keyword analysis identified ‘Internet,’ ‘information retrieval,’ ‘digital libraries,’ ‘World Wide Web,’ and ‘information’ as the top keywords. From 2003 to 2012, LIS professionals produced research on IT theories and applications. Examining research articles published between 1996 and 2005 in 12 IS and LS journals using Web of Science, Åström (2010) presented a bibliometric mapping of the LIS field to make IS and LS research visible. His analysis of LIS journal citation data indicated that the field of LIS encompasses both IS and LS, with IS and LS being the two main subfields.
Some studies have concentrated on content analysis. Hjørland (2000) identified a characteristic of LIS from theoretical and philosophical perspectives by analyzing the content of LIS. He explained the associations between LIS and the fields of LS, IS, and documentation. Scholars have classified library automation, digital libraries, user studies, and library history as sub-areas of LIS. Koufogiannakis et al. (2004) performed a content analysis of 2664 LIS articles published in 2001 from 217 LIS journals and tested the domains using Crumley and Koufogiannakis’ (2002) taxonomy. The highest number of research articles concern information access and retrieval, followed by collections, management, education, and reference, respectively. The study added the domain of professional issues to Crumley and Koufogiannakis’ taxonomy and deleted the marketing and promotion domains. Applying content analysis to the keywords and abstracts of 1250 articles, Aharony (2012) examined the top 10 LIS research journals between 2007 and 2008 to identify LIS research areas using Zins’ (2007) IS classification scheme. The analysis identified the three core areas as information technology, methodology, and social information science.
Sethi and Panda (2012) surveyed scholarly journal articles to find publication trends in two core LIS journals indexed in the ScienceDirect database: International Information and Library Review (IILR) and Library and Information Science Research (LISR). The study analyzed the content of 1000 research papers published during the 2000–2010 period to gauge the growth of LIS research, authorship, etc., using Subramanyam’s (1983) formula. The analysis showed that both journals focus on issues related to the ICT age, including “digital library” and “information seeking behavior.” Singh and Chander (2014) explored publication trends in Emerald’s Library Management journal by analyzing 336 articles published between 2006 and 2012. Their analysis suggested that LIS research was associated with various domains such as library and knowledge management, information services, and professional development. To map research article distribution by topic, Tuomaala et al. (2014) analyzed LIS research articles published in core LIS journals in 2005 and examined the growth of LIS from 1965 to 2005 by comparing the 1965, 1985, and 2005 datasets. The results showed that in 2005, information retrieval, scientific communication, library services, and information seeking were the largest areas of LIS research.
Further studies have used title words to analyze content. Van Den Besselaar and Heimeriks (2006) developed an IS knowledge map from an analysis of IS research topics in eight journals around JASIST between 1986 and 2002. They identified information searching, catalogs and indexes, information seeking, libraries, digital libraries, and (digital) library services as subfields of information science. Milojević et al. (2011) analyzed the titles of articles in 16 LIS journals which were published between 1988 and 2007. They reported that the knowledge structure of LIS that is primarily emphasized comprises libraries, information, science, and information-seeking behavior. As a branch of LIS, LS encompasses the work of academic and public libraries, information literacy, information systems, policy, web sites, knowledge management, digital libraries, e-business and law, and scholarly publishing. Zins (2007) argued that a field of LS was dedicated to classification schemes of IS as a subfield, such as librarianship, indexing, abstracting, digital library, library system, and information-seeking behavior.
Most studies have focused on LS as a sub-field of LIS. Previous analyses indicate that, between 1997 and 2006, researchers focused on information retrieval, library operations, research in LIS, the LIS profession, publishing studies, library automation, library history, user studies, collections, reference, library and information services, catalogs and indexes, information seeking, academic libraries, public libraries, information literacy, policy, and websites. From 2007 to 2012, by contrast, scholars primarily examined information technology, methodology and social information science, digital libraries, information-seeking behavior, library management, knowledge management, information services, human resource management, and professional development.
Methodology
This study collects data from full-text LS articles published between 1997 and 2016 to determine how LS has been structured during the past 20 years and identify changes in and between its sub-disciplines.
Data collection
To investigate the knowledge structure of LS over the past 20 years, we set out to analyze the core LS journals published from 1997 to 2016. Given the importance of journal selection, this study used the SJR 2016 (Scimago Journal Ranking) in the field of LIS to select journals. The SJR is an indicator that measures the average prestige per paper of journals, and it is built on the Scopus database, which contains a wide range of scientific source citations. Moreover, the SJR indicator uses a citation window of three years, which is effective for measuring citations and the evolution of scientific journals (González-Pereira et al., 2010). We selected the journals that emphasize “Library” or “Librarianship” in their titles and considered their scope. Based on these criteria, we assumed that these journals represent the output of the LS discipline. In order to conduct comprehensive analyses of the articles’ content, we constructed our dataset using full-text articles instead of article abstracts. We therefore selected the following six LS journals for this study: Journal of Academic Librarianship, Library Quarterly, Library and Information Science Research, Journal of Librarianship and Information Science, Library Management, and International Journal on Digital Libraries. We included the two LIS journals (LISR and JLIS) in the dataset because we wanted to observe how trends that fused LS and IS emerged. Scholars generally regard these two disciplines as closely interacting with one another.
These journals are peer-reviewed journals that appear in Q1 on the journal ranking list (the list includes quartile rankings for each journal based on its subject categories; Q1 is the first quartile, denoting a top 25% ranking in terms of impact factor distribution). Consequently, to develop a data sample of leading journals that would enable us to identify representative LS trends, we focused on Q1 journals.
This section presents the overall approach used for discerning the knowledge structure of LS. The full-text articles were analyzed by the automatic text summarization technique, which extracts and summarizes the most important content of an article. Next, a topic-modeling technique was applied to generate the topics and terms. In addition, N-Gram and visualization methods were utilized to represent the LS domains. Figure 1 displays this study’s framework, which consisted of the steps listed below:
the gathering of full-text library science articles from online databases;
the summarization of a set of LS articles as the input; the application of pre-processing, including sentence segmentation and stop word removal;
the employment of automatic text summarization to generate sentence summaries and keywords for each article;
the application of topic modeling technique to identify topics and terms in LS summary documents;
the utilization of N-gram and visualization tools to map the structure of LS knowledge;
the analysis of the results.

This study’s research framework.
For text processing and analysis, this study used the open source toolkit ‘yTextMiner’ (http://informatics.yonsei.ac.kr:8080/yTextMiner/home.html), a text mining tool developed in Java. In addition, Gephi (https://gephi.org/) and VOSviewer (http://www.vosviewer.com/) were used to visualize the LS topical topology.
Pre-processing
An essential part of text mining, pre-processing is the first step in analyzing datasets; it allows researchers to derive quality data before running specific analyses.
In this study, text-mining pre-processing was applied to the source text documents. This involved sentence segmentation (splitting the text into sentences at full-stop delimiters) and stop word removal, which eliminates words (noise) that carry little information, such as “a,” “is,” and “are.” Moreover, part-of-speech (POS) tagging was used to classify the words in the sentences as nouns, verbs, adverbs, adjectives, etc. This technique is useful for parsing complex words and accurately identifying the meaning of a word in a sentence. Lemmatization was also used to determine the lemma of each word. This reduces inflectional and derivationally related forms of words to a common base form (Manning et al., 2009; Song et al., 2015).
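The segmentation and stop word steps described above can be illustrated with a minimal Python sketch. It is not the yTextMiner implementation used in the study: the function names and the tiny stop word list are assumptions made for illustration, and POS tagging and lemmatization are omitted because they require a trained language model.

```python
import re

# Tiny illustrative stop word list (an assumption; the study's actual list is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def preprocess(text):
    """Return one list of content tokens per sentence, with stop words removed."""
    return [[t for t in tokenize(s) if t not in STOP_WORDS]
            for s in split_sentences(text)]


sample = "Libraries are changing. The catalog is a database."
tokens = preprocess(sample)
```

Here `tokens` holds one token list per sentence, which is the shape of input that a sentence-ranking summarizer typically expects.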
Automatic text summarization for full-text library science articles
The automatic text summarization system provides advantages over manually summarizing large quantities of data. The system can abbreviate information from various documents into concise, readable summaries (Harabagiu and Lacatusu, 2010). It summarizes documents into important sentences (selected from the documents), thereby producing summaries that should convey the documents’ most relevant information in a more concise form than the originals (Gambhir and Gupta, 2017). Huang et al. (2010) identified the four main points of automatic text summarization as to cover, weight, decrease and cohere the text.
An extractive text summarization technique was used to identify interesting sentences in the source documents and consolidate them to produce sentence and keyword summaries for each article. To this end, this study employed LexRank (Erkan and Radev, 2004), which scores sentences by graph-based centrality. This natural language processing (NLP) technique calculates the relative importance of sentences in a text. It builds on the concept of eigenvector centrality in a graph representation of sentences, in which a connectivity matrix based on the cosine similarity between sentences serves as the adjacency matrix of the graph.
After pre-processing, the texts were fed into the summarization algorithm, and the summaries were constructed from the sentence indices it returned, together with the keywords extracted from the text it generated. To split the texts into lists of tokenized sentences, a document preprocessor class from the yTextMiner library was used; the tokenized sentences were then preprocessed with the aid of a stop word list read from file.
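To make the centrality idea concrete, the following is a simplified sketch in the spirit of thresholded LexRank, not the yTextMiner code used in the study. It assumes plain term-frequency vectors (the original method weights terms with TF-IDF), and the threshold, damping factor, and function names are illustrative choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    """Score a non-empty list of tokenized sentences by centrality on a similarity graph."""
    n = len(sentences)
    vecs = [Counter(s) for s in sentences]
    # Connectivity (adjacency) matrix: 1 where similarity exceeds the threshold.
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize so that each row is a probability distribution.
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(adj[i][j] * scores[i] for i in range(n))
                  for j in range(n)]
    return scores


def summarize(sentences, k=1):
    """Return the indices of the k highest-scoring sentences, in document order."""
    scores = lexrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return sorted(top)


# Toy input: sentence 0 shares a word with every other sentence.
sents = [["library", "digital", "user"], ["library"], ["digital"], ["user"]]
scores = lexrank(sents)
summary_indices = summarize(sents, k=1)
```

In this toy input the first sentence overlaps with all the others, so it receives the highest score and is the one selected for the summary.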
Topic modeling for library science scholarship
Topic modeling is a probabilistic generative approach based on the assumption that each document contains a mixture of topics and that each topic is a probability distribution over words, characterized by its highest-probability words. This statistical model is used to identify abstract topics that occur in collections of literature. A topic comprises a cluster of words that frequently occur together. Researchers often use topic modeling as a text-mining tool to discover hidden semantic structures in document content (Steyvers and Griffiths, 2007).
To discover the hidden topics and terms in full-text LS articles, Latent Dirichlet Allocation (LDA) was used for topic modeling. This approach makes it possible to identify the themes that run through the articles, determine how those themes are connected, and summarize electronic literature at a scale that would be impossible with manual annotation. LDA is a statistical topic model, a form of probabilistic modeling that automatically discovers and explores topics in large collections of documents by using the observed documents to infer the hidden topic structure. The model reflects the intuition that documents exhibit multiple topics. In its generative process, each topic is a distribution over the fixed vocabulary of the collection, and the topics are assumed to be specified before any data has been generated (Blei, 2012).
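For readers who prefer notation, the generative process of LDA can be written compactly as follows. The symbols are the standard ones from the LDA literature rather than notation taken from this study: K is the number of topics (20 here), D the number of documents, N_d the number of words in document d, β_k the word distribution of topic k, θ_d the topic proportions of document d, z_{d,n} the topic assignment of the n-th word, w_{d,n} the observed word, and α and η the Dirichlet priors.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \beta_{1:K} &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
\end{align*}
```

Only the words w_{d,n} are observed; inference works backwards from them to estimate the hidden topics and the per-document topic proportions.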
To observe LS trends, a directional similarity-based topic matching was adopted to match a topic in a period group with the closest topic in another period group. The 20-year collection period was divided into two periods of 10 years. The topic matching technique allowed us to identify not only topic rank change but also topic integration and division between the two periods.
Visualization of the knowledge structure of library science
N-gram-based visualization with the Gephi and VOSviewer tools was used to map the LS knowledge structure. The betweenness centrality of each term was computed to identify the important LS terms. Betweenness centrality (Lu and Zhang, 2013) measures how often a node lies on the shortest paths between pairs of other nodes in a graph. This measure determines which nodes serve as bridges in the network.
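As an illustration of how betweenness centrality identifies bridging terms, the sketch below implements Brandes' algorithm for a small unweighted, undirected graph. The study computed this measure with its visualization tools, so the code, the function name, and the four-term toy network are assumptions for illustration only.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph maps each node to a list of its neighbours.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical co-occurrence network of four terms arranged in a chain.
terms = {
    "academic library": ["library"],
    "library": ["academic library", "digital library"],
    "digital library": ["library", "metadata"],
    "metadata": ["digital library"],
}
centrality = betweenness(terms)
```

In this chain, the two middle terms each lie on the shortest paths of two term pairs, whereas the end terms lie on none, so the middle terms act as the bridges.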
Findings
From the six aforementioned LS journals, 4023 full-text journal articles published between 1997 and 2016 were downloaded as PDF files; other types of publications, such as book reviews and editorials, were excluded because their content was not relevant. Table 1 lists the full-text articles that were accessible and downloadable via the online databases.
List of full text articles.
Full-text library science articles summarization
As the summarization process in Figure 2 shows, the summarization system created 4023 summaries from the set of documents and wrote the output to text documents like the one shown in Figure 3. All the summaries were based on the keyword lists generated from the extracted words, totaling 146,379 words for the 20-year time period (1997–2016): 180,479 words for the first 10-year time period (1997–2006) and 223,949 words for the second 10-year time period (2007–2016).

This study’s summarization process.

A sample document summary.
Library science scholarship topics
In this way, topical terms were generated by applying LDA topic modeling to the summarized texts. This process produced an output of 20 LS topics with their terms, phrase collections, weights, and count numbers, as shown in Appendix A. In addition, unclear words that appeared in the topics were corrected or removed, so that the selected words are more thematically focused. The words that occurred most frequently, such as ‘library,’ ‘information,’ and ‘management,’ had little independent cognitive value, but they confirmed concepts when occurring alongside terms in the same cluster and as parts of multi-word phrases. Hence, the phrases in the same clusters of words were also considered when determining the topics. Subsequently, each topic was named by considering the frequency of words and the phrase weights.
The analysis result displayed the thematic structure of LS, as is shown in Table 2. Since LDA was used to generate the 20 topics, these topics are described by using the terms and phrase weight analyses of each topic.
LS topics in last 20 years (1997–2016).
The weight of a topic, represented by α, is the prior weight of topic k in a document. It is initially the same for all topics, but once a topic is learned, it is assigned its unique weight.
The above 20 topics were defined as follows. Topic 1, Information Literacy, is understood as a basic skill for lifelong learning that involves recognizing when information is needed, knowing how to locate information, understanding the information processing and using the valued information to create new knowledge. Topic 2, Information Retrieval, is a form of information processing related to information needs and information-searching systems; it supports the gathering of information from large quantities of data in response to user queries. Topic 3, Information Seeking, encompasses the process of searching, retrieving, recognizing, and using the content.
Topic 4, Cataloging and Metadata, refers to library cataloging, metadata, RDA, FRBR, MARC21, AACR2, BIBFRAME, and Dublin Core, that is, processes of creating datasets to describe and relay information about data, such as author names, titles, publishers, publication years, and subject headings, to represent information resources. Records are usually stored in databases that hold data in machine-readable form, for instance library catalog databases. Topics 5 and 6 simply refer to libraries that are available to the general public and to the services provided thereby, such as library circulation, current content awareness, reference services, and the Internet.
Topic 7, Collection Management, refers to the process of selecting and evaluating library materials, including preserving special collections (rare book collections, for example), collections related to cultural heritage, and original objects. Topic 8, Reference Service, is a part of library services, consisting of reference librarians answering library users’ questions via telephone, email and/or other channels and assisting them in locating library materials and advising them about information resources. Topic 9, Bibliography and Citation, concerns the methods for indicating the references used in a research project. Topic 10, Library Management, is the process of managing library materials and facilitating their use by others through the organizational environment.
Topic 11, Websites, are sets of web pages on the World Wide Web connected to the Internet with domain names and disseminated on web servers that can present multimedia content. Topic 12, Information Science School, refers to formal education in LIS. Topic 13, Librarianship Studies, is understood as the course of study for librarians and professionals in library management and services. Topic 14, Research Methodology, refers to research processes, data collection, and research techniques. LS research methods combine qualitative and quantitative approaches. Topic 15, Library and Information Technology, involves the use of information technology to organize and operate library and information services. Topic 16, Academic Library, refers to university libraries that support higher education curriculum and research. In rapidly changing educational systems, academic libraries need to recognize the necessity of services for their users.
Topic 17, Research Evaluation, is a methodology used for information acquisition and assessment to provide useful feedback on research projects. Topic 18, User Studies, refers to the study of library user behavior regarding information needs, access, acquisition, and use, to improve library services. Topic 19, Digital Library, encompasses libraries that collect and service digital information and materials in electronic media formats to support teaching, learning, and creation of digital information, using information technology. Topic 20, Information Systems and Technology, refers to the framework and implementation of information systems and technology that are related to hardware as well as software to support information processing.
To examine LS trends in the 20-year period, the sample years were divided into two time periods, 1997–2006 (1772 articles) and 2007–2016 (2251 articles). LDA topic modeling was used to produce 20 topics. Table 3 and Appendix A show the results of the analysis.
Topics and trends in the two periods.
The weight of a topic, represented by α, is the prior weight of topic k in a document. It is initially the same for all topics, but once a topic is learned, it is assigned its unique weight.
To identify trends between the two time periods, a directional similarity-based topic matching was adopted. In this technique, each topic in either time period was linked with the closest topic in the other time period using a cosine similarity measure. Consequently, there were 40 directional links between the two time periods: 20 links were between topics in 1997–2006 (P01) and the closest topics in 2007–2016 (P02), and another 20 links were between topics in P02 and the closest topics in P01. The first 20 links are called P01 links because they are drawn from the viewpoint of P01, and the remaining 20 are called P02 links. A P01 link and a P02 link may or may not connect the same pair of topics. From these results, not only changes in topic rank but also topic mergers and separations between the two periods could be identified.
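The directional matching can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: it assumes each topic is represented as a vector of word weights over a shared vocabulary, and the topic labels and numbers below are made-up toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def closest_links(source, target):
    """For each source topic, return (closest target topic, similarity)."""
    links = {}
    for name, vec in source.items():
        best = max(target, key=lambda t: cosine(vec, target[t]))
        links[name] = (best, cosine(vec, target[best]))
    return links


# Toy topic-word weight vectors over a four-word vocabulary (illustrative only).
p01 = {"P01-a": [0.6, 0.3, 0.1, 0.0],
       "P01-b": [0.0, 0.1, 0.2, 0.7]}
p02 = {"P02-x": [0.5, 0.4, 0.1, 0.0],
       "P02-y": [0.7, 0.0, 0.3, 0.0],
       "P02-z": [0.0, 0.2, 0.1, 0.7]}

p01_links = closest_links(p01, p02)  # links drawn from the P01 viewpoint
p02_links = closest_links(p02, p01)  # links drawn from the P02 viewpoint
```

In this toy case, no P01 link points to `P02-y`, yet its own P02 link points back to `P01-a`. That asymmetry is why the two directions are kept separate.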
Table 4 shows the results for the two types of pairs: the topic pairs in the left column are matched with P01 links, and those in the right column with P02 links. Each column has three sub-columns; from left to right, the table lists P01 topics, P02 topics, and similarity coefficients, followed by P02 topics, P01 topics, and similarity coefficients.
Closest pairs of LS topics between the two periods.
Some pairs with P01 links are identical to pairs with P02 links, while others are not. When the same topic pair appears in both directions, the topic was neither merged nor separated between the periods, so only its change in rank is of interest.
To make the trends easier to see, Figure 4 was developed based on Table 4. A solid line links a 1997–2006 topic to its closest topic in 2007–2016, whereas a dotted line links a 2007–2016 topic to its closest topic in 1997–2006. That is, the solid lines are the topic matches made from the 1997–2006 period, and the dotted lines are the topic matches made from the 2007–2016 period.

Changes between the two periods.
In the 2007–2016 period, there are several topics which do not have a solid line, such as “Collection-based Service” and “Library and Public Services.” These topics can be regarded as “new” or “rising” topics. Other rising topics include “Library and Information Technology Management,” “Metadata” and “Quality Management of Library Services.”
The dotted lines show the origins of the rising topics in 2007–2016. For instance, “Collection-based Service” seems to be derived from “Collection Development” in 1997–2006, as is shown by the dotted line to “Collection-based Service.” Therefore, the topic “Collection-based Service” is not entirely new to 2007–2016; rather, it can be understood as a newly rising topic derived from “Collection Development” in 1997–2006.
Moreover, six topics from the earlier period (Information Literacy, Web Search, Role of Librarian, Research Information, Librarianship, and Library Management in Countries (Case Studies)) disappeared in the latter period. Nevertheless, the closest 2007–2016 topics for these disappeared 1997–2006 topics show that the absent topics were merged with other topics into the following new topics:
Information Literacy → Academic Librarianship
Web Search → Information Retrieval and Citation
Role of Librarian → Academic Librarianship
Research Information → Library and Information Science Research
Librarianship → Librarianship and Research Content Management
Library Management in Countries (Case Studies) → Library Quality Assessment
The expansion of the rising and declining topics can be understood as topic separation or topic merging. For example, “Collection Development” in the 1997–2006 period is paired by dotted lines with three topics in 2007–2016: “Collections and Research Evaluation,” “Collection-based Service,” and “Metadata.” Among them, “Collections and Research Evaluation” is also paired with “Collection Development” by a solid line. These pairs demonstrate that while “Collection Development” has carried over as a topic of interest within “Collections and Research Evaluation” (2007–2016), it has also expanded and separated into more specific, rising topics, such as “Collection-based Service” and “Metadata.”
On the other hand, “Academic Librarianship” in the 2007–2016 period looks the same as it does in the 1997–2006 period, but it receives another solid line from “Information Literacy” of the 1997–2006 period. This means that “Academic Librarianship” of the 2007–2016 period is a merged topic from “Academic Librarianship” and “Information Literacy” of the 1997–2006 period.
“Academic Librarianship” of the 2007–2016 period is also linked by a dotted line to “Academic Librarianship” of the 1997–2006 period. Consequently, “Information Literacy” in the 1997–2006 period seems to be associated with “Academic Librarianship” in the same period and merges into “Academic Librarianship” in the 2007–2016 period. “Academic Librarianship” of the 2007–2016 period, then, is not only merged from two topics but also ranks lower than both of them: it is ranked 15th, compared with 1st for “Information Literacy” and 14th for “Academic Librarianship” in the 1997–2006 period.
Therefore, when two or more solid lines are linked with one of the 2007–2016 topics, the linked 1997–2006 topics are merged into the 2007–2016 topics. On the other hand, when two or more dotted lines are linked with one of the 1997–2006 topics, the linked 2007–2016 topics are separated from the 1997–2006 topics.
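The merging and separation rules above can be illustrated with a minimal sketch. It assumes that each topic is represented as a term-weight dictionary and that "closest" means highest cosine similarity; this section does not state which similarity measure was actually used, so cosine similarity is an assumption. The function names and the toy term weights are hypothetical, and only the topic labels are borrowed from the text.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def closest(source, target):
    """Map every topic in `source` to its most similar topic in `target`."""
    return {s: max(target, key=lambda t: cosine(vec, target[t]))
            for s, vec in source.items()}

def classify(early, late):
    solid = closest(early, late)   # each 1997-2006 topic -> closest 2007-2016 topic
    dotted = closest(late, early)  # each 2007-2016 topic -> closest 1997-2006 topic
    incoming = {}                  # solid lines arriving at each later topic
    for e, l in solid.items():
        incoming.setdefault(l, []).append(e)
    outgoing = {}                  # dotted lines arriving at each earlier topic
    for l, e in dotted.items():
        outgoing.setdefault(e, []).append(l)
    return {
        # Later topics with no solid line are "new" or "rising".
        "rising": sorted(l for l in late if l not in incoming),
        # Two or more solid lines into one later topic: merging.
        "merged": {l: sorted(es) for l, es in incoming.items() if len(es) > 1},
        # Two or more dotted lines into one earlier topic: separation.
        "separated": {e: sorted(ls) for e, ls in outgoing.items() if len(ls) > 1},
    }

# Toy term weights (invented for illustration, not taken from the study).
early = {
    "Collection Development": {"collection": 5, "library": 3, "metadata": 1},
    "Information Literacy": {"literacy": 4, "students": 3, "academic": 2},
    "Academic Librarianship": {"academic": 5, "librarian": 3, "students": 1},
}
late = {
    "Collections and Research Evaluation": {"collection": 4, "evaluation": 3, "library": 2},
    "Collection-based Service": {"collection": 3, "services": 4},
    "Academic Librarianship": {"academic": 4, "librarian": 2, "literacy": 2, "students": 2},
}
result = classify(early, late)
```

With these toy weights, “Collection-based Service” is flagged as rising, “Academic Librarianship” as merged from two earlier topics, and “Collection Development” as separated into two later topics, mirroring the pattern described in the text.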
Knowledge structure of library science visualization
An analysis of the terms and clusters in the LS knowledge structure for this 20-year period revealed several terms that serve as key nodes; these include: Library, Information, University, Management, Service, Research, Science, Comparison, Model, Archive, Process, Academic, Access, Promotion, User, Strategic, Web, Introduction, System, and Study. Table 5 displays these terms with the corresponding betweenness centrality weights.
Betweenness centrality measure of key node labels.
The Gephi visualization results show the network of co-occurring words; bigger nodes represent higher centrality, which indicates that these words are key nodes in the network. Over the 20-year period, Library is a key node that is most strongly linked to the marketing node and is also connected to other nodes in the network, including management, challenge, and improve. As shown in Figure 5, the network exhibits the cross-discipline relations in the LS knowledge structure.

Visualization of LS knowledge structure in the 20-year time period.
This network analysis serves to explain connections between LS knowledge areas using word co-occurrence. Examining the term Library, for example, produces word pairs like library marketing, library management, library promotion, and related areas that link to the other nodes in the network, such as university library (academic library). The term Information, similarly, produces information analytic, information introduction, etc.
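As a rough illustration of how betweenness centrality identifies key nodes in a word co-occurrence network, the following sketch builds a small network and scores its nodes with Brandes' algorithm. The study's own values were produced with Gephi; this code is not that implementation. The document-level co-occurrence window, the function names, and the toy documents are all assumptions made for the example.

```python
from collections import defaultdict, deque
from itertools import combinations

def cooccurrence_graph(documents):
    """Link two terms whenever they appear in the same document."""
    graph = defaultdict(set)
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def betweenness(graph):
    """Unnormalized betweenness centrality for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in graph}
        paths = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Toy documents (invented for illustration, not taken from the corpus).
docs = [
    ["library", "marketing"],
    ["library", "management"],
    ["library", "university"],
    ["library", "information"],
    ["information", "retrieval"],
]
scores = betweenness(cooccurrence_graph(docs))
```

In this toy network, library lies on the shortest paths between most pairs of other terms and therefore receives the highest score, followed by information; peripheral terms such as marketing score zero.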
VOSviewer (http://www.vosviewer.com/) was used to visualize the word clusters of LS knowledge in the past 20 years. The keyword co-occurrence map it generated characterizes the structure of LS knowledge. Figure 6 shows the relations between words and indicates, by way of font size and colored bubbles, how often terms occurred. The graph also shows how frequently words occurred together, so it can present how common LS knowledge areas relate to each other. The findings show the co-occurrences of 137 words (items) in 10 clusters, displayed in different colors. Library, Management, and Information remain key terms in the network insofar as they connect to other terms. Considering each cluster in turn, the first cluster (Red) contains 40 items, making it the largest cluster. The second cluster (Green) includes 27 items such as library, service, support, literacy, and librarian. These words represent the field of LS. The third cluster (Blue) combines 16 items that relate to user studies or user behavior in LS. The 13 items in the fourth cluster (Light blue) represent the topic of library collection management. The last of the top-5 clusters (Yellow) contains 13 items, including researcher, literature, and perspective, and covers the field of library research methodology.

Visual representation of the word clusters in LS during the past 20 years.
To get an overview of important LS knowledge areas in each of the two 10-year periods examined in this study, the VOSviewer density visualization was used to explore term trends. The labels identify the terms, and each point on the map is colored according to its term density. The colors range from red to blue, where red represents the highest term density and blue the lowest. The results (Figures 7 and 8) show that the terms library and information remained dominant in both periods, and most terms (study, service, user, etc.) appeared in both periods. However, comparing the two 10-year periods shows that many LS knowledge areas increased dramatically between 2007 and 2016 (Figure 8). The red sections are stronger, for instance, for terms like research, librarianship, management, retrieval, model, and technology. Furthermore, several new terms, including decision, management, business, and assessment, appear in the red zone. This indicates the development of new interest in these terms in LS between 2007 and 2016.

An overview of LS knowledge areas between 1997 and 2006.

An overview of LS knowledge areas between 2007 and 2016.
The term science also appeared on the map. The colors on the map are based on term density scores, ranging from blue (low score) through green (average score) to red (high score). A higher score suggests that the term may become increasingly popular in the future. Moreover, between 1997 and 2006 (Figure 7), the area of the term library was active in isolation from the others, but from 2007 to 2016 (Figure 8), it appeared close to the area of information and other related fields. This visualization suggests that the change has been driven by LIS, a cross-disciplinary science that scholars have been increasingly studying.
The visualization outcomes demonstrate that the key terms of LS include library, information, university, management, service, and process. The top-5 LS clusters combine IS, LS, user studies, collection management, and research methodology. Figures 7 and 8 provide details regarding the visualization results.
Discussion
The results of topic modeling show that, during the past 20 years, LS has gradually increased its interaction with IS; the IS classification schemes compiled by Shifra Baruchson-Arbib (Zins, 2007)—for example, Information Retrieval, Information Literacy, Information Technology, Websites, Metadata, Digital Library, Collection Management, and User Studies—reveal this growing interaction. These topics are not, however, unique to computer science or IS, given that the terms of each topic correspond to LS terms like Library Information Technology and Digital Library. Clearly separating LS and IS remains difficult, though various scholars have attempted to differentiate the concepts. Miksa (1992) explained that LS focuses on libraries and institutions, whereas IS focuses on information and the communication thereof. Meanwhile, other researchers have described LS as a subfield of IS (Ingwersen, 1992; Vakkari, 1994).
The visualization analysis results point in the same direction in that they exhibit the cross-discipline relations in the LS knowledge structure; they support Chang and Huang's (2012) finding that LIS articles are widely distributed across 30 disciplines, including general sciences, computer science, business/management, education, and sociology. This also confirms Åström's (2010) finding that IS and LS can be seen as strong indicators of the interdisciplinary nature of LIS.
As shown in Table 6, the LS knowledge structure can be divided into two main clusters: the librarianship cluster, which focuses on libraries as institutions, and the interdisciplinary cluster, which focuses on applications of information and communication science in libraries.
Clusters of LS knowledge structure in the 20-year time period.
The librarianship cluster represents the topics that support an understanding of core library science background knowledge. The topics in this cluster pay more attention to library services and library science schools. This cluster encompasses: Academic Libraries, Public Libraries, School Libraries, Digital Libraries, Library Services, Library Research Methodology and Evaluation, Librarianship Studies, Library Management and Roles, Library Quality Assessment, Knowledge Management, User Studies, Library Marketing, Citing Information and Bibliography, Libraries and Associations, Dissertation Services, Information (Research) Literacy, Collection Management, Cataloging and Metadata, Information Science School, Library and Information Technology, and Digital Content Delivery. On the other hand, the interdisciplinary cluster covers information services topics that support library users' tasks. The topics in this cluster include Information Management, Information Seeking, Information Retrieval, Digital Information and Access, Information Preservation, Online Searching, Websites, Information Systems and Technology, Research Content Management/Research Software/Reference Software, and Document Citation. Some studies have addressed these topics even though they identified LS as a sub-field of LIS (Aharony, 2012; Blessinger and Frasier, 2007; Hjørland, 2000; Jabeen et al., 2015; Koufogiannakis et al., 2004; Sethi and Panda, 2012; Singh and Chander, 2014; Van Den Besselaar and Heimeriks, 2006).
It is followed in the ranking by document management, library and information management, etc. This outcome confirms the results of the term density visualization by depicting term trends in the library knowledge structure. The terms study, service, user, research, librarianship, management, and retrieval have been dominant during the past 20 years. Moreover, previous studies have identified some of these core topics (Blessinger and Frasier, 2007; Milojević et al., 2011).
The technological paradigm shift has significantly impacted almost every discipline, and LS is no exception. This study’s proposed approach shows that research trends in LS are increasingly associated with emerging technologies, although the main role of LS has not changed significantly. The results also indicate that LS has not migrated into another field, but rather that information technology and information delivery methods have been adapted to library tasks. Some of the topic trends, such as research content management, reference and software, and document citation, may necessitate further examination of the ways in which LS has evolved.
Conclusion
Scholars have used several approaches to study and explain the knowledge topics and trends of LS, but few studies have focused solely on the knowledge structure of LS. Moreover, extant studies have not covered the present decade. In addition, most previous studies have analyzed a limited amount of content. This study fills this gap in current scholarship by describing the knowledge structure of LS during the past 20 years (1997–2016) by analyzing 4023 full-text articles.
The outcome showed that, over the last 20 years (1997–2016), the knowledge structure of LS has mainly consisted of topics related to the librarianship and interdisciplinary fields. In the final 10-year period (2007–2016), LS knowledge concentrated more on the topic areas of digital information management and research methodology due to the growth of information technology. This study also demonstrated that text-mining techniques are a powerful means of analyzing and extracting knowledge of discipline structures from large datasets. Combining text summarization, topic modeling, and topic similarity measures provides a new way of identifying LS topics and trends.
This study not only provides an up-to-date view of the knowledge structure of the LS field but also describes the origin of constituent sub-fields. In addition, it should help LS researchers identify the general components of the LS field—the basic problems, approaches, and accomplishments. Furthermore, the proposed method may help predict future trends in the LS field by describing the changing topic trends based on topic similarity between different time periods. Therefore, this study should encourage further research into both the identification of the current state of scholarly fields and into how we can extrapolate future topic trends from current trends.
This study had two major limitations. First, it relied mainly on content analysis to map the knowledge structure of LS. Author co-citation analysis may shed a different light on the LS knowledge structure. Thus, as a follow-up study, we will undertake an author co-citation analysis to identify the main contributors to the field of LS and how the knowledge structure has been formed based on citation relationships. In addition, the follow-up study will utilize full-text analyses to extract and identify the knowledge structure of LIS, an interdisciplinary science. Second, the evaluation of LS topic trends and knowledge structure was mainly based on a quantitative approach. Expert opinions and qualitative analysis should have been integrated to confirm the results presented in the study. We acknowledge this limitation and intend to address it in the follow-up study.
Footnotes
Appendix A
LDA topic modeling results (Years 2007–2016).
| Topic | Coefficient | Total terms | Terms (frequency) |
|---|---|---|---|
| 1 | 0.00748 | 747 | library (173.00) university (153.00) libraries (84.00) information (58.00) services (56.00) academic (50.00) management (48.00) public (45.00) state (41.00) staff (39.00) |
| 2 | 0.02379 | 3173 | information (911.00) science (438.00) seeking (364.00) research (274.00) studies (245.00) literacy (235.00) library (223.00) sources (197.00) behavior (152.00) students (134.00) |
| 3 | 0.02178 | 1590 | library (295.00) information (218.00) research (193.00) libraries (166.00) academic (143.00) data (129.00) collection (122.00) services (115.00) study (109.00) evaluation (100.00) |
| 4 | 0.01639 | 2328 | information (445.00) library (412.00) research (384.00) science (305.00) university (189.00) students (157.00) LIS (122.00) school (114.00) study (105.00) academic (95.00) |
| 5 | 0.00821 | 1098 | library (201.00) information (183.00) university (170.00) UK (112.00) science (111.00) management (93.00) libraries (62.00) research (58.00) services (56.00) USA (52.00) |
| 6 | 0.02768 | 2798 | ? (625.00) information (334.00) retrieval (306.00) search (260.00) based (258.00) query (235.00) processing (233.00) system (193.00) document (185.00) model (169.00) |
| 7 | 0.01175 | 1386 | library (335.00) librarians (150.00) libraries (142.00) public (135.00) academic (123.00) services (115.00) information (106.00) staff (97.00) management (93.00) service (90.00) |
| 8 | 0.0211 | 5605 | management (1021.00) library (992.00) information (931.00) cite (919.00) document (821.00) university (291.00) libraries (265.00) research (132.00) development (122.00) academic (111.00) |
| 9 | 0.01343 | 3286 | management (621.00) library (607.00) information (577.00) cite (406.00) document (333.00) libraries (229.00) university (229.00) academic (103.00) science (103.00) public (78.00) |
| 10 | 0.00672 | 618 | library (144.00) information (109.00) management (89.00) university (75.00) libraries (52.00) journals (31.00) services (31.00) quality (30.00) service (29.00) RSB (28.00) |
| 11 | 0.01289 | 2095 | library (476.00) management (402.00) information (263.00) university (176.00) libraries (170.00) services (163.00) cite (127.00) service (127.00) document (101.00) academic (90.00) |
| 12 | 0.01742 | 1187 | library (234.00) web (174.00) libraries (161.00) resources (108.00) academic (92.00) information (92.00) ? (88.00) study (86.00) services (77.00) users (75.00) |
| 13 | 0.0125 | 1534 | library (309.00) information (219.00) science (186.00) research (149.00) management (140.00) technology (122.00) university (121.00) libraries (120.00) services (85.00) national (83.00) |
| 14 | 0.00989 | 896 | digital (160.00) data (128.00) library (104.00) libraries (94.00) information (91.00) metadata (88.00) systems (68.00) university (62.00) management (51.00) object (50.00) |
| 15 | 0.00979 | 954 | information (182.00) library (165.00) libraries (117.00) public (98.00) science (88.00) services (72.00) policy (62.00) research (60.00) social (58.00) access (52.00) |
| 16 | 0.00606 | 727 | mail (99.00) digital (96.00) online (87.00) libraries (87.00) published (86.00) university (85.00) Springer (67.00) Verlag (66.00) ? (62.00) s00799 (58.00) |
| 17 | 0.00943 | 931 | library (195.00) information (127.00) libraries (93.00) students (89.00) academic (88.00) research (87.00) university (80.00) services (62.00) librarian (55.00) resources (55.00) |
| 18 | 0.0052 | 1485 | research (184.00) process (157.00) computer (154.00) software (149.00) texts (145.00) professional (145.00) considers (140.00) reference (139.00) applications (137.00) works (135.00) |
| 19 | 0.00866 | 883 | library (141.00) information (130.00) research (93.00) lists (89.00) science (87.00) managing (86.00) contents (77.00) university (69.00) librarianship (58.00) ? (53.00) |
| 20 | 0.01211 | 1131 | library (233.00) libraries (173.00) information (173.00) science (109.00) research (108.00) university (78.00) public (72.00) study (70.00) management (61.00) librarianship (54.00) |
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
