Abstract
Earthquake prediction is one of the important themes of earthquake research and a very difficult scientific problem worldwide. In this study, a bibliometric analysis is conducted on the scientific publications on earthquake prediction indexed in the SCIE (Science Citation Index Expanded) and SSCI (Social Sciences Citation Index) databases over the past two decades (1998–2017). The subject categories, annual and journal distributions, and leading countries/regions and institutions in this field are investigated. The main research topics are identified with text mining methods, and research trends are explored through keyword co-occurrence analysis and keyword burst detection. The results help scholars in this field understand its knowledge structure and main participants, and identify its current research hotspots and future development trends.
Introduction
An earthquake is a natural phenomenon in which the rapid release of energy in the Earth's crust produces vibrations that propagate as seismic waves [1–3]. Earthquakes often cause heavy casualties; they can trigger fires, floods, toxic gas leaks and the spread of bacteria and radioactive materials, and may cause secondary disasters such as tsunamis, landslides, collapses and ground fissures [4–7]. Earthquakes are among the most serious natural threats to human survival [8, 9]. Dozens of strong earthquakes of magnitude 8.0 or higher have occurred worldwide in the 21st century, making it a period of unusually high seismic activity in the historical record. Strong earthquakes have been especially frequent in recent years. Table 1 lists brief information on the earthquakes of magnitude above 8.0 worldwide since 2012; the information is from the China Seismological Network Center (http://www.ceic.ac.cn/history). Minimizing earthquake disasters has become a common goal of all countries.
Earthquakes with magnitudes over 8 in recent years
Earthquake prediction is one of the important topics in earthquake research. Because seismic physical processes are highly uncertain and current science and technology are limited, earthquakes are very difficult to predict [10]. Short-term and imminent earthquake prediction in particular remains a long-standing problem for seismologists worldwide. Nevertheless, people have studied the theory and technology of earthquake prediction since ancient times [11, 12]. Common earthquake prediction methods include ground-stress monitoring [13], observation of electromagnetic anomalies [14], ionospheric anomalies [15] and groundwater anomalies [16, 17], and GPS measurements [18]. More recently, advanced methods such as machine learning algorithms [19], deep learning [20] and artificial neural networks [21] have been applied to earthquake prediction.
Different earthquake prediction methods have different advantages and disadvantages, but the anomalies they observe share similar overall characteristics: all include long-term trend anomalies, short-term anomalies and imminent anomalies [22]. Anomalies usually appear as rapid rises and falls, sudden reversals and similar changes. The larger the anomalous range and the longer its duration, the larger the magnitude of the earthquake. When anomalies are intermittent and scattered, carry fragmented information and are clearly paroxysmal, the earthquake is mostly small [23]. The international output of research papers on earthquake prediction and forecasting has been growing, especially in recent years. Statistics show that there are thousands of academic papers related to earthquake prediction and that their number is growing strongly, indicating that more and more research institutions and researchers are paying attention to the field. In this context, it is worthwhile to analyze the existing publications on earthquake prediction quantitatively with scientific methods, so as to reveal the knowledge structure, development status and evolution of the field.
As a widely applicable quantitative tool, bibliometrics can address this need effectively and has been used in many research fields [24–26], including information science, library science, engineering, computer science, medicine, education, agriculture, forestry, sociology, economics and management [27–29]. However, to our knowledge, no bibliometric analysis has yet been conducted for seismology, and in particular for earthquake prediction. This paper takes the earthquake prediction publications indexed in the Science Citation Index Expanded (SCIE) and the Social Sciences Citation Index (SSCI) as its research objects, and applies bibliometric methods [30, 31], data mining techniques and visualization software such as CiteSpace and Carrot2 to study them systematically and in depth, in order to explore the history, current status and development trends of the field.
The rest of this paper is organized as follows. Section 2 introduces the main research methods and data sets used in this paper. Section 3 presents the characteristics of the publications and the most productive and influential categories, countries, institutions and journals. Section 4 presents the visualization of the main research topics, the keyword co-occurrence analysis and the keyword burst detection. Section 5 summarizes the main findings.
This paper uses the Web of Science (WoS) to collect literature on earthquake prediction. WoS is one of the largest multidisciplinary academic citation databases in the world and indexes thousands of journals. The data collection covers the 20 years from 1998 to 2017, and the collected data are then analyzed quantitatively and visually with bibliometric and data mining methods. Based on existing retrieval strategies [32] and the aim of this paper, the retrieval strategy is defined as follows, where TS denotes a topic search in WoS:
TS=(((earthquake* or earthshock* or quake* or seism* or temblor*) and (geologic or geological or geology)) not ("seismic method*" or "seismic prospect*" or "seismic explor*" or prospect* or "geologic* explore*" or mining or mine or mineral* or log* or oil or gas or methane or hydrocarbon or "nature gas")) and TS=(predict* or forecast*)
Timespan=(1998–2017).
Databases=(SCI-Expanded, SSCI).
Literature type=(Article OR Review)
Applying this retrieval strategy yielded 1237 records.
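Because the topic query is long and easy to mistype, it can help to assemble it from term lists. The Python sketch below is a hypothetical helper, not part of the workflow described in this paper: the function and list names are our own, and the terms are copied verbatim from the query above. It quotes multi-word phrases and rebuilds the same TS query string, so that, for example, the exclusion list can be edited and the query regenerated consistently.

```python
# Term lists copied from the WoS topic (TS) query used in this study.
QUAKE_TERMS = ["earthquake*", "earthshock*", "quake*", "seism*", "temblor*"]
GEO_TERMS = ["geologic", "geological", "geology"]
EXCLUDE_TERMS = ["seismic method*", "seismic prospect*", "seismic explor*", "prospect*",
                 "geologic* explore*", "mining", "mine", "mineral*", "log*", "oil", "gas",
                 "methane", "hydrocarbon", "nature gas"]
PREDICT_TERMS = ["predict*", "forecast*"]


def or_group(terms):
    """Join terms with 'or', quoting multi-word phrases, and wrap them in parentheses."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " or ".join(quoted) + ")"


def build_ts_query():
    """Rebuild the topic query: (quake AND geology) NOT exclusions, AND prediction terms."""
    core = f"({or_group(QUAKE_TERMS)} and {or_group(GEO_TERMS)})"
    return f"TS=({core} not {or_group(EXCLUDE_TERMS)}) and TS={or_group(PREDICT_TERMS)}"


print(build_ts_query())
```

The generated string can be pasted into the WoS advanced search box; the timespan, databases and document types are set separately there.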
This study uses several bibliometric indicators to assess the productivity and impact of journals, countries, institutions and authors in this field: the total number of publications (TP), the total number of citations (TC), the average number of citations per paper (TC/TP), the number of authors (AN), the number of references (RN), the impact factor (IF) and the widely used h-index [33]. The h-index was proposed by the physicist Jorge Hirsch of the University of California, San Diego, in 2005; it is the largest number h such that h of the publications under evaluation have each been cited at least h times. Because it combines output and citation impact in a single number, it can be used to evaluate both the quantity and the quality of academic output [34–37]. The international-collaborative number (ICN), non-international-collaborative number (NICN), inter-institution-collaborative publication number (IICPN), single-institution publication number (SIPN), single-authored publication number (SAPN) and multi-authored publication number (MAPN) [38] are also used to characterize collaboration among authors, institutions and countries.
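To make the two core impact indicators concrete, the following Python sketch computes TC/TP and the h-index from a list of per-paper citation counts. It is only an illustration: the function names are hypothetical, and the citation counts in the example are toy values, not data from this study.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each (Hirsch, 2005)."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def impact_summary(citations):
    """TP, TC, TC/TP and h-index for one entity (journal, country, institution or author)."""
    tp = len(citations)
    tc = sum(citations)
    return {"TP": tp, "TC": tc, "TC/TP": tc / tp if tp else 0.0, "h": h_index(citations)}


print(impact_summary([10, 8, 5, 4, 3]))  # {'TP': 5, 'TC': 30, 'TC/TP': 6.0, 'h': 4}
```

Sorting in descending order means the loop can stop at the first paper whose citation count falls below its rank, since every later paper has at most that many citations.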
To present the results, the major topics are visualized with Carrot2 [41, 42], and the keyword co-occurrence networks and the keywords with the strongest citation bursts are produced with CiteSpace [39, 40].
Bibliometric results
In this section, various kinds of bibliometric results are presented including WoS subject categories, annual and journal distributions, leading countries/regions and institutions.
WoS subject category analysis
The publications on earthquake prediction from 1998 to 2017 were classified by WoS subject category. Because earthquake prediction is a comprehensive topic that draws on knowledge from many disciplines, these publications span about 60 WoS categories. The top five are Geochemistry & Geophysics, Geosciences Multidisciplinary, Engineering Geological, Water Resources, and Meteorology & Atmospheric Sciences. Figure 1 shows the cumulative publications in these five categories and their trends over 1998–2017. Geochemistry & Geophysics is the category with the most publications on earthquake prediction: its annual output grew clearly from 1998 to 2010 and then stayed at around 40 publications per year from 2010 to 2017, except for 64 in 2016. As Fig. 1 shows, the first two categories, Geochemistry & Geophysics and Geosciences Multidisciplinary, account for most of the publications on earthquake prediction. Engineering Geological, Water Resources, and Meteorology & Atmospheric Sciences rank third to fifth, respectively, but contain far fewer publications than the first two.

Cumulative publications and their trends in the five categories for the period 1998–2017.
Figure 2 compares the growth of publications on earthquake prediction with that of publications on earthquakes in general. Both are growing, and the absolute gap between them is gradually widening. Their relative growth rates, however, are almost the same; because both grow at a similar percentage rate from very different bases, the absolute difference naturally increases. The number of earthquake prediction papers rose from 34 in 1998 to 108 in 2017, an average annual growth rate of 6.27%, compared with almost 6.5% for earthquake publications. This indicates that earthquake prediction has attracted scholarly attention at about the same pace as earthquake research as a whole.
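The text does not state how the average growth rate was computed. Assuming it is a compound annual growth rate over the 19 yearly steps from 1998 to 2017, the reported 6.27% can be reproduced with the short Python sketch below (the function name is our own):

```python
def average_annual_growth(first, last, intervals):
    """Compound annual growth rate between two yearly counts `intervals` years apart."""
    return (last / first) ** (1 / intervals) - 1


rate = average_annual_growth(34, 108, 2017 - 1998)
print(f"{rate:.2%}")  # 6.27%
```

A simple arithmetic mean of the year-on-year changes would generally give a different value, which is why the compounding assumption matters here.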

Growth of earthquake prediction publications and earthquake publications.
Table 2 summarizes the annual publications on earthquake prediction, including TC, AN, RN and several collaboration indicators. As Fig. 2 shows, the number of earthquake prediction papers grew steadily over the past two decades. The 1237 publications were cited 27264 times in total, or 22.04 times per paper on average. Table 2 also shows that the numbers of references and authors per paper have gradually increased: the average number of authors per paper was generally below 4 before 2014 and above 4 from 2014 to 2017, and the number of references per paper follows a similar trend.
Characteristics of earthquake prediction publications
IC Rate: International cooperation rate. IIC Rate: Inter-institutional cooperation rate. MA Rate: Multi-author cooperation rate.
Of the 1237 publications on earthquake prediction, 415 (33.55%) were completed through international cooperation. More than half (64.51%) were completed through inter-institutional cooperation, and most (90.22%) were multi-authored, so single-authored publications are rare. Figure 3 shows the numbers and proportions of the three types of cooperative publications by year.
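The three cooperation rates are shares of all publications. The Python sketch below shows one way to compute them from per-paper records, assuming that a paper counts as international when its addresses include more than one country, as inter-institutional when they include more than one institution, and as multi-authored when it has more than one author. The record structure and the toy records are hypothetical and do not correspond to WoS field tags or to data from this study.

```python
from dataclasses import dataclass


@dataclass
class PaperRecord:
    """Hypothetical per-paper record; the fields are illustrative, not WoS field tags."""
    countries: frozenset
    institutions: frozenset
    n_authors: int


def cooperation_rates(records):
    """Shares of papers with international, inter-institutional and multi-author cooperation."""
    tp = len(records)
    icn = sum(len(p.countries) > 1 for p in records)       # international collaboration
    iicpn = sum(len(p.institutions) > 1 for p in records)  # inter-institution collaboration
    mapn = sum(p.n_authors > 1 for p in records)           # multi-authored
    return {"IC Rate": icn / tp, "IIC Rate": iicpn / tp, "MA Rate": mapn / tp}


toy = [
    PaperRecord(frozenset({"USA", "Italy"}), frozenset({"Inst A", "Inst B"}), 5),
    PaperRecord(frozenset({"China"}), frozenset({"Inst C"}), 3),
    PaperRecord(frozenset({"USA"}), frozenset({"Inst A"}), 1),
    PaperRecord(frozenset({"France"}), frozenset({"Inst D", "Inst E"}), 2),
]
print(cooperation_rates(toy))  # {'IC Rate': 0.25, 'IIC Rate': 0.5, 'MA Rate': 0.75}
```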

The number and proportion of three types of cooperative publications.
Because earthquake prediction is a comprehensive, interdisciplinary topic, its articles appear in journals from different disciplines. In total, 242 journals have published articles on this topic; Table 3 lists the 15 most productive journals and some of their basic features. The articles are relatively concentrated: these 15 journals, only 6.20% of all journals, published 618 papers, or 49.96% of the total. Bulletin of the Seismological Society of America published the most articles on earthquake prediction (125), followed by Journal of Geophysical Research: Solid Earth (82), Geophysical Journal International (62) and Earth and Planetary Science Letters (47). The papers in these 15 journals have been cited 16582 times in total, or 26.83 times per paper on average, indicating that the earthquake prediction articles they publish have attracted wide attention from scholars. In terms of TC/TP, Geochemistry Geophysics Geosystems ranks first with 54.58 citations per article, followed by Journal of Geophysical Research: Solid Earth (44.62) and Earth and Planetary Science Letters (40.43).
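The concentration figures above are simple ratios. The Python sketch below shows a hypothetical helper that, given per-journal publication counts (not reproduced in this paper), returns the share of journals and the share of publications held by the k most productive journals; the last line checks the aggregate ratios quoted above.

```python
def top_k_share(counts, k):
    """Return (share of journals, share of publications) held by the k most productive journals.

    `counts` maps a journal name to its number of publications in the corpus.
    """
    ranked = sorted(counts.values(), reverse=True)
    return k / len(ranked), sum(ranked[:k]) / sum(ranked)


# Aggregate figures from the text: 15 of 242 journals, 618 of 1237 papers, 16582 citations.
print(f"{15 / 242:.2%} of journals, {618 / 1237:.2%} of papers, "
      f"{16582 / 618:.2f} citations per paper")
# 6.20% of journals, 49.96% of papers, 26.83 citations per paper
```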
Top 15 journals with most publications
Next, we analyze the earthquake prediction publications at the country level. Table 4 lists the 15 most productive countries/regions in the field. Except for TC/TP, the USA leads on every indicator by a wide margin. In terms of TP, China and Italy rank second and third, respectively. Of the 15 countries/regions, 7 are in Europe, 4 in Asia, and 2 each in Oceania and North America. In terms of TC/TP, France ranks first with 46.62 citations per paper, followed by England and Australia. Table 4 also reports other indicators for these leading countries/regions, such as the h-index and the number of highly cited publications.
Most productive countries/regions
To trace the development of earthquake prediction research, we divide the study period 1998–2017 into four stages: 1998–2002, 2003–2007, 2008–2012 and 2013–2017. Table 5 lists indicators for the five most productive countries in each stage, including TP, TC, TC/TP, the h-index and the number of highly cited papers. As Table 5 shows, the USA leads clearly in all four stages. China's total output has grown rapidly, and it ranks second in the last two stages. In terms of TC/TP, however, China still lags well behind the other productive countries. Figure 4 shows the TP of the five most productive countries and how it changes across the four stages.
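Assigning papers to stages is a matter of integer division on the publication year. The Python sketch below is a hypothetical helper (names and the ASCII-hyphen label format are our own) that maps a year to its five-year stage and counts TP per stage; with a width of 2, the same arithmetic yields the ten two-year stages, Stage 1 being 1998–1999, used for the institutional analysis in Fig. 5.

```python
from collections import Counter


def period_label(year, start=1998, width=5):
    """Label of the fixed-width period containing `year`, e.g. 2009 -> '2008-2012'."""
    lo = start + (year - start) // width * width
    return f"{lo}-{lo + width - 1}"


def stage_number(year, start=1998, width=2):
    """1-based stage index; with two-year stages, 1998-1999 is Stage 1."""
    return (year - start) // width + 1


def tp_by_period(years, start=1998, width=5):
    """Count publications (TP) per period from a list of publication years."""
    return Counter(period_label(y, start, width) for y in years)


print(period_label(2009), stage_number(2017))  # 2008-2012 10
```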
Most productive and influential countries/regions in four different stages

TP of the five productive countries and their changes in four different stages.
The United States Geological Survey (USGS) in the USA is the most productive institution in the field of earthquake prediction, followed by the University of California System (UCS), also in the USA. Table 6 lists the 10 most productive institutions in the field during 1998–2017. Of these, 4 are in the USA, 2 in China, and 1 each in France, Italy, Germany and Russia. By TC/TP, the University of California Berkeley (UCB), the California Institute of Technology (CIT) and the University of California System (UCS) occupy the top three positions. Figure 5 shows the number of papers published by these 10 institutions and how it changes across stages. (Note: For Fig. 5, the period 1998–2017 is divided into 10 two-year stages: Stage 1: 1998–1999; Stage 2: 2000–2001; and so on. Abbreviations: United States Geological Survey (USGS), University of California System (UCS), Centre National De La Recherche Scientifique Cnrs (CNDLRSC), Istituto Nazionale Geofisica E Vulcanologia Ingv (INGEVI), Helmholtz Association (HA), Russian Academy of Sciences (RAS), China Earthquake Administration (CEA), Chinese Academy of Sciences (CAS), University of California Berkeley (UCB), California Institute of Technology (CIT).)
Top 10 productive institutions during 1998–2017

The number of papers and changes of these 10 institutions at different stages.
This section examines the bibliographic landscape of earthquake prediction research. First, the main research topics in this area are identified with two different data mining algorithms. Second, keyword co-occurrence networks are analyzed. Third, keyword burst detection based on CiteSpace is presented.
Research topics visualization
Research on earthquake prediction has grown rapidly worldwide and spans many disciplines. To map the knowledge landscape of earthquake prediction, we use visual analysis tools to identify the main topics in the field. Figures 6 and 7 show the main topic clusters obtained from the data set constructed in this paper; both figures were generated with Carrot2.

Circle visualization of major topics.

FoamTree visualization of major topics.
These two figures show that ‘motion, velocity, basin’, ‘earthquake, area, events’, ‘motion, ground, hazard’ and ‘fault, rates, deformation’ are the main topics in the earthquake prediction domain. Figure 8 further shows the size and internal structure of the different topic clusters.

FoamTree visualization of major topics.
To study the related topics and their evolution more closely, this section analyzes the keyword co-occurrence network in depth. Keywords express the subject matter not only of journal articles but also of scientific reports and other academic documents, and studies have shown that in many cases keywords represent topics better than words extracted from titles or abstracts [43]. To identify changes in research topics, the period 1998–2017 is divided into four sub-periods (1998–2002, 2003–2007, 2008–2012 and 2013–2017), and a keyword co-occurrence network is constructed for each with CiteSpace.
Figures 9–13 show the keyword co-occurrence networks for the whole period and the four sub-periods. Each node represents a keyword, node size reflects how often the keyword appears, and a link between two nodes means that the two keywords appear together in earthquake prediction publications. Table 7 lists the 20 most frequent keywords for the whole period and each of the four sub-periods. Keywords with the same meaning, such as ‘earthquake’ and ‘earthquakes’ or ‘model’ and ‘models’, are merged in this section.
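The network described above can be sketched in a few lines of Python. This is not CiteSpace's implementation; it only illustrates the node and link definitions, assuming each paper contributes its keyword list, that node frequency counts the papers using a keyword, and that link weight counts the papers in which two keywords appear together. The merge table is a hypothetical example mirroring the merges mentioned above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym table mirroring the merges described in the text.
MERGE = {"earthquakes": "earthquake", "models": "model"}


def normalize(keyword):
    """Lower-case a keyword, trim whitespace and map it to its merged form."""
    k = keyword.strip().lower()
    return MERGE.get(k, k)


def cooccurrence(papers):
    """Build node frequencies and weighted links from per-paper keyword lists."""
    nodes, links = Counter(), Counter()
    for keywords in papers:
        unique = sorted({normalize(k) for k in keywords})  # count each keyword once per paper
        nodes.update(unique)
        links.update(combinations(unique, 2))  # each unordered pair, alphabetically ordered
    return nodes, links


papers = [["Earthquakes", "Model", "California"], ["earthquake", "models"], ["prediction", "California"]]
nodes, links = cooccurrence(papers)
print(nodes["earthquake"], links[("earthquake", "model")])  # 2 2
```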

Co-occurrence network of keywords in earthquake prediction, 1998–2017.

Co-occurrence network of keywords in earthquake prediction, 1998–2002.

Co-occurrence network of keywords in earthquake prediction, 2003–2007.

Co-occurrence network of keywords in earthquake prediction, 2008–2012.

Co-occurrence network of keywords in earthquake prediction, 2013–2017.
Top 20 frequently used keywords in earthquake prediction in five different stages
As shown in Fig. 9, ‘earthquake’, ‘model’, ‘California’ and ‘prediction’ are the four most frequent keywords in earthquake prediction publications. Figure 9 and Table 7 also reveal the other commonly used keywords in this field and their complex relationships, and Fig. 9 shows that these frequent keywords are closely connected. For a more detailed analysis, Figs. 10–13 present the keyword co-occurrence networks of the earthquake prediction publications in the four sub-periods (1998–2002, 2003–2007, 2008–2012 and 2013–2017).
In the early part of the study period, apart from broad keywords such as ‘earthquake’, keywords such as ‘lithosphere’ and ‘zone’ were very common in this field. In later periods, however, these keywords gradually lost prominence. ‘California’ was a common keyword in the first two sub-periods, ranking sixth in both. Its prominence then increased further: it ranked second in 2008–2012 and third in 2013–2017. This indicates that earthquake prediction in the California region has attracted growing attention from scholars over the past 10 years and has become a hot research topic in this field. Similarly, ‘response spectra’ ranked 8th in 2008–2012 and 11th in 2013–2017, and ‘ground-motion’ ranked 4th and 6th in these two sub-periods, respectively. Tracking how these keywords change reveals the shifts in research topics in earthquake prediction and its current frontier topics.
CiteSpace can be used to detect keyword bursts [39]. Analyzing keyword bursts reveals how the research frontier of a subject evolves: a keyword with a strong citation burst has received special attention during the corresponding time interval [44, 45]. As shown in Table 8, the keywords with the strongest citation bursts were ‘lithosphere’, appearing in 1998, ‘east pacific rise’ in 2002, and ‘los-angeles basin’ and ‘rupture’ in 2007. Four bursting keywords, ‘ground-motion prediction’, ‘motion prediction equations’, ‘induced seismicity’ and ‘classification’, appeared in 2013 and 2014, indicating the current research hotspots in earthquake prediction. Among all bursting keywords, ‘motion prediction equations’, which burst in 2015, has the highest strength, 8.6705. This indicates that the study of motion prediction equations is very active and has been a key topic in earthquake prediction in recent years.
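CiteSpace's burst detection is based on Kleinberg's algorithm. The Python sketch below implements a simplified two-state, batched version of Kleinberg's model to illustrate the idea; it is not CiteSpace's code, and its weights are not expected to reproduce the strengths in Table 8. Each year a keyword is either in a baseline state, where a share p0 of papers uses it (p0 being its overall share across all years), or in a burst state with the elevated share s * p0. Entering the burst state costs gamma * ln(n) for n years, leaving it is free, and the Viterbi algorithm selects the cheapest state sequence. The defaults s = 2 and gamma = 1, the function names and the example counts are assumptions for illustration.

```python
import math


def _state_cost(p, r, d):
    """Negative log-likelihood of r keyword hits among d papers at rate p.

    The binomial coefficient is dropped because it is the same for both states.
    """
    return -(r * math.log(p) + (d - r) * math.log(1.0 - p))


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Simplified two-state batched burst detection in the style of Kleinberg's model.

    r[t]: papers in period t that use the keyword; d[t]: all papers in period t.
    Returns (start_index, end_index, weight) for each maximal run in the burst state.
    """
    if not r or sum(d) == 0:
        return []
    n = len(r)
    p0 = sum(r) / sum(d)  # baseline rate over the whole series
    if p0 <= 0.0 or p0 >= 1.0:
        return []
    rates = (p0, min(s * p0, 0.9999))  # baseline and elevated (burst) rates
    tau = gamma * math.log(n)  # cost of moving up into the burst state

    # Viterbi: cost[i] is the cheapest total cost of a path ending in state i.
    cost = [_state_cost(rates[0], r[0], d[0]), tau + _state_cost(rates[1], r[0], d[0])]
    back = []
    for t in range(1, n):
        prev0 = 0 if cost[0] <= cost[1] else 1  # dropping out of a burst is free
        prev1 = 1 if cost[1] <= cost[0] + tau else 0  # stay in the burst or pay tau to enter
        new0 = cost[prev0]
        new1 = cost[1] if prev1 == 1 else cost[0] + tau
        back.append((prev0, prev1))
        cost = [new0 + _state_cost(rates[0], r[t], d[t]),
                new1 + _state_cost(rates[1], r[t], d[t])]

    # Trace back the optimal state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for prev in reversed(back):
        state = prev[state]
        states.append(state)
    states.reverse()

    # Collect maximal burst runs; weight = total cost saved by the burst state over baseline.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            weight = sum(_state_cost(rates[0], r[k], d[k]) - _state_cost(rates[1], r[k], d[k])
                         for k in range(start, t))
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

With a toy series of 100 papers per year in which a keyword's use jumps from 2 papers to 10, 12 and 11 in years 4 to 6 (zero-based), kleinberg_bursts([2, 2, 2, 2, 10, 12, 11, 2, 2, 2], [100] * 10) flags a single burst spanning those three years, while a flat series of 5 per 100 yields no burst.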
Keywords with strongest citation bursts
This paper presented a bibliometric analysis of the publications on earthquake prediction from 1998 to 2017. It systematically examined the WoS categories, influential countries, institutions and journals, research topics and research hotspots in this field, and traced the field's evolution.
Geochemistry & Geophysics is the WoS category with the most publications on earthquake prediction. The numbers of publications on earthquakes and on earthquake prediction are both growing at almost the same rate, and the absolute gap between them is gradually widening. The numbers of references and authors per paper on earthquake prediction are also gradually increasing. Of the publications on earthquake prediction, 33.55% were completed through international cooperation, more than half (64.51%) through inter-institutional cooperation, and most (90.22%) through multi-author cooperation. Articles on earthquake prediction are relatively concentrated in a few journals: Bulletin of the Seismological Society of America published the most, followed by Journal of Geophysical Research: Solid Earth and Geophysical Journal International. The USA is the most productive and influential country in the field of earthquake prediction. China has made rapid progress in the total number of publications, but in terms of TC/TP it still lags well behind the other productive countries. The United States Geological Survey (USGS) in the USA is the most productive institution in the field, followed by the University of California System (UCS), also in the USA.
‘Motion, velocity, basin’, ‘earthquake, area, events’, ‘motion, ground, hazard’ and ‘fault, rates, deformation’ are the main topics in the earthquake prediction domain. ‘Earthquake’, ‘model’, ‘California’ and ‘prediction’ are the four most frequent keywords in earthquake prediction publications, and these keywords are closely related. The study of motion prediction equations is very active and has been a key topic in earthquake prediction in recent years.
This paper considers only papers indexed in the SCIE and SSCI databases and does not cover other important documents outside these two databases; future research will address this limitation. The results should nevertheless be a useful reference for scholars in earthquake research, especially earthquake prediction, helping them understand the knowledge structure, research hotspots and research trends of the field.
Acknowledgments
This work was supported by the Social Science Foundation Project of Jiangsu Province, China (No. 20GLC010), the research project of Humanities and Social Sciences in Universities of Jiangsu Province, China (No. 2019SJA0337), the Natural Science Research Project of Universities in Jiangsu Province, China (No. 19KJB120008) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 20KJA520006).
