Abstract
In this article, we examine the growth of the Internet as a research topic across the disciplines and the embedding of the Internet into the very fabric of research. While this is a trend that ‘everyone knows’, prior to this study, no work had quantified the extent to which this common sense knowledge was true or how the embedding actually took place. Using scientometric data extracted from Scopus, we explore how the Internet has become a powerful knowledge machine which forms part of the scientific infrastructure across not just technology fields, but also right across the social sciences, sciences and humanities.
Introduction
The World Wide Web, and more broadly the Internet, started life as a specialized network designed to facilitate communication among a handful of networked academic and industry researchers in fields such as physics and military research. Over the last 25 years, however, the Web has become not only a topic of study in its own right but also a core part of the toolkit of all areas of research, including computer science, medicine, natural science, social science, humanities and many other fields and subfields. It has, in other words, become a knowledge machine (Meyer and Schroeder, 2015): digital tools and data, networked together, that support research both about the Web itself and, crucially, about a whole range of other topics that benefit from the availability of these networked tools and data. This knowledge machine in turn is leading to the emergence of new topics and new approaches to research across a range of disciplines, acting as a research technology (Schroeder, 2007) that allows methods and ideas to move outside disciplinary silos. One might ask whether the Internet is the change agent that has allowed scholars to transform research, or whether the digitization of information is responsible. We would argue that, with regard to scholarly research at least, the two are tightly (and inextricably) coupled: digitization of information happened before the rise of the Web (e.g. CD-based or laserdisc resources of the late 1980s and early 1990s), but such resources alone had only limited transformative power.
Likewise, had the Web developed into a purely commercial space (see, for instance, early closed web-like platforms such as Prodigy or AOL) without the growth in digitization of scholarly materials in the early 2000s (Meyer and Schroeder, 2009) and the increasing academic recognition of the potential scholarly value of the traces of online behaviour (Savage and Burrows, 2007, 2009), we might not have seen the dramatic increases in Internet- and web-related research discussed in this article. Taken together, however, these and the other elements of this vast knowledge machine are proving to be vital engines of the research enterprise today.
This article takes a scientometric approach to analysing the growth, expansion and influence of the Internet and the Web on scientific disciplines. In the last 25 years, over 300,000 academic papers have been written that explicitly mention some aspect of the World Wide Web or the Internet in the title, abstract or keywords of the publication. The overall pattern is one of nearly continuous, roughly linear growth since 1990. Of course, the Web and the Internet are not synonymous; however, for our purposes in this article we take the practical view that the two terms are widely used in relatively interchangeable ways in normal usage, and that including both more accurately reflects our aim of understanding how the Web and the Internet have become integral parts of academic research.
The raw counts available to anyone do not help us understand how the Internet itself has penetrated a variety of disciplines. In this article, we extracted complete data for the 25-year period from 1990 to 2014 inclusive to analyse this growth and broke it down into a number of elements: the change in disciplines over time; which nations’ authors are writing papers in which the Web/Internet is a significant presence in the research; how international authorship and co-authorship change over this period; which journals have been most active in publishing topics related to the Web over time; and a number of other measures.
This research was inspired by the call for papers for this Special Issue. When we started to think about the Web at 25, it occurred to us that ‘everyone knows’ that the Internet has penetrated into many corners of both everyday life and academic practice, but to date nobody has tried to quantify this common knowledge (with some exceptions mentioned below that take a somewhat different focus). This article shows the actual landscape of how the Web, or more generically, the Internet, is influencing research.
Literature
Elsewhere, we (Meyer and Schroeder, 2015) have argued that the Internet has become embedded in research in various ways. There are disciplinary differences (Fry and Talja, 2007; Kling et al., 2003) in this regard, but the commonality is that research is increasingly driven by shared and distributed digital tools and data or what came to be known as e-Research or also as cyberscience (Nentwich and König, 2012). This wider process predates e-Research and has continued since, now that e-Research has become more routine, embedded and invisible, and in its most recent incarnation has become ‘big data’ (Schroeder, 2014) – although it can be anticipated that big data too will become invisible in time.
A different way to describe this process is that the Internet has become part of a research infrastructure (Barjak et al., 2013; Hughes, 1983), a large technological system that supports many aspects of research. These processes also mean that ‘Internet’ and some related search terms (such as ‘e-health’) should be keywords mentioned in articles. However, as we have noted previously (Meyer and Schroeder, 2015), not all research that uses digital tools and data (or that can be classified as such) makes explicit that Internet- or e-infrastructure has been used. This lack of explicit referencing will become a wider problem as research on ‘internet’ and ‘social media’ (another keyword), for example, foreseeably becomes invisible by being absorbed into media and communications research generally, so that ‘internet’ and ‘social media’ will no longer necessarily appear explicitly in the titles or keywords of these articles.
Methods
One of the interesting challenges in work of this kind is that constructing an accurate and reliable search term to extract articles for analysis is more difficult than one might think. Rather than rely on guesswork, we used a systematic approach starting from search terms previously used by Peng et al. (2013), supplemented by some of the terms added by Malik (2012). Peng et al. retrieved 27,340 journal articles from the Web of Science for the period 2000–2009, focusing on the social sciences involved in ‘internet studies’, while Malik retrieved 114,079 social science publications from 1990 to 2011 from Web of Science using a somewhat broader search query.
To determine the most accurate search query (set of search terms) for this study, we started with these authors’ lists (Malik, 2012; Peng et al., 2013) and focused on terms relevant to our broader question about the spread of the Internet into a variety of disciplines over time. Using Malik and Peng et al. as a starting point means we begin from a set of pre-tested terms rather than selecting terms in a scattershot way. We chose Scopus rather than Web of Science for several reasons, the most important being that the data extractable via our institutional subscription include more of the fields needed to answer our research questions than does Web of Science. The scale and scope of Web of Science and Scopus are, however, broadly comparable for the time period we are interested in, although Scopus has more limited coverage before 1996. Scopus is a newer service than Web of Science and originally covered literature mainly from 1996 onwards. However, Scopus has since added more data from earlier periods, and given the pattern of growth of Internet-related publications, the lesser coverage of the earliest period is not significantly problematic here. The coverage of Scopus is also generally broader than that of Web of Science, as more sources are indexed, including additional coverage in the arts and humanities. Both sources, however, have an inherent bias towards English-language publications, which should be taken into account when interpreting our findings; as we have also used only the English versions of search terms and not searched for equivalents in other languages, we are admittedly underrepresenting related research in areas of the world where academic publication is in languages other than English and in journals not indexed by Scopus.
The search terms we tested are shown in Table 1. Starting with the largest term (‘internet’) and then proceeding through each of the remaining terms (but excluding the largest term ‘internet’ so as to minimize duplication; see note in Table 1), we used the Scopus ‘Advanced Search’ option to query all journal articles published in the 25 years from 1990 to 2014 inclusive that contained the target term in the title, abstract or keywords (e.g. ‘TITLE-ABS-KEY(internet) AND DOCTYPE(ar) AND PUBYEAR > 1989 AND PUBYEAR < 2015’). Where more than one target term appeared in a single article, it was counted only once. The results were sorted with the top-cited articles first, and a sample of 20 articles for each term was downloaded for analysis, followed by the top 20 conference papers (‘DOCTYPE(cp)’). 1 The sampling strategy was to take every nth article, where n = 100 for search terms yielding 1500 or more results and n = 10 for search terms with fewer than 1500 results. The number of citations was used as the sorting mechanism for this exercise because it yields a more random selection of publications than the other sort orders offered by the Scopus interface. Alphabetical sorting by author, for instance, yields too many publications in the larger samples by authors whose last names start with the letter ‘A’, who in turn tended to be based in a narrow range of countries because of ethnic differences in naming conventions. Sorting by date is inadequate because either older or newer publications are favoured, and sorting by ‘relevance’ relies on a non-transparent algorithm based on the search term itself.
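The query construction and systematic sampling just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors’ procedure: the helper names (`build_query`, `sampling_interval`, `systematic_sample`) are hypothetical, the exclusion of ‘internet’ is expressed with Scopus’s AND NOT operator (our reading of how duplication was minimized), and the generated strings have not been run against Scopus.

```python
def build_query(term, doctype="ar", exclude=None):
    """Build a Scopus Advanced Search string for one target term.

    Multi-word terms are wrapped in double quotes (a loose phrase search).
    If `exclude` is given, it is appended with AND NOT at the very end, so it
    applies to the whole preceding expression (assumption: check the current
    Scopus documentation on operator precedence).
    """
    target = f'"{term}"' if " " in term else term
    query = (f"TITLE-ABS-KEY({target}) AND DOCTYPE({doctype}) "
             "AND PUBYEAR > 1989 AND PUBYEAR < 2015")
    if exclude:
        query += f" AND NOT TITLE-ABS-KEY({exclude})"
    return query


def sampling_interval(total_results):
    """Every 100th record for result sets of 1500 or more, else every 10th."""
    return 100 if total_results >= 1500 else 10


def systematic_sample(records_by_citations, sample_size=20):
    """Take every nth record from a citation-sorted list, up to sample_size.

    Small result sets (fewer than 10 * sample_size records) yield fewer items.
    """
    n = sampling_interval(len(records_by_citations))
    return records_by_citations[::n][:sample_size]


if __name__ == "__main__":
    print(build_query("internet"))
    print(build_query("world wide web", exclude="internet"))
    print(build_query("e-health", doctype="cp", exclude="internet"))
    sample = systematic_sample(list(range(3000)))  # stand-in for sorted records
    print(len(sample), sample[:3])  # 20 [0, 100, 200]
```

Putting the exclusion last keeps the query readable and avoids relying on how AND and AND NOT bind when mixed mid-string.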
Table 1. Search term specificity.
Next, each article was qualitatively coded, based on its title, abstract and keywords, in relation to the search term(s) used. One person coded the 875 articles in the sample; to assess the reliability of this coding, a 10% stratified random sample (n = 90) of the first coder’s results was coded by a second person who was blinded to the codes first assigned. The two coders agreed 90% of the time (n = 81), with a Krippendorff’s alpha of .79; scores above .800 are considered highly reliable, and scores between .667 and .800 allow one to draw reasonable conclusions (Krippendorff, 2004: 429). Since the purpose of the coding exercise was to determine a range of keywords to use, rather than being an end in itself, falling slightly below the highest threshold was not considered problematic.
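For readers who want to run a similar reliability check, the sketch below computes simple percent agreement and Krippendorff’s alpha for nominal categories with two coders and no missing values, using the standard coincidence-matrix formulation. The labels and toy data are illustrative only and are not the study’s codes.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(coder_a, coder_b):
    """Share of units on which both coders assigned the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same, non-empty set of units")
    # Coincidence matrix: each unit contributes the ordered pairs (a, b) and (b, a).
    o = Counter()
    for a, b in zip(coder_a, coder_b):
        o[(a, b)] += 1
        o[(b, a)] += 1
    # Marginal totals n_c and the total number of pairable values n.
    n_c = Counter()
    for (c, _k), count in o.items():
        n_c[c] += count
    n = sum(n_c.values())
    # alpha = 1 - D_o / D_e, with common scaling factors cancelled.
    observed = sum(count for (c, k), count in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c, k in permutations(n_c, 2)) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - observed / expected


if __name__ == "__main__":
    first = ["on-topic", "on-topic", "marginal", "off-topic", "off-topic"]
    second = ["on-topic", "marginal", "marginal", "off-topic", "off-topic"]
    print(percent_agreement(first, second))  # 0.8
    print(round(krippendorff_alpha_nominal(first, second), 3))  # 0.727
```

Alpha is lower than raw agreement because it discounts the agreement expected by chance given how often each category is used.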
A three-category system was used in which ‘on-topic’ was taken to be those articles and conference papers which had some aspect of the Internet, broadly interpreted, as a key part of their work as indicated by the title, abstract and keywords. ‘Off-topic’ articles were those which were captured by the search term but were not in any meaningful way focused on the Internet. Examples include passing references to searching Google Scholar for a literature review, mentioning that a resource could be located online or on the Web at such and such an address, or pointing out that the project had a blog. Some terms with particular problems of this nature were discarded before this stage because of their low reliability, such as ‘www’, which captures any abstract listing a URL regardless of topic. Other publications were coded as ‘off-topic’ because the search term also yielded publications with no relation to the Internet; a key example is ‘social network*’ (where * indicates a wildcard, thus encompassing ‘social networks’, ‘social networking’ and so forth), which includes articles on social network analysis, a large and well-developed field of study that in the majority of cases has nothing to do with the Internet.
The third, intermediate category (‘marginal’) is more open to interpretation. These were publications in which the Internet did not play an absolutely central role, but which were arguably more than passingly dependent on the Internet in their content and/or methods. Two of the areas with a particularly large number of papers that fell into this grey zone were health and government. In the health area, a number of papers that focus more on patient records systems, for instance, than on online access to those systems fall into this category, as do publications on medical research that communicated with participants online. Likewise with government, information technology (IT) implementations that were designed to improve ‘e-government’ but were not primarily public-facing fall into this category. Note also that the term ‘Internet’ itself had a large number of results in this marginal category, largely because many of its results happened to be in the health sector and were difficult to classify clearly as in or out of scope for the reasons above (the percentage of publications related to the Internet for the term ‘Internet’ increases from 52.5% to 87.5% when these marginal cases are included). Specific examples from this marginal category include descriptions of several scientific databases (such as metabolic pathways and Escherichia coli data) that are available for download from websites, an article about building e-government applications on the Grid, articles about trust in information systems that are connected to the Internet in both the health and government sectors, and other similar topics. While these are not ‘about’ the Internet or the Web, it is arguable that the Internet plays a vital part in the work reported, and they should thus be considered on-topic for our purposes.
In the end, we adopted the broader interpretation of relevance for the final search: a term was included if at least 80% of its sample was coded as on-topic or marginal. This brought in the terms for e-government, e-health, Internet and World Wide Web, which would otherwise have fallen below the cut-off. We have aimed for inclusivity in the final set of search terms so as to cover all areas and disciplines in the final dataset as fully as possible.
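The inclusion rule can be expressed as a small function. In this sketch the class and function names are hypothetical; the counts for ‘internet’ are reconstructed from the 52.5% and 87.5% figures reported above, assuming a 40-item sample (20 articles plus 20 conference papers), and ‘term_x’ is invented for contrast.

```python
from dataclasses import dataclass


@dataclass
class TermCoding:
    """Coded sample for one search term: counts of publications per category."""
    term: str
    on_topic: int
    marginal: int
    off_topic: int

    @property
    def total(self):
        return self.on_topic + self.marginal + self.off_topic


def relevant_share(coding, include_marginal=True):
    """Share of the sample coded on-topic (plus marginal, if include_marginal)."""
    relevant = coding.on_topic + (coding.marginal if include_marginal else 0)
    return relevant / coding.total


def passes_cutoff(coding, cutoff=0.80, include_marginal=True):
    """True if the term's relevant share meets the inclusion cut-off."""
    return relevant_share(coding, include_marginal) >= cutoff


if __name__ == "__main__":
    codings = [TermCoding("internet", 21, 14, 5), TermCoding("term_x", 10, 4, 26)]
    for c in codings:
        print(c.term, relevant_share(c, include_marginal=False),
              relevant_share(c), passes_cutoff(c))
```

Under the strict reading (on-topic only) ‘internet’ would fail the 80% cut-off; counting marginal cases lets it pass, which is the effect described above.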
We include this somewhat extended discussion of our sampling method for two reasons. First, few previous studies have taken a systematic approach of this sort to deciding whether to include or exclude particular search terms, or if they have, they have not documented it in their methods. We feel that this approach is robust and could be further developed and extended. Second, the accuracy and specificity of the search terms are themselves interesting in the context of this article, because they show how difficult it is to separate cases where the Internet plays a key role in research (either as a topic or as a centrally enabling technology) from the passing references to various parts of the Internet’s infrastructure that have become common as the Internet has become ubiquitous.
The final search query 2 yielded 334,659 documents. Some of the tables below report smaller totals because of missing data, including missing country data for many authors; where this is the case, we have noted it.
The data were downloaded from Scopus and then processed in a database using bespoke VBA (Visual Basic for Applications) code to extract authors, author nationality and other details.
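The original processing used bespoke VBA. As an illustration only, the Python sketch below shows one simple way to extract author countries from affiliation strings. It assumes, hypothetically, that each affiliation is a comma-separated string ending in the country name and that a paper’s affiliations are separated by semicolons. Real records are messier, so a working version would also need a lookup list of country names.

```python
def country_of(affiliation):
    """Return the last comma-separated field of one affiliation, or None if empty.

    Assumes (hypothetically) that the country is written last; an affiliation
    without a country would wrongly return its city, so real data need checking
    against a list of country names.
    """
    parts = [part.strip() for part in affiliation.split(",") if part.strip()]
    return parts[-1] if parts else None


def countries_of_paper(affiliations_field):
    """Collect the country of each semicolon-separated affiliation of a paper."""
    countries = []
    for affiliation in affiliations_field.split(";"):
        country = country_of(affiliation)
        if country is not None:  # skip empty entries (missing data)
            countries.append(country)
    return countries


if __name__ == "__main__":
    # Hypothetical affiliations field for a two-author paper.
    field = ("Department of Sociology, Example University, London, United Kingdom; "
             "School of Information, Sample Institute, Beijing, China")
    print(countries_of_paper(field))  # ['United Kingdom', 'China']
```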
Findings
Overall trends
We begin with the overall trend of the Internet’s increasing influence on research over the last 25 years. Figure 1 shows the overall growth by year in five major topic areas: Technology (which includes computer science, engineering and mathematics), Medicine (which includes medicine, genetics, immunology, neuroscience, nursing, dentistry, pharmacology and the health professions), Sciences (which includes biology, chemistry, earth sciences, materials science, physics, astronomy and agriculture), Social Sciences (which includes the social sciences, psychology, economics, business and decision sciences) and the Arts and Humanities.

Figure 1. The Internet in major research areas (n = 403,991).
The story here is clearly one of growth and one that makes sense given our lay understanding of how the Internet started coming to public attention in 1995 with the growing availability of Netscape: there has been steady growth across all areas from 1995 to 2010. As noted, Scopus coverage is less complete for pre-1996 publications than for those after, so the early data should be considered incomplete. Less clear is the apparent flattening of growth over the last 5 years. While the apparent drop-off in 2014 is likely attributable to not-yet-updated data in the Scopus database, 3 the flattening curve from 2010 to 2013 is more likely to be an actual trend and not a data artefact, although it is more marked in the technology domains than in the other knowledge domains that we focus on for much of this article.
In Figure 2, the same data are displayed, but this time grouped into 5-year bins. Another aspect of the story appears here: while the predominance of computer science, engineering and mathematics is clear throughout the 25 years in question, the social sciences overcame a late entry into engagement with the Internet to surpass the activity in medicine and the sciences. In the 2005–2009 and 2010–2014 periods, the social sciences’ engagement with the Internet in research in this sample grew by 163% and 53%, respectively. Some of this growth may be related to the rising prevalence of platforms such as social media sites, which are included in our search query and are of particular interest to social scientists within the wider context of Internet research. However, these make up a relatively small proportion of the overall sample, and the pattern holds even for the broadest search terms, such as ‘internet’, that are relevant across the disciplines and the sample. Nonetheless, it would be remiss not to consider the changing nature of the Internet itself over the past 25 years (in particular, its adoption as a popular communication technology) and the role of these broad trends in driving related research.

Figure 2. The Internet in major research areas, 5-year bins (n = 403,991).
The arts and humanities came even later, showing relatively little engagement until the most recent 5-year period, when there was a 451% increase over the previous period (4882 publications compared with 886 in 2005–2009). Note that both Scopus and Web of Science have inherent limitations with regard to the humanities, which means that humanities publications are under-represented relative to their actual frequency. It is therefore difficult or impossible to compare the arts and humanities data directly with the other fields, but the marked growth within the arts and humanities data in 2010–2014 is nevertheless noteworthy and not purely due to increased data coverage.
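The growth rates quoted for the 5-year bins are relative changes between consecutive periods. A minimal sketch (the function name is ours), checked against the arts and humanities counts given above:

```python
def percent_change(previous, current):
    """Relative change from one period to the next, in percent."""
    if previous == 0:
        raise ValueError("percent change is undefined when the base period is zero")
    return (current - previous) / previous * 100


if __name__ == "__main__":
    # Arts and humanities: 886 publications in 2005-2009, 4882 in 2010-2014.
    print(round(percent_change(886, 4882)))  # 451
```

An increase of 451% means the later count is roughly 5.5 times the earlier one, not 4.5 times.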
The Internet: spreading across the disciplines
For most of the remainder of the article, we will focus on data from the domains other than technology, which because of its relatively large size tends to obscure the results in the smaller domains. Thus, the following data exclude articles solely in the computer science, engineering and mathematics domains (although some of the multidisciplinary journals also link into those domains). More importantly, in this article, we are primarily interested in how the Internet has influenced research across the sciences and humanities, rather than in how technological developments underpinning the Internet have grown and been communicated. The latter are also interesting questions, but beyond the scope and focus of this article.
In Figure 3(a) to (e), we can see a visual representation of the spread of Internet-related topics across standardized overlays of journals, using the methods described in Leydesdorff et al. (2013) and Leydesdorff et al. (2015). The method Leydesdorff and his colleagues have used to create the underlying map in these figures involves downloading citation information for all 19,600 journals indexed by Scopus from 1996 to 2012 and analysing the matrix of all citations between journals to calculate the distance between every possible pair of journals. Journals that cite each other frequently are likely to be related in topic and discipline and are thus mapped close to each other, based on cosine-normalized citation patterns. The resulting map thus displays all of Scopus-indexed knowledge, with disciplines and fields naturally clustered in various portions of the map (described further below). As the authors explain, ‘interactive overlay maps enable users to project a set of documents onto a base map in terms of the journal distribution’ (Leydesdorff et al., 2015: 1001). Using this standardized data provided by Leydesdorff as the underlying map, we then extracted the journal names from our dataset described above and processed them with Leydesdorff’s tools available at http://www.leydesdorff.net/scopus_ovl/. These tools match our Scopus data with the standardized map of science and overlay the journals in our sample on top of the base map. The underlying map of grey dots thus represents ‘all of science’, or at least as much of it as is represented in the 19,600 journals indexed by Scopus, while the coloured dots represent the science, social science, medicine and humanities journals in which articles related to the Internet appear in each of our five 5-year periods. The maps are displayed using VOSviewer, in which ‘the size of each journal as a node is depicted proportionally to the log4(n + 1), with n as the number of occurrences. (The “+1” is added to prevent single occurrences from being displayed, because log(1) = 0.)’ (Leydesdorff et al., 2015: 1007). The colours are assigned by VOSviewer using a community detection algorithm (Waltman et al., 2010).
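The overlay method rests on two simple calculations: a cosine similarity between journals’ citation profiles and a logarithmic node size. The sketch below illustrates both on a hypothetical three-journal citation matrix. It does not reproduce Leydesdorff’s full normalization pipeline, the VOSviewer layout or its clustering. With the +1 term, a journal with a single occurrence gets size log4(2) = 0.5; without it, that journal would map to log4(1) = 0.

```python
import math


def cosine(u, v):
    """Cosine similarity between two citation vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_matrix(citations):
    """Pairwise cosine similarities between the rows of a journal citation matrix."""
    return [[cosine(row_i, row_j) for row_j in citations] for row_i in citations]


def node_size(occurrences):
    """Node size proportional to log4(n + 1), where n is the number of occurrences."""
    return math.log(occurrences + 1, 4)


if __name__ == "__main__":
    # Hypothetical counts: row i holds citations from journal i to journals 0..2.
    citations = [
        [10, 8, 0],
        [9, 12, 1],
        [0, 1, 15],
    ]
    for row in cosine_matrix(citations):
        print([round(x, 2) for x in row])
    print([node_size(n) for n in (0, 1, 3, 15)])
```

Journals 0 and 1 cite similar targets and so score close to 1, while journal 2 has an almost disjoint profile and scores near 0 against both.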

Figure 3. Spread of Internet-influenced research in domain-based journal articles (n = 81,164 in 9995 journals) related to science, medicine, social sciences and the humanities, 1990–2014, on standardized overlays (n = 19,600 possible journals) of journals (excludes computer science, engineering and mathematics): (a) 1990–1994 (n = 144 articles in 93 journals); (b) 1995–1999 (n = 4918 articles in 1518 journals); (c) 2000–2004 (n = 13,356 articles in 3342 journals); (d) 2005–2009 (n = 25,377 articles in 5760 journals); and (e) 2010–2014 (n = 41,772 articles in 7807 journals).
There are several narratives that become apparent in these visualizations. The first pertains to the spread of Internet-related research, which from the outset was scattered quite widely across the disciplines, as shown in Figure 3(a). In Figure 3(b), we see a marked increase in publications (consistent with Figures 1 and 2 and Tables 2 to 4), and these publications are scattered across the underlying map, which indicates that these topics were not constrained to a small range of disciplines from the very start. In each subsequent figure, the coverage spreads and the publication frequency (indicated by node size) intensifies. Finally, Figure 3(e) shows the most recent 5-year period, 2010 to 2014, based on 41,772 journal articles published in 7807 journals. By this time, there are few areas of academic publishing without some activity related to the Internet, with (roughly speaking) the sciences in the upper right, social sciences on the left, medicine on the right and humanities in the lower left. Note that VOSviewer does not display all titles at once, in the interests of readability; in the live VOSviewer interface, one can zoom in and see the detail for any given dot (data available from the authors on request).
Table 2. Top 20 social science–related journals for Internet-related research, by total number of articles published from 1990 to 2014.
Note. The table shading indicates the top journals measured by the number of articles published in each time period (1-5 are shown as darker gray, 6-10 are lighter gray).
Table 3. Top 20 medicine-related journals for Internet-related research, by total number of articles published from 1990 to 2014.
Table 4. Top 20 science-related journals for Internet-related research, by total number of articles published from 1990 to 2014.
As indicated above, the underlying map is based on 19,600 journals in Scopus, so at least one Internet-related article appeared in 51% (n = 9995) of all journals in Scopus, and the share would be even higher if we included the technology fields in our analysis (although a high percentage of those publications appear as conference papers rather than journal articles). Other than the Internet, has any other topic ever appeared in half of all academic journals over the course of a quarter of a century? If so, we can’t think of it. This degree of penetration of the literature is both remarkable and, as far as we know, unprecedented.
We are not claiming, of course, that this penetration is evenly distributed or absolute. Many of the journals have only one or two articles in our sample. If one plots the number of articles per journal on the y-axis against the ranked order of the journals on the x-axis, there is a marked long-tail curve in each of the main areas of research. There are slight differences in the maximum value and the number of journals in each area, but essentially all four look approximately the same (which is why we have not included them here): they show the long-tail curve seen time and again in Internet phenomena, with a very tall y-intercept, a steep drop and a very long tail of thousands of journals with a small number of publications each.
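The rank-frequency curve described above is simple to construct: sort journals by article count and pair each count with its rank. The sketch below uses hypothetical counts; the summary measures (top journal’s share of articles, share of journals with few articles) are illustrative choices, not statistics reported in this article.

```python
def rank_frequency(articles_per_journal):
    """Return (rank, article count) points, most prolific journal first."""
    ranked = sorted(articles_per_journal.values(), reverse=True)
    return list(enumerate(ranked, start=1))


def tail_summary(articles_per_journal, small=2):
    """Summarize the head and the tail of the rank-frequency curve."""
    if not articles_per_journal:
        raise ValueError("no journals")
    counts = [count for _rank, count in rank_frequency(articles_per_journal)]
    total = sum(counts)
    return {
        "journals": len(counts),
        "max_articles": counts[0],
        "top_journal_share": counts[0] / total,
        # Share of journals with at most `small` articles: the long tail.
        "few_article_share": sum(c <= small for c in counts) / len(counts),
    }


if __name__ == "__main__":
    # Hypothetical counts of Internet-related articles per journal.
    toy = {"J1": 50, "J2": 12, "J3": 3, "J4": 2, "J5": 1, "J6": 1, "J7": 1}
    print(rank_frequency(toy))
    print(tail_summary(toy))
```

Plotting the (rank, count) points gives the tall intercept and long flat tail described in the text.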
In Tables 2 to 4, we look at some of the top journals publishing Internet-related research over the last 25 years. The table shading indicates the top journals measured by the number of articles published in each time period (1-5 are shown as darker gray, 6-10 are lighter gray). The topic areas of articles are determined by Scopus categorizations at the journal level, so the match is not perfect but gives a reasonably good indication of the types of topics published in a given journal. This journal-level categorization is also why a few journals appear in more than one of these three tables. 4
We have not included a similar table for the humanities because the numbers are too small in most cases to build a comparable set of data (the 20th-ranked journal by volume, for instance, published only 31 papers related to the Internet over this 25-year period). Generally speaking, however, the humanities data show very little activity at all from 1990 to 2010, with a big increase from 2010 to 2014 (consistent with Figure 2).
The Internet in the social sciences
Table 2, which will likely be of most interest to readers of this journal, looks at journals related to the social sciences. The top 20 journals as measured by number of articles are shown (with a secondary sort on average number of citations for journals tied in the rankings). The table is ordered by the overall 25-year period, with the number of articles and the mean number of citations to those articles within our dataset shown for the overall period, followed by each 5-year period in subsequent columns. For the overall period, the table also shows the percentage of articles cited at least once and the percentage cited at least 10 times (the i10 index; see Kozak and Bornmann, 2012). The i10 index is a more feasible indicator than the h-index for this particular study, and is intended to highlight the proportion of papers that have been cited a reasonably large number of times.
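The per-journal indicators and the ranking rule described above can be sketched as follows. Function names and data are hypothetical. The sketch averages citations over all of a journal’s articles, which is an assumption because the text does not say whether uncited articles are included in the mean. Following the text, it reports the at-least-10-citations measure as a percentage; Google Scholar’s own i10-index is a count.

```python
def citation_indicators(citations):
    """Summarize one journal's articles from their citation counts.

    The mean is taken over all articles (an assumption). Percentages are the
    share of articles cited at least once and at least 10 times.
    """
    if not citations:
        raise ValueError("journal has no articles")
    n = len(citations)
    return {
        "articles": n,
        "mean_citations": sum(citations) / n,
        "pct_cited": 100 * sum(c >= 1 for c in citations) / n,
        "pct_cited_10": 100 * sum(c >= 10 for c in citations) / n,
    }


def rank_journals(citations_by_journal):
    """Order journals by article count, breaking ties by mean citations."""
    stats = {j: citation_indicators(c) for j, c in citations_by_journal.items()}
    return sorted(stats, key=lambda j: (-stats[j]["articles"], -stats[j]["mean_citations"]))


if __name__ == "__main__":
    # Hypothetical citation counts for the articles in three journals.
    journals = {"Journal A": [0, 3, 12, 25, 0, 10], "Journal B": [40, 2], "Journal C": [1, 1]}
    for name in rank_journals(journals):
        print(name, citation_indicators(journals[name]))
```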
Again, there are a few things worth noting about this table. First, some of the journals which are most active in our sample are relative newcomers to the publishing scene. Computers in Human Behavior was first published in 1985 and Cyberpsychology and Behavior (continuing as Cyberpsychology, Behavior, and Social Networking) was first published in 1998. New Media & Society, which appears sixth overall and fourth in the most recent time period in terms of volume, published its first issue in 1999. Overall, at least 61,309 articles were published in 5320 journals related to the social sciences from 1990 to 2014, with 33,066 of those articles published in the last 5 years alone.
Also worth noting is the general pattern of growth in prominence for certain journals over time. While New Media & Society experienced considerable growth in prominence during the last 15 years (ranked 25th from 2000 to 2004, seventh from 2005 to 2009, and fourth from 2010 to 2014 in terms of volume), the online journal First Monday has been in the top 15 journals since its inception. Of course, volume isn’t the only, or even necessarily a very good, measure of impact, and New Media & Society papers are cited more frequently (with 93% of papers cited and these receiving an average of 23.3 citations each) than First Monday (61% of papers cited 9.0 times on average), and both are less cited than articles in Cyberpsychology and Behavior (with 99% cited an average of 38.9 times each) and the Journal of the American Society for Information Science and Technology (JASIST) (in which 98% of papers are cited an average of 35.0 times). JASIST, it should be noted, changed names twice in this time period, which is why it only appears to have been published from 2000 to 2009.
The Internet in medical research
Table 3 reports similar data, but this time for medicine-related journals (55,322 articles in 6495 journals). Nucleic Acids Research, which has published many highly cited articles about online data banks such as PROSITE and SWISS-PROT, among many others, is top-ranked in volume (n = 3272), in proportion of papers cited (98%) and in average citations per paper (97.6). This prominence is consistent across the time period. PLoS ONE, on the other hand, was first published only in 2006, but immediately became a major outlet for Internet-related medical research.
Again, we see some movement over time, as certain journals such as Telemedicine and e-Health become much more active in recent periods, while others such as Studies in Health Technology and Informatics wax and wane, ranked in the top 15 in volume for three of the time periods, but falling to 109th from 2005 to 2009 and always with a relatively modest 46% of papers being cited an average of 3.3 times each.
The Internet in science
Finally, in Table 4, we look at science-related journals. Again, PLoS ONE appears at the top of the table (remember, journals are often classified in more than one category), with 1001 articles published, 84% of them cited an average of 14.2 times. Some of the entries here are obvious homes for technology-related publications (such as Computers and Geosciences), but many are mainstream science journals in disciplinary topic areas that, because of the penetration of the Internet into these scientific domains, also publish work related to the Internet. This is not ‘Internet studies’, but the Internet as part of the research infrastructure of science.
The geography of the Internet as a knowledge machine
We have seen the Internet spread as a research topic across the disciplines over the last 25 years, but what of its spread across the globe over the same period? In Table 5, we again rank by volume, but this time by the country of the corresponding author for each article. 5 In each time period, the countries were ranked (although for the 1990 to 1994 period, too many countries had equal counts of between one and four articles to be meaningfully ranked against each other, so rank is omitted for countries with fewer than five publications).
Top countries publishing domain-based Internet-related articles 1990–2014 (based on corresponding author country).
Countries with total n > 500 are shown.
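The per-period ranking behind Table 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used: it assumes one (year, corresponding-author country) pair per article, five-year windows starting in 1990, and "competition" ranking for ties (1, 2, 2, 4, ...), since the article does not say how ties were handled. Only the cut-off of five publications comes from the text; all function names are hypothetical.

```python
from collections import Counter, defaultdict

MIN_FOR_RANK = 5  # from the text: no rank is given below five publications


def period_label(year, start=1990, width=5):
    """Map a publication year to its five-year window, e.g. 1997 -> '1995-1999'."""
    lo = start + width * ((year - start) // width)
    return f"{lo}-{lo + width - 1}"


def rank_by_volume(countries, min_count=MIN_FOR_RANK):
    """Rank countries by article count, highest first.

    Assumption: ties share a rank (competition ranking) and are listed
    alphabetically; countries below `min_count` get None instead of a rank.
    """
    ordered = sorted(Counter(countries).items(), key=lambda kv: (-kv[1], kv[0]))
    ranked, rank, prev = [], 0, None
    for position, (country, n) in enumerate(ordered, start=1):
        if n != prev:  # a new count starts a new rank
            rank, prev = position, n
        ranked.append((country, n, rank if n >= min_count else None))
    return ranked


def rank_per_period(records):
    """records: iterable of (year, corresponding_author_country) pairs."""
    by_period = defaultdict(list)
    for year, country in records:
        by_period[period_label(year)].append(country)
    return {p: rank_by_volume(cs) for p, cs in sorted(by_period.items())}


print(rank_by_volume(["US"] * 6 + ["UK"] * 5 + ["DE"] * 5 + ["FR"] * 2))
```

With these inputs, the two countries tied on five articles share second place, and the country with two articles is listed without a rank.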
The United States, unsurprisingly, is the most prominent country in Internet-related research, but the same can be said of many other areas of research as well. The United States has a large and prolific research culture, and over the 25-year period it published roughly three and a half to four times as many articles (38,180) as its closest rivals, China (10,686 articles) and the United Kingdom (9857 articles). The biggest mover in terms of absolute volume across the period is China, again unsurprisingly, given what is known about Chinese investment in research in recent years. The United Kingdom and Germany have been consistently in the top five across the last 25 years. Overall, the 119,272 articles with valid country information for the corresponding author represent 103 countries; all countries with at least 500 publications are shown in Table 5.
Table 6 breaks down the country information by major research area, 6 listing the top 10 countries in terms of number of publications across the entire 25-year time period. In this table, however, we have extracted all authors from the publications (rather than just the corresponding author as in Table 5), which allows us to look at some of the differences between disciplines and countries at a more granular level. For a paper to be counted for a particular country, at least one of its authors had to be from that country; thus, articles with international co-authorship ‘count’ for two or more countries, but multiple authors on a paper from the same country are counted only once for that paper. In other words, counts were made at the publication level, and the country figures in this table can be read as ‘at least one author from country X was an author on paper Y’. We have also added a fifth area, multidisciplinary publications, which consists of all publications coded by Scopus as multidisciplinary as well as any that were coded for more than one of the other four areas.
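The publication-level counting rule and the definition of the multidisciplinary area can be illustrated with a short Python sketch. It assumes each paper is a record holding one country per author and a list of Scopus-style area labels (the labels and function names here are hypothetical). It also assumes that a paper coded for several areas is still counted in each of those areas as well as in the multidisciplinary group; the text defines the fifth area but does not say whether such papers are removed from the other four.

```python
from collections import Counter

# Hypothetical labels for the four main areas used in Table 6.
FOUR_AREAS = {"social sciences", "medical sciences", "sciences", "humanities"}


def areas_for(paper_areas):
    """Return the Table 6 areas a paper falls into.

    A paper is multidisciplinary if it is coded as such, or if it is coded
    for more than one of the four main areas (assumption: it also keeps
    its individual area assignments).
    """
    areas = set(paper_areas) & FOUR_AREAS
    if "multidisciplinary" in paper_areas or len(areas) > 1:
        areas.add("multidisciplinary")
    return areas


def country_paper_counts(papers):
    """Count papers per country and area, once per country per paper.

    papers: iterable of dicts with 'author_countries' (one entry per author)
    and 'areas' (area labels). Returns {area: Counter(country -> papers)}.
    """
    counts = {}
    for paper in papers:
        countries = set(paper["author_countries"])  # same-country co-authors count once
        for area in areas_for(paper["areas"]):
            counts.setdefault(area, Counter()).update(countries)
    return counts


papers = [
    {"author_countries": ["United States", "United States", "United Kingdom"],
     "areas": ["social sciences"]},
    {"author_countries": ["United States"],
     "areas": ["social sciences", "medical sciences"]},
    {"author_countries": ["China"], "areas": ["multidisciplinary"]},
]
counts = country_paper_counts(papers)
print(counts["social sciences"])  # the first paper adds 1 to the United States, not 2
```

Working with a set of countries per paper is what makes an internationally co-authored article count once for each country involved, while several authors from the same country still count only once.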
Top countries publishing domain-based Internet-related articles in journals and conferences, 1990–2014.
In the social sciences data in the top portion of Table 6, we see the United States is still the number 1 country in terms of volume, with 57,612 authors contributing to 26,003 articles. The United States is also above average in terms of percentage of the papers cited (73.2% compared to the overall average for all countries of 61.5%), the papers cited at least 10 times (30.6% compared to the overall average of 21.6%) and the average number of citations (21.1 per paper compared to the average of 16.2 for all the papers in the social science sample).
China, on the other hand, has reached a level of prominence in the social sciences on par with the United Kingdom, but it still lags considerably in terms of impact. Only 32.4% of Chinese social science papers related to the Internet have been cited, and those only a modest 7.2 times each on average; only 5.9% were cited at least 10 times. This lack of impact by authors based in China is consistent with data we have reported elsewhere with regard to e-Research (Meyer and Schroeder, 2015).
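The three impact measures used for each country and area (the percentage of papers cited, the percentage cited at least 10 times, and the average number of citations) can be computed from a list of per-paper citation counts, as in the Python sketch below. It assumes that the average is taken over cited papers only, which matches the phrasing used for the journal tables ("93% of papers cited and these receiving an average of 23.3 citations each"), although the article does not state this explicitly for Table 6. The function name is hypothetical.

```python
def citation_summary(citations, threshold=10):
    """Summarize per-paper citation counts for one country-by-area cell.

    Assumption: the mean is computed over cited papers only (citations > 0).
    """
    if not citations:
        raise ValueError("need at least one paper")
    n = len(citations)
    cited = [c for c in citations if c > 0]
    return {
        "pct_cited": 100 * len(cited) / n,
        "pct_cited_at_least_threshold": 100 * sum(c >= threshold for c in citations) / n,
        "mean_citations_of_cited": sum(cited) / len(cited) if cited else 0.0,
    }


print(citation_summary([0, 0, 5, 15]))
# {'pct_cited': 50.0, 'pct_cited_at_least_threshold': 25.0, 'mean_citations_of_cited': 10.0}
```

Averaging over all papers instead would give 5.0 for the same input, so how the average is defined matters when comparing countries with very different shares of uncited work.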
The next sections of the table look at the medical sciences, sciences and humanities in turn. Again, given limited space, a few salient points about these data will have to suffice. The United States and United Kingdom are the most consistently active publishing powerhouses across the four research areas reported here, with the United States holding an unassailable top position and the United Kingdom always in the top five. Both countries also consistently outperform the average in terms of the percentage of papers cited and the average number of citations to those papers.
Other countries are less consistent across the disciplines. South Korea, for instance, is in the top 10 for the social sciences and the sciences, but not for the medical sciences, humanities or multidisciplinary categories. Spain is also a consistent presence, but with somewhat lower than average impact. For instance, 56.0% of Spain’s social science papers and 36.2% of its humanities papers are cited, and those only a modest 10.7 times and 4.5 times on average, respectively. Japan, on the other hand, has an astonishingly high average number of citations in the medical sciences category (70.7 average citations for the 74.6% of publications cited); when we examined these data for anomalies, we found a few very highly cited articles (ca. 20,000 citations) about online resources, but also a considerable number of papers with several thousand citations each, so this performance is not just due to a few outliers.
We also looked at co-authorship across all the categories, and here we see a familiar pattern: papers in the medical sciences tend to have the most co-authors (4.4 on average, with 85.9% of papers authored by more than one person), followed by the sciences (an average of 3.6 authors per paper, with 84.9% having multiple authors), the social sciences (2.5 authors, 72.4% co-authored) and the humanities (2.3 authors, 57.1% co-authored). The multidisciplinary papers fall somewhere in the middle of the pack (3.4 authors per paper, with 83.2% co-authored). There are some small differences between countries, but none that stands out.
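The two co-authorship figures reported for each area, mean authors per paper and the share of papers with more than one author, reduce to simple arithmetic over per-paper author counts. The Python sketch below shows that calculation under the assumption that author counts are already available per paper; the function name is hypothetical.

```python
def coauthorship_summary(author_counts):
    """Summarize co-authorship for one area.

    author_counts: the number of authors on each paper.
    """
    if not author_counts:
        raise ValueError("need at least one paper")
    n = len(author_counts)
    return {
        "mean_authors": sum(author_counts) / n,
        # share of papers written by more than one person
        "pct_coauthored": 100 * sum(a > 1 for a in author_counts) / n,
    }


print(coauthorship_summary([1, 2, 3, 6]))
# {'mean_authors': 3.0, 'pct_coauthored': 75.0}
```

Because the mean is sensitive to a few papers with very large author lists, reporting it alongside the co-authored share gives a fuller picture of how typical collaboration is in each area.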
The final data shown in Table 6 pertain to internationalism. The second to last column shows the percentage of papers with authors from more than one country, and the final column shows the percentage of papers with authors from more than one continent. One of the most notable features of these data is the much greater likelihood of international co-authorship for most of the European countries. Across the disciplines, authors in many European countries co-author papers with people from other countries (with percentages in the 20s and 30s), but they are only somewhat more likely than their non-European counterparts to co-author with people on a different continent. One interpretation is that this reflects a culture of scientific cooperation at the European level that is less common in other parts of the world. Some of this may be down to geography (many European countries are relatively small and within easy reach of their neighbours), but it is also possible that European Commission funding mechanisms, which require partners from multiple countries on European grants, are having the desired effect of fostering continental cooperation and collaboration. Testing this possibility, however, would require further evidence.
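The two internationalism columns can be sketched in Python as shares of a country's papers that involve more than one country and more than one continent. The sketch assumes one country per author and uses a deliberately tiny, hypothetical country-to-continent lookup; a real analysis would need a complete mapping. The toy data are invented to show the European pattern described above, where international co-authorship is common but intercontinental co-authorship much less so.

```python
# Hypothetical, deliberately tiny country-to-continent lookup for illustration only.
CONTINENT = {
    "United Kingdom": "Europe", "Germany": "Europe", "Spain": "Europe",
    "United States": "North America", "China": "Asia", "Japan": "Asia",
}


def internationalism(papers, country):
    """Return (% multi-country, % multi-continent) among papers involving `country`.

    papers: list of author-country lists (one entry per author).
    """
    mine = [set(cs) for cs in papers if country in cs]
    if not mine:
        raise ValueError(f"no papers involving {country}")
    n = len(mine)
    multi_country = sum(len(cs) > 1 for cs in mine)
    multi_continent = sum(len({CONTINENT[c] for c in cs}) > 1 for cs in mine)
    return 100 * multi_country / n, 100 * multi_continent / n


papers = [
    ["United Kingdom", "Germany"],         # international, same continent
    ["United Kingdom", "Spain"],           # international, same continent
    ["United Kingdom", "United States"],   # international and intercontinental
    ["United Kingdom", "United Kingdom"],  # domestic
    ["China", "China"],                    # domestic
]
print(internationalism(papers, "United Kingdom"))  # (75.0, 25.0)
```

Computing both columns over the same set of papers lets the gap between them be read directly as the share of international collaboration that stays within one continent.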
Finally, in Table 7, we present data for the top institutions (for corresponding authors only) across two time periods (the first 15 years, 1990–2004, and the last 10 years, 2005–2014) and across the five main disciplinary areas. For this final part of the analysis, we have added the technology fields back into our data. In these technology fields, we see a number of Asian institutions playing an especially prominent role, which is also true in the sciences, particularly in the latter period. The social sciences and humanities are more dominated by large American institutions, while medicine has a number of prominent medical centres in the United States and Europe.
Top Institutions publishing journal articles related to the Internet, in 1990–2004 and 2005–2014.
Discussion and conclusion
The growth of web- and Internet-related research has followed broadly similar trends over time in the major research areas. While Dutton (2013) has charted the emergence of the field of research on the Internet’s social implications, we take a broader view, examining the impact of the Internet across the sciences, social sciences and humanities. In the case of social science research related to the Internet, we can see a slow take-off after the Internet became a widely used technology in the mid-1990s, and the rise continues to the current day. In other areas, such as the technology disciplines, growth slows or even tapers off in the last 5 years. What explains this plateau? One potential explanation is that the Internet has become dispersed among a wider range of technologies that have superseded it in appeal to researchers in technology domains. For example, the Internet has arguably been displaced somewhat in importance by newer web platforms or by smartphones, both of which are of course also outgrowths of the Internet and part and parcel of it. As mentioned at the beginning of this article, it is also possible that, as the Web and Internet have become thoroughly normal parts of the everyday practice of research (i.e. part of the research infrastructure, which tends to be invisible except when it breaks), authors have started to mention them less prominently in the titles and abstracts of their publications. This process of becoming so normal as not to merit mention may already be under way, although confirming it would require further research.
Yet this trend is not confined to technology domains. In social science research, too, the early focus on computer-mediated communication has given way to a range of subfields. To take just one example, one area of interest is how research is supported by the Internet. But this specialism has itself undergone a number of terminological changes: from e-Research and its various cognates (cyberinfrastructure, e-Science, e-Humanities and the like) to the current focus on ‘big data’ and ‘computational social science’ (Schroeder, 2014). As these specialisms have proliferated, the label of ‘Internet’ research has been outgrown by others, including various ‘e-’ and ‘cyber’-prefixed labels. What we see here is a simultaneous differentiation of research and a movement of the research front from one area to the next as the technology develops. Future research might therefore use this more specific terminology to investigate the trend at a more granular level.
Just as we describe research using the Internet as a research infrastructure that is becoming at once more differentiated and more specialized, perhaps the same applies to the increasing invisibility of the Internet as a societal infrastructure. As society’s use of the Internet becomes taken for granted, it is no longer discussed as a separate entity but moves into the background or into various niches, while other terms move into the foreground. These include, again, the Web, ‘life online’, or technologies such as smartphones in which Internet technology is embedded in the background. Furthermore, these newer technologies then become the subject of new research areas, as when smartphone or mobile phone research acquires journals of its own, which eclipse, as a research area, the older journals devoted to computer-mediated communication.
In terms of the distribution of research by country, we see, unsurprisingly, the rise of China from laggard to second only to the United States in volume of research, although it still lags in terms of impact. We also acknowledge, as indicated earlier in our methods section, that Scopus coverage of journals from certain areas of the world is less complete than its coverage of North American and Western European journals, a problem compounded by the fact that our search strategy relied only on English terms. It is likely that additional research in this area is being published in countries not captured by these methods. We are attuned to this recurring form of exclusion of certain authors and languages, and our inability to capture such publications in our sample should not be read as diminishing the importance of such work.
What will the next 25 years of Internet-related research bring? One obvious change in the last 5 years is that the Internet has been overtaken by mobile phones in terms of the number of users worldwide, and this trend is set to continue. Notably, mobile (or cell) phone owners often use the Internet without even being aware of it, which underscores our point about infrastructure fading into the background. Will mobile phones, however, affect academic research in the same way that the Web and Internet have? The Web and the Internet technologies that underlie it are extensible – generative, to use the terminology suggested by Zittrain (2008). It is this generativity that underpins the widespread growth across scholarly disciplines shown in this article: historians who consult digital manuscripts and biologists who share genomic data both rely on the same underlying generative Internet and the Web interfaces that make it usable. Mobile operating systems such as Android and iOS, which rely more heavily on a closed, app-based ecosystem to access the Internet, may prove to have more limited uses across the academic landscape (although many interesting examples already exist); only the next 25 years will tell. It may also turn out that other, equally widespread ways of sharing, communicating and accessing information spring up alongside the Web, or even replace it altogether.
Dutton (2013) characterized the Internet as a ‘network of networks’. Looking ahead, it might more accurately be described as the ongoing penetration – and disappearance – of digital networks into everyday life, or, in the phrase Ling (2012) has used in relation to mobile phones, their ‘taken-for-granted-ness’. Still, as the Web turns 25, its impact on the academic landscape is undeniable, even as the younger technologies it has spawned may be poised to steal the spotlight.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the Alfred P. Sloan Foundation (grant number 2012-06-17).
