Abstract
More than two decades have passed since the establishment of the initial cornerstones of the Semantic Web. Since its inception, opinions have remained divided regarding the past, present and potential future impact of the Semantic Web. In this paper – and in light of the results of over two decades of development on both the Semantic Web and related technologies – we reflect on the current status of the Semantic Web, the impact it has had thus far, and future challenges. We first review some of the external criticism of this vision that has been put forward by various authors; we draw together the individual critiques, arguing both for and against each point based on the current state of adoption. We then present the results of a questionnaire that we have posed to the Semantic Web mailing list in order to understand respondents’ perspective(s) regarding the degree to which the original Semantic Web vision has been realised, the impact it can potentially have on the Web (and other settings), its success stories thus far, as well as the degree to which they agree with the aforementioned critiques of the Semantic Web in terms of both its current state and future feasibility. We conclude by reflecting on future challenges and opportunities in the area.
Introduction
Arguably the first concrete milestones towards realising the Semantic Web were the 1998 release of the initial versions of the Resource Description Framework (RDF) [14] and RDF Schema (RDFS) specifications [50]. In 2001, Berners-Lee et al. [9] would position RDF as a key technology for realising their vision of what they called the “Semantic Web”, which would “bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users”. A slew of developments were to follow, culminating in the release of numerous standards, such as OWL, SPARQL, SKOS, RIF, RDB2RDF, SHACL, ShEx, as well as a variety of updates to existing standards. Each standard has received varying degrees of attention and acceptance from researchers, developers, and publishers alike. We refer the reader to the survey by Gandon [25] for further details on the developments and trends in the Semantic Web research area spanning the first two decades.
More than two decades on, there are varying opinions on the extent to which the original vision of Berners-Lee et al. [9] has been realised – or indeed, the extent to which it can or should be realised.
Within the Semantic Web community, there has long been a consensus that while the vision has yet to be fully translated into reality, it was a question of when, not if. In 2006, Shadbolt et al. [74], while admitting that the Semantic Web wasn’t “yet with us on any scale”, argued that it soon would be once the “standards are well established”. In 2007, Horrocks [43], while likewise admitting that “fully realising the Semantic Web still seems some way off”, argued that OWL had “already been very successful” and had “become a de facto standard for ontology development in fields as diverse as geography, geology, astronomy, agriculture, defence and the life sciences”. The years that followed were marked by optimism with regard to Linked Data, with authors claiming an exponential growth of data published following these principles [20,47,57,64]. Optimism was further expressed with the selective adoption of Semantic Web technologies by household names, including the BBC [48], the New York Times [71], Oracle [87], Facebook [82], Google [11,30], Wikimedia [86], Amazon [2], and so forth. More recent announcements of the development of knowledge graphs by Google [76], LinkedIn [34], Bing [75], eBay [67], Amazon [49], Airbnb [17], etc., have also been viewed as a win for the Semantic Web community.
The Semantic Web has not only had numerous proponents down through the years, but also numerous vocal opponents. As early as 2001, impassioned criticism of the vision of the Semantic Web began to emerge, with Doctorow’s often-cited “Metacrap” essay [23] laying out the seven “insurmountable obstacles” that made the Semantic Web vision “a pipe-dream” in his view; in summary, he criticises the naivety of expecting users to create high-quality structured content, and of expecting domain ontologies to be globally agreed-upon given the many possible interpretations of how a particular domain may be described. Various other online articles and blog posts criticising the Semantic Web emerged through the years. Here we summarise a number of recent, prominent examples (found through web searches for Semantic Web-related terms combined with negative terms such as “fail”, “dead”, etc., further following hyperlinks to related articles):
In 2013, ter Heide [81] suggested that the Semantic Web had “failed” mainly due to: not catering to a typical user’s interests, not considering new streams of information such as messages, and expecting users to pull complex information rather than being pushed content relevant to them.
In 2014, Rothkind [69] discusses a thread on Hacker News, asking “is the Semantic Web still a thing?”, critiquing in particular the lack of incentive for publishers to invest in publishing Linked Data versus publishing the data in its native format; he highlights the lack of clear business models for doing so, noting that the infrastructure to exploit Linked Data had “not really materialized, and it’s hardly clear that it will”.
In 2016, Cagle [16] suggested that the Semantic Web had “failed”, primarily because it is hard to understand, and it does not fit with other familiar paradigms (citing Object Oriented Programming), arguing for more lightweight semantics (taxonomies) to alleviate the burden on users.
In 2017, Cabeda [15] suggested that the rapid advancement in Machine Learning techniques “leaves the Semantic Web in the dust”, and concluded that it “needs to evolve and integrate its ideas with artificial intelligence”.
In 2018, Target [80] – while giving a brief history on the major developments of the Semantic Web – suggests that it has “threatened to recede as an idea altogether”, observing that “work on the Semantic Web seems to have petered out”; while he acknowledges adoption in settings such as the Open Graph Protocol and schema.org, and commends technologies such as JSON-LD, he ultimately concludes that there are many “engineering and security issues” to be addressed before the original decentralised vision of the Semantic Web can be meaningfully realised.
These critiques of the Semantic Web raise a number of important issues in terms of the feasibility of realising its original vision and should be carefully considered in the context of the Semantic Web community: while the community is perhaps generally aware of such potential criticisms, it is not always clear what (if anything) should be done to address them.
Some such critiques have been addressed by members of the community, both formally and informally. In a 2013 keynote, Hendler [36] counters a number of criticisms of the Semantic Web – such as the lack of need for ontologies, the inability of the relevant technologies to scale, etc. – while ultimately concluding that there are open challenges to face, particularly in terms of uniting Ontologies and Linked Data, and developing practical reasoning methods for the Web. In a 2017 keynote, Mika [60] provides a brief history of the Semantic Web, noting a “chicken and egg” problem in the early days of applications requiring data and applications being needed to incentivise the publication of data, but discussing how more and more incentives are available for publishing data through initiatives such as Linking Open Data, schema.org, etc.; he further discusses some application domains – Semantic Search, eCommerce, Social Web – in which Semantic Web concepts are being deployed.
Given the differing opinions that yet exist two decades on, we believe it to be a fitting moment to understand the varying perspectives within the Semantic Web community itself regarding its impact thus far, the aforementioned critique, and the opportunities presented and challenges faced when looking to the future. Along these lines, in this paper:
we first review external critique of the Semantic Web, synthesising the primary criticisms raised, presenting an argument both for and against each;
we present the results of a questionnaire posed to the Semantic Web mailing list, aiming to ascertain the various perspectives of respondents regarding the extent to which Berners-Lee et al.’s original vision of the Semantic Web has been realised or can be realised, the level of perceived impact that the Semantic Web has had thus far on the current Web, the success stories of the Semantic Web, as well as opinions of the main points of critique resulting from the previous analysis;
we summarise the main success stories, opportunities, and challenges found regarding the past, present and future of the Semantic Web.
Critique of the Semantic Web
Based on the previous critiques of the Semantic Web, we now distil ten main criticisms paraphrased from these articles [15,16,23,69,80,81]; though the list of issues should not be considered comprehensive, it covers the main points in the articles found. We first summarise the point of criticism, providing references for sources that inspire its inclusion; we then argue both for and against each point in turn to better understand its implications. We do so in the style of a debate, meaning that the author does not necessarily hold the point of view being argued for or against.
Categories: Human, Social
Critique: Scenarios used to motivate the Semantic Web are fact-based and often overly specific and complex. The majority of users are only interested in finding individual webpages with simple facts, opinions, social recommendations, etc., rather than solving complex queries on factual content involving multiple sources. The current Web, with the help of search engines like Google, thus covers (and will continue to cover) the needs of the vast majority of users.
For: Search engines such as Google, Bing, Yandex, etc., have improved considerably over the years, where finding information on the Web is now easier than ever. In a July 2014 analysis of organic Google click-through rates, Petrescu [66] estimated that users click on a result listed on the first page for 71.3% of searches and on a later page for 5.6% of searches; these figures do not account for users clicking paid results, finding answers directly on the results page, refining their search, etc. With current search engines, most user searches can be quickly and easily resolved. Aside from search, use-cases relating to modelling complex domains using ontologies, data integration in enterprises, etc., are not tangible for ordinary web users.
Against: There are many niche problems of importance to society with which the Semantic Web can help, including, for example, drug discovery in the case of rare diseases [45]. However, the Semantic Web is not limited to niche use-cases. Search engines themselves have been adopting Semantic Web concepts to enable semantic search; for example, through schema.org [30], Knowledge Graphs [75,76], etc. On the other hand, while current search engines are excellent for finding individual webpages, the Semantic Web vision addresses more complex types of queries that require drawing information from multiple sources on the Web. While current searches generally appear to be resolved quickly (e.g., are answered by a single high-ranking result), users may not be currently issuing more “complex” searches as they know search engines will not offer useful results. Searches requiring cross-referencing multiple webpages are not necessarily niche, but may rather be personalised [9]; for example, finding the closest store open now selling aspirin does not appear to be niche, and could be better automated with Semantic Web techniques. Regarding users’ interests, the Semantic Web does not only address encyclopaedic data, nor does it only address search; for example, its graph-based data model can be used to integrate and find novel connections within social data [13]. Regarding other use-cases, though users may not know of the use of Semantic Web techniques within specific domains or enterprises, this does not prevent them from benefiting from such technologies.
The Semantic Web will be made redundant by advances in Machine Learning before it has a chance to take off [15]
Category: Technical
Critique: The Semantic Web assumes that the current (HTML-based) Web is poorly machine-readable. However, advances in Machine Learning are increasingly undermining this assumption. By the time the Semantic Web could reach enough maturity to have major impact on the Web, Machine Learning will have advanced to a point where such technologies for publishing/consuming structured content are made redundant.
For: Advances in areas such as Deep Learning have led to results that previously seemed unachievable in the short term. Machines can now perform more “human-like” tasks with increasing precision and recall. These advances, combined with developments in Information Extraction, increasingly blur the lines between human-readable and machine-readable content [56]; as a relevant example, in the TAC–KBP “Cold Start” challenge, which requires systems to extract knowledge-bases from scratch from text, systems improved their
Against: Techniques like Deep Learning are still applied as a form of specialised Artificial Intelligence, requiring extensive training data to build models for one particular task. Though impressive gains are being made, the aforementioned
The Semantic Web depends too much on reliable publishers [16,23]
Categories: Human, Technical
Critique: The Semantic Web is founded on the idea that machines will automatically process structured content on the Web. Such processing is particularly brittle in the face of both indeliberate errors and deliberate deception due to unreliable publishers (as commonplace on the Web).
For: Automatically solving complex tasks on the Semantic Web involves processes such as inferencing to integrate information. Such processes work by assuming input data to be held true and computing other entailments that then follow; this assumption is clearly naive for Web data. Even small errors in the input data (e.g., inconsistent claims) can lead to nonsensical entailments; in previous work we found 301 thousand RDFS/OWL inconsistencies in a crawl of 4 million RDF documents (294 thousand relating to datatypes, 7 thousand relating to instances of disjoint classes) [12]. More complex tasks require more complex chains of inferencing, where each step accumulates a higher probability of error. Such processes could then be easily manipulated by deceptive agents.
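The brittleness described above can be illustrated with a minimal sketch, assuming a toy setting in which triples are plain Python tuples and only two rules are applied: propagating rdf:type along rdfs:subClassOf, and flagging instances of classes declared owl:disjointWith. All ex: names and the two "sources" are hypothetical illustration data, not drawn from the crawl cited above.

```python
# Toy triples are (subject, predicate, object) tuples of strings.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"


def infer_types(triples):
    """Forward-chain rdf:type along rdfs:subClassOf until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for sub, sup in subclass:
                if o == sub and (s, TYPE, sup) not in inferred:
                    inferred.add((s, TYPE, sup))
                    changed = True
    return inferred


def find_inconsistencies(triples):
    """Return (instance, class_a, class_b) for instances of two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    clashes = set()
    for a, p, b in triples:
        if p == DISJOINT:
            for instance, classes in types.items():
                if a in classes and b in classes:
                    clashes.add((instance, a, b))
    return clashes


# Hypothetical data merged from two sources that each look fine in isolation.
data = {
    ("ex:Mayor", SUBCLASS, "ex:Person"),   # ontology
    ("ex:Person", DISJOINT, "ex:Dog"),     # ontology
    ("ex:Rex", TYPE, "ex:Mayor"),          # claim from source A
    ("ex:Rex", TYPE, "ex:Dog"),            # claim from source B
}

closure = infer_types(data)
clashes = find_inconsistencies(closure)
```

In this sketch the inconsistency is invisible in the raw data and only surfaces after inferencing, which is one way to read the claim that each inference step can compound errors in the input.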
Against: The Semantic Web community recognises that publishers are not always reliable, and though the issue of data quality is a major challenge, it is one that the community has been addressing [90]. Much like on the Web, rather than assume all information to be trustworthy, two elements are required: reliable sources of data, and methods to accurately estimate the reliability of sources. Specifically regarding inferencing, methods such as paraconsistent reasoning [53] are more robust to noisy inference, while methods such as authoritative and quarantined reasoning [68] select more trustworthy sources for inferencing based on link analysis. Finally – as acknowledged by the original vision paper [9] – users should not blindly trust results, but can rather be provided details (on-demand) of how these results were achieved, refining criteria as required.
The Semantic Web depends too much on ontological agreement [16,23]
Categories: Social, Technical
Critique: There is no single way to model a domain using an ontology. There is no global truth. Different stakeholders in the domain may consider different semantics for terms or even hold contradictory claims. The Semantic Web is brittle to differing views.
For: Is a tomato a “fruit” or a “vegetable”? Is Pluto a “planet”? Is Sherlock Holmes a “person”? The answer to each such question depends, either due to a lack of consensus, or ambiguity on what terms like “fruit”, “person”, etc., mean. While we might define in an ontology that all mayors are people, Bosco the Dog was elected mayor of Sunol, California while Duke the Dog was elected mayor of Cormorant, Minnesota. The real world is messy and hosts innumerable perspectives on what is true, or what “truth” even means. Edit wars on Wikipedia evidence such disagreement [89]. These ambiguities and conflicts are the true underlying cause of interoperability issues, and rather than solving them, ontologies (particularly expressive ones) require them to have been solved beforehand; doing so at the scope of the Web presupposes either a utopian (global agreement reached) or a dystopian (global agreement enforced) view of society.
Against: To be more precise, the Semantic Web benefits from – rather than requires – ontological agreement. The fact that full agreement cannot always be reached does not preclude the utility of formally capturing the agreement that can be reached. While agreement on detailed domain definitions is costly, ontologies such as SNOMED CT [51] show that it can be achieved with sufficient will and organisation. For the broader Web, initiatives such as schema.org [30] show that agreement is possible on lightweight semantic definitions (given sufficient incentives). The impact of collaboratively-edited datasets such as Wikidata [54,86] further exemplifies ways in which (partial) agreement can be fostered in an emergent way. Considerable attention has been given by the Semantic Web literature to resolving inconsistencies reflecting different views [12], to inferencing over contextual data reflecting different versions of truth [31], and so forth. Furthermore, ontologies are defined in a decentralised way [84], where stakeholders can adopt their preferred ontology or define their own, giving rise to an emergent agreement; exemplifying this, Schmachtenberg et al. [72] found that FOAF and Dublin Core were used by 69% and 56% of the 1,014 RDF datasets that they crawled. In the case of multiple competing ontologies, mappings can be computed or defined to enable interoperability by bridging the concepts on which they agree [24]; along these lines Vandenbussche et al. [84] find over 5 thousand links between different vocabularies in their collection.
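As a rough illustration of how mappings can bridge competing vocabularies where they agree, the following sketch rewrites terms from one vocabulary into another using a lookup table. The vocabA:/vocabB: prefixes, the mapping itself, and the ex: data are hypothetical; real mappings may be far richer than one-to-one term replacement.

```python
# Hypothetical one-to-one mapping between terms of two vocabularies.
MAPPING = {
    "vocabA:fullName": "vocabB:name",
    "vocabA:Human": "vocabB:Person",
}


def translate(triples, mapping):
    """Rewrite every term that has a mapping; leave all other terms untouched."""
    return {tuple(mapping.get(term, term) for term in triple) for triple in triples}


source = {
    ("ex:alice", "rdf:type", "vocabA:Human"),
    ("ex:alice", "vocabA:fullName", '"Alice"'),
    ("ex:alice", "vocabA:shoeSize", '"38"'),  # no counterpart in vocabB
}

bridged = translate(source, MAPPING)
```

Terms without a counterpart are simply carried over, reflecting the point that partial agreement can still be exploited without requiring full agreement.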
Publishing Semantic Web content on the Web has a prohibitively high cost [16]
Categories: Economic, Technical
Critique: Given data in a legacy format, a relational database, JSON, CSV, etc., there is a prohibitively high cost associated with publishing the data using the Semantic Web standards.
For: Publishing Semantic Web content in a suitable way – e.g., following Linked Data principles [35] – requires expertise. Where data are available in a structured format, conversion to RDF is far from straightforward, especially when issues such as offering dereferenceable IRIs, adding links, etc., are considered [42]. While certain types of data are easily conceptualised as RDF graphs, others require various forms of indirection (e.g., reification [37]) to be properly represented.
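To make the indirection concrete, the following sketch shows what reifying a single statement involves when triples are modelled as plain Python tuples. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms come from the RDF reification vocabulary; the ex: names, the helper function, and the blank node labelling scheme are assumptions made for illustration.

```python
from itertools import count

_statement_ids = count(1)  # generates fresh blank node labels


def reify(subject, predicate, obj, annotations):
    """Describe the statement (subject, predicate, obj) as a resource of its own,
    then attach the given annotations to that resource."""
    stmt = f"_:stmt{next(_statement_ids)}"
    triples = [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", subject),
        (stmt, "rdf:predicate", predicate),
        (stmt, "rdf:object", obj),
    ]
    triples += [(stmt, p, o) for p, o in annotations.items()]
    return triples


# One annotated fact (hypothetical data) becomes five triples.
triples = reify("ex:Alice", "ex:worksFor", "ex:Acme", {"ex:since": '"2015"'})
```

A single annotated fact that would occupy one row with an extra column in a relational table expands into five triples here, which gives a sense of the modelling overhead being criticised.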
Against: Most websites are now based on data stored in databases. Standards have been developed to reduce the cost of publishing RDF from legacy data, key amongst which are the RDB2RDF mappings [6,22] for generating RDF data from relational databases, and JSON-LD for lifting JSON to an RDF-style data model [78]. Tools have been developed to help with tasks such as linking, most prominently Silk [85] and LIMES [63]. Exporters built into commonly-used platforms such as Drupal allow thousands of websites to begin publishing RDF quickly and easily [18]. Work continues to better support more and more types of data, such as the standardisation of the RDF Data Cube vocabulary for representing statistical data [19].
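The idea of lifting JSON to an RDF-style model can be sketched as follows. This is a deliberately simplified illustration and not the JSON-LD processing algorithm: it assumes a flat JSON object whose "@context" maps keys directly to IRIs and whose "@id" names the subject. The example.org identifiers and the lift function are hypothetical.

```python
import json


def lift(document):
    """Turn a flat JSON object with '@context' and '@id' into (s, p, o) triples.

    Simplified: only handles context entries that map a key to an IRI string,
    and values that are scalars or lists of scalars.
    """
    context = document.get("@context", {})
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@") or key not in context:
            continue  # skip keywords and keys with no mapping in the context
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, context[key], v))
    return triples


raw = json.loads("""
{
  "@context": {"name": "http://xmlns.com/foaf/0.1/name",
               "homepage": "http://xmlns.com/foaf/0.1/homepage"},
  "@id": "http://example.org/people/alice",
  "name": "Alice",
  "homepage": "http://example.org/alice/",
  "internalId": 42
}
""")

triples = lift(raw)
```

The existing JSON payload is left unchanged apart from the added "@context" and "@id" entries, which is the sense in which such an approach can lower the cost of publishing for developers already working with JSON.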
There are too few incentives for adopting Semantic Web technologies on the Web [69]
Categories: Economic, Social
Critique: Aside from the costs of using Semantic Web technologies on the Web, there is little incentive to do so, due in part to the fact that the infrastructure for publishing and/or exploiting such content on the Web has not been adequately developed or adopted.
For: The Semantic Web has long faced a chicken-and-egg problem [60]: incentives for publishing data require infrastructure to exploit those data, while infrastructure for exploiting data cannot develop without data. While the Linked Data community partially resolved this dilemma by successfully convincing various stakeholders to publish data on the (implicit) promise that applications would arrive to justify the cost, these applications did not emerge, and as a result, many datasets and related services went offline [5,41]; for example, Aranda et al. [5] estimated, in 2013, that around 29% of the 427 public SPARQL services they found had gone offline. The dearth of Linked Data applications hints at an important lesson: publishing data independently of a particular application implies higher costs for leveraging that data in that application; publishing data independently of any application then implies higher costs for all applications. Finally, one of the main incentives for publishing on the current Web is advertising revenue, where it is not clear how advertising would work on the Semantic Web where software agents, rather than humans, access websites [29].
Against: In the case of schema.org [30], publishers are incentivised to embed structured data in their webpages by the promise of “rich snippets”: having the data – denoting images, ratings, etc. – displayed in search engine results, offering a more eye-catching result summary that attracts more clicks; as a result, schema.org has been widely adopted on the Web, where Meusel et al. [58] found more than 700 thousand pay-level-domains (websites) hosting schema.org content in the 2014 WebDataCommons dataset. Such examples show that incentives do exist for Web publishers to provide more structured content: offering such content can, in the context of certain applications, help direct traffic back to a website or increase demand for a particular product or service it describes, which can drive new business models that replace traditional advertising revenues [29]. The varied use of datasets such as Wikidata [54,86] – whose SPARQL service received over 3.8 million queries per day in the first quarter of 2018 [54] – shows that a variety of applications – including some not originally envisaged – can benefit from the increasing availability of structured content offered by the Semantic Web.
The Semantic Web standards are too verbose [16,80]
Category: Human
Critique: The Semantic Web standards are (unnecessarily) long, complex and difficult to understand. This creates a major barrier for attracting new adopters. More concise standards would have been better.
For: Most of the Semantic Web standards have been designed by committee, anticipating use-cases that had yet to arrive or be fully understood, sometimes focusing on academic rather than practical issues. The resulting standards are difficult to understand, with much of their complexity dedicated to relatively niche issues; as a result, we can find various calls to simplify the standards, with, e.g., Berners-Lee calling for the deprecation of various features in the RDF standard in 2010 [8]. In the same way that JSON has become more popular than its more complex XML cousin, simpler standards that suffice for common needs will tend to win out versus complex standards that (additionally) address more niche needs; along these lines, for example, Meusel et al. [59] found over five times more Microdata/Microformats statements than RDFa in the 2013 Common Crawl dataset; in previous works, we found that (pure) RDFS is much more prevalently used than OWL in Web data [27]; and so forth.
Against: When speaking of verbose standards, one should not overlook the SQL:2016 standard [44], which has 1,732 pages – yet the core of SQL is broadly adopted and understood. One does not need to understand the entire standard in order to profitably use parts of it. Along the same lines, one does not need to understand the model theoretic definitions of RDF to describe data in RDF, nor does one need to understand the semantic conditions defined for OWL to use it to describe an ontology, etc.; rather practitioners can start with a simple system based on the parts of the standards important for them, extending their use of the standards – as needs arise – towards building more complex (and powerful) systems that work for them. Simpler standards that arise can also be mapped to more complex standards; for example, Microdata and Microformats are directly convertible to RDF. More modern Semantic Web standards – such as JSON-LD [78] – have also had success in terms of adoption.
Category: Technical
Critique: Consuming data published using the Semantic Web standards requires algorithms with poor scalability and/or performance. Current implementations exhibit poor scalability and/or performance.
For: Even the most common tasks that one might consider over (most of) the Semantic Web standards are intractable. Deciding if two RDF graphs have been parsed from the same document, potentially with different blank node labels (a.k.a. RDF isomorphism), is GI-complete [39]. SPARQL query evaluation is PSPACE-hard (PSPACE-complete for the original standard [65]). Entailment is undecidable for OWL (2) Full and N2EXPTIME-complete for OWL 2 DL [62]; infamously even the OWL “Lite” fragment of the original OWL standard – motivated as a more terse fragment permitting more efficient reasoning – was later found to have EXPTIME-complete entailment. Other experimental works have shown Semantic Web query engines to be considerably outperformed by relational databases; for example, with the Berlin SPARQL Benchmark, Bizer and Schultz [10] show that, in some cases, MySQL can execute 13 times more queries in a given time period than the best SPARQL store tested (Sesame) considering comparable queries.
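To give a feel for why blank nodes make graph comparison hard, the following naive sketch decides isomorphism by trying every bijection between the blank node labels of two graphs. It is an illustrative brute-force check, not an algorithm from the cited works; triples are modelled as tuples of strings, blank nodes are assumed to be the terms starting with "_:", and the ex: data are hypothetical.

```python
from itertools import permutations


def blank_nodes(graph):
    """Sorted list of the blank node labels (terms starting with '_:') in a graph."""
    return sorted({term for triple in graph for term in triple if term.startswith("_:")})


def isomorphic(g1, g2):
    """Naive check: search for a bijection of blank node labels mapping g1 onto g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for candidate in permutations(b2):  # n! candidates for n blank nodes
        mapping = dict(zip(b1, candidate))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


a = {("_:x", "ex:knows", "_:y"), ("_:y", "ex:name", '"Bob"')}
b = {("_:b1", "ex:knows", "_:b2"), ("_:b2", "ex:name", '"Bob"')}  # same shape as a
c = {("_:b1", "ex:knows", "_:b2"), ("_:b1", "ex:name", '"Bob"')}  # name on the other node

same_ab = isomorphic(a, b)
same_ac = isomorphic(a, c)
```

The loop examines up to n! relabellings for n blank nodes, so this approach quickly becomes impractical as the number of blank nodes grows.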
Against: Such complexity results are not particular to Semantic Web proposals, where for example the complexity of SPARQL query evaluation is analogous to that for SQL [65]. More generally, worst-case complexity results rarely tell the whole story: the fact that there exists at least one input for which a task is difficult tells us little about how efficient solutions might be for practical inputs (see, e.g., [39]). Achieving scale and efficiency often requires trade-offs, where by trading in completeness, OWL reasoning has been shown to scale to billions of triples [68,83]; along similar lines, a variety of tractable profiles of OWL 2 have been defined that trade expressivity for efficiency of reasoning tasks [62]. More practically speaking, a poor implementation does not refute its underlying idea. With this aside, some more recent benchmarks show, for example, SPARQL engines being capable of outperforming graph databases and relational databases for more complex graph patterns [38]. Anecdotally, we can also point to Wikidata’s decision to use Semantic Web technologies (RDF, SPARQL, etc.) to publish and manage its content, with positive (performance) results [54]. Adoption of the Semantic Web standards by major vendors – such as Oracle [87] and Amazon [2] – further helps to (anecdotally) refute this criticism.
The Semantic Web lacks usable systems & tools [16]
Categories: Human, Technical
Critique: Practitioners who are initially interested in adopting Semantic Web technologies are quickly alienated by a lack of usable tools for their use-cases.
For: While one may argue that end-users need not understand the Semantic Web to benefit from it – that the Semantic Web is something “under the hood” powering end-user applications – such an argument still supposes the availability of systems, tools, etc., for building these applications. While many systems and tools have been developed for the Semantic Web, the bulk have been created in an academic context for the purposes of proving a concept described in a paper. Systems often go offline after the paper is published; tools may rather be of a more prototypical nature; few resources are tested in terms of usability [46]; etc. On the other hand, newer competing technologies with more usable, developer-friendly resources are seeing more adoption, including formats such as JSON/Microdata/Microformats being more popular than RDF [59], the Neo4j graph database being far more popular than its closest SPARQL rival,3
Against: While the Semantic Web could always benefit from having more (usable) systems and tools, most standards have a variety of mature implementations to choose from (including from well-known vendors such as Oracle [87], Amazon [2], etc.). On the other hand, the adoption of similar, competing technologies is an opportunity for the Semantic Web, as in the case of JSON-LD [78] successfully leveraging the popularity of JSON to help (implicitly) bridge the gap between developers and the Semantic Web. Along similar lines, various works have looked at making property graphs – the model underlying many graph databases [3] – and RDF graphs interoperable [21,32]. The same story is borne out with proposals such as GraphQL-LD [79], this time bridging GraphQL and SPARQL. What we see, then, is increasing adoption of the core concepts underlying the Semantic Web: structured data formats, graph-based data modelling, public query APIs, etc.; with some syntactic glue, these advances can be leveraged as advances, in turn, for the Semantic Web.4
In a signed public comment in the questionnaire described later, Staab refers to this as a “hijacking strategy” (e.g., JSON-LD “hijacking” JSON, adding a core Semantic Web principle), expressing the opinion that it is an excellent way forward.
Categories: Economic, Social, Technical
Critique: The original vision of the Semantic Web is a decentralised one (where, e.g., individual health care providers host their own website with their own structured content). On the other hand, on the current Web, centralisation has become the predominant paradigm (considering Google, Facebook, etc.). Decentralising the Semantic Web is too costly.
For: Berners-Lee et al. [9] talk about individual providers (doctors, physical therapists, etc.) hosting their own websites and agents, giving a decentralised setting for the Semantic Web. However, the Web has tended more and more towards centralisation, with individual providers instead gathering on central, specialised websites. For example, rather than hosting personal websites, most people host profiles on social networks. Likewise, success stories sometimes quoted for the Semantic Web have involved some level of centralisation: Wikidata [86] centralises data creation and curation, schema.org [30] centralises the schema/ontology, and so forth. Decentralisation incurs significant conceptual and practical costs in terms of design, performance, etc. In terms of querying, for example, Schmidt et al. [73] demonstrate that local query processing is often orders of magnitude more efficient than federated querying over endpoints, even when statistics about remote data are made available for optimisation purposes. More generally, no precedent exists in the Semantic Web setting for the type of decentralised infrastructure envisaged by Berners-Lee et al. [9].
Against: There is an emergent public awareness of the problems associated with growing centralisation in terms of users’ privacy, control of data, etc. Along these lines, the recently standardised Linked Data Platform [77], along with projects such as Solid [55], not only further a decentralised vision of the Semantic Web, but also position the Semantic Web as a path towards a more decentralised Web. Abstractly, the benefits of centralisation versus decentralisation are mostly technological – benefits that will inevitably shrink as technology continues to improve. Conversely, the benefits of decentralisation versus centralisation are mostly social, be they upholding privacy, avoiding hegemony and monopoly, averting censorship, etc. – benefits that will at least remain constant, or more likely grow, over time. Asymptotically speaking, the relative benefits of decentralisation will thus, over time, increasingly dominate those of centralisation.
Questionnaire
We have, thus far, presented ten points critiquing the Semantic Web, arguing both for and against each individual point; the goal in each case was not to reach a verdict, but rather to understand possible arguments on both sides. We are now interested to see what members of the Semantic Web community, more broadly, think of the current state of adoption of the Semantic Web, what impact it could have in future, what they view as the main success stories thus far, and finally, what they think of the previously raised points of critique. We are particularly interested in the perspectives of experts in the Semantic Web who have read and worked extensively on the topic and can thus offer a more informed opinion; it is important to keep in mind, however, that targeting experts in this way may in turn lead to a pro-Semantic Web bias.
We designed a questionnaire for these issues and sent it to the W3C Semantic Web mailing list.
The questionnaire began with two questions to ascertain the self-assessed level of expertise of the respondent in terms of Semantic Web topics. The first question asked respondents to select one of the following options regarding their own level of expertise:
The results are shown in Fig. 1, indicating strong expertise on the Semantic Web amongst respondents, as was the goal of the questionnaire: to target experts.

Fig. 1. Self-reported expertise of respondents.
We were further interested to know if respondents’ expertise was mainly relating to academia, industry, or other settings; we thus asked respondents to select all that applied to them from the following:
The results shown in Fig. 2 reveal that 63.7% of respondents have an academic background, while 42.5% (also) have an industrial background. We highlight a possible ambiguity in the question regarding what students should choose (noticed after posting the questionnaire).

Fig. 2. Type of expertise of respondents in terms of Academia, Industry, Other, and combinations thereof.
In order to understand to what extent the respondents believe that the original vision of the Semantic Web has been already realised, to what extent they believe it can be realised in future, the impact it has had thus far and the impact it will have (in terms of both the Web and other settings), we posed the questions shown in Fig. 3 to the participants. The results are shown in Fig. 4, displaying the distribution of votes, as well as the mean

Fig. 3. Realisation and impact section of the questionnaire.

Fig. 4. Responses to realisation and impact section of the questionnaire (shown in Fig. 3).
From these results, we observe the following:
regarding the original vision of the Semantic Web, the majority of respondents believe that it remains mostly or completely unrealised;
regarding the potential for realising the original vision of the Semantic Web in future, while 10 respondents believe it is completely unfeasible to realise, 14 believe it is completely feasible to realise; other responses were weighted towards believing it is mostly feasible to realise;
regarding current impact on the Web, responses were weighted towards the centre: that while Semantic Web technologies play some role on the Web, they do not play a key role;
regarding future impact on the Web, responses were weighted towards an optimistic view, with 76 respondents indicating their belief that Semantic Web technologies will play a significant or key role on the future Web;
regarding current impact in settings other than the Web, responses were weighted towards the centre: that while Semantic Web technologies play some role, they do not play a key role;
regarding future impact in settings other than the Web, responses were again weighted towards optimism, with 76 respondents again indicating their belief that Semantic Web technologies will play a significant or key role in the future.
While respondents tend to be reserved about the extent to which the Semantic Web has been realised and the impact that related technologies have had thus far, they tend to be much more positive regarding the future; per
We also performed a two-tailed z-test to look for statistically-significant differences in mean responses for each of the six questions between the 40 respondents who have only worked in academia (i.e., the Acad (Only) group of Fig. 2) and all other respondents; with
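As an illustration of the kind of test described above, the following is a minimal sketch of a two-tailed z-test on the difference in mean responses between two independent groups. The function name `two_tailed_z_test` and the response values are assumptions introduced purely for illustration; they are not the data or the exact procedure used for the questionnaire, and the normal approximation is crude for samples as small as those shown here.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_tailed_z_test(group_a, group_b):
    """Return (z, p) for the difference in means of two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Standard error of the difference in means, from the sample variances
    se = sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    z = (mean(group_a) - mean(group_b)) / se
    # Two-tailed p-value under the standard normal distribution
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical responses on a 1-5 scale (NOT data from the questionnaire)
academia_only = [2, 3, 3, 4, 2, 3, 4, 3, 2, 3]
others = [3, 4, 4, 5, 3, 4, 4, 3, 4, 5]

z, p = two_tailed_z_test(academia_only, others)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A difference would then be reported as statistically significant when `p` falls below the chosen significance level.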

Fig. 5. Tag cloud of success stories for the Semantic Web (left) along with top-10 keywords (right).
We next asked respondents to list success stories they associate with the Semantic Web; specifically:
A total of 90 non-empty responses were collected. In order to summarise the main success stories mentioned, the raw responses required some manual curation. While some respondents provided keywords on individual lines, others rather answered with full sentences or paragraphs of free text; in these cases, we manually extracted a list of keywords from such text. While some responses referred to concrete standards, datasets, initiatives, etc., other responses rather referred to more general concepts and domains. Regarding the latter cases, distinct but related terms – such as
Figure 5 illustrates the main success stories referenced in the responses, with schema.org [30] being the most referenced project. Knowledge Graphs (e.g., [17,34,49,67,75,76]), Wikidata [86] and DBpedia [52] fill the next positions, followed by two keywords often mentioned side-by-side: Bioinformatics and Ontologies. Linked Data was next, followed by a sequence of three standards: RDF, JSON-LD and SPARQL. Informally, we noticed a number of clusters of responses: (1) those focused on the Web and Public Datasets, including search engines, embedded meta-data, Wikidata, DBpedia, etc.; (2) those focused on Semantics, including the use of ontologies in specific domains, particularly bioinformatics; (3) those focused on Enterprises, particularly relating to Knowledge Graphs, Data Integration, Data Governance, etc.; and (4) those focused on the Public Sector, including relevant initiatives within governments, libraries, museums, etc.

Fig. 6. Example question for critique number 8.
The next part of the questionnaire sought feedback on the ten points of critique presented previously. More specifically, we presented the title and description of each point of critique as given in Section 2 without the associated arguments for or against. We then asked respondents to indicate the extent to which they agreed with the stated critique, both in terms of the current state of the Semantic Web, as well as how significant an obstacle it might pose to future development and adoption of the Semantic Web. In the cases of points (7) verbose standards, (8) does not scale and (9) lacks usable tools, we further asked respondents to indicate the standards they believe to be most problematic regarding the highlighted issue (if any), selecting zero-to-many from RDF (data model), RDFS, OWL and SPARQL. By way of example, Fig. 6 shows the question issued for point (8); the same structure was followed for other points, with
The results for

Responses to

Responses to
Looking to the future, the results for
With respect to the four categories previously discussed (Economic, Human, Social, Technical), from Figs 7 and 8, we observe that the criticisms for which there was most agreement related predominantly to Human issues: (3) unreliable publishers, (7) verbose standards, and (9) lacks usable tools; on the other hand, respondents tended to disagree with purely Technical issues: (2) redundant w/ML, and (8) won’t scale. These results suggest that respondents see more pressing issues relating to the human aspect of Semantic Web technologies rather than the technical aspect.

Responses to
Within these results, we again look at the difference between the respondents with only academic experience and other respondents; we find statistically-significant differences for three questions:
Finally, regarding (7) verbose standards, (8) problems with scale, and (9) a lack of usable tools, Fig. 9 presents the results of
The questionnaire ended with a comments section, where respondents could indicate both public and private comments. These comments varied in content.
Some comments, both positive and negative, spoke directly of the questionnaire. Aside from individual comments relating to the questionnaire being too long, the way in which options were ordered, and the lack of a “don’t know” option (rather, each question was optional), a number of public comments suggested other issues not raised, specifically relating to: social aspects, shared vocabularies, complex information modelling, agility of standardisation, RDF syntaxes, semantic modelling, lack of high-level abstractions, etc.
Other comments expressed more detailed opinions on the overall theme of the questionnaire, on specific critiques, or on their outlook for the Semantic Web. Some comments related to being less focused on adoption of Semantic Web standards and more focused on the adoption of its concepts and best practices (even if not using RDF(S), SPARQL, OWL, etc.); how incentives may be bootstrapped; a lack of focus on how data are used; key use-cases such as data maintenance and research data management (under FAIR principles); the need for new/improved standards; the difficulty of modelling certain data in RDF; the need for more dogfooding, education and marketing; problems with the Semantic Web being driven primarily by academia; etc. Other comments rather took a more pessimistic view, noting that if the Semantic Web were useful we should have seen more of it by now, that the Web of “walled gardens” looks set to continue, etc. We refer to the public comments online for more details [40].
Discussion
Two decades on, the general consensus in the Semantic Web community appears to be that there is still a long way to go before the original vision of the Semantic Web is realised. On the other hand, the consensus is that Semantic Web technologies are presently having some impact on both the Web and in non-Web settings, and will continue to have more impact looking to the future. Along these lines, respondents to our survey cite success stories such as schema.org, Knowledge Graphs, Wikidata, DBpedia, Biomedical Ontologies, etc., as examples where the Semantic Web has had most impact thus far. On the other hand, a lack of usable tools, a lack of incentives, a lack of robustness for unreliable publishers, and overly verbose standards, in particular, are widely acknowledged as valid criticisms of the Semantic Web in its current state.
Looking to the future, the general consensus is that while none of the highlighted issues are insurmountable, many do pose non-trivial obstacles to the further adoption and development of the Semantic Web. A theme widely recognised as a key obstacle for the Semantic Web is the lack of availability of usable tools; such issues are known within the community and have been discussed, for example, by Karger et al. [46]. The lack of usable tools may also be due in part to the largely academic nature of the Semantic Web, where work on such tools is difficult to publish (seen as “engineering” rather than “science”), while the community perhaps lacks expertise in areas such as Human-Computer Interaction (HCI) relating to conducting and publishing usability studies. Another major issue is the lack of incentives, which, with some exceptions such as schema.org [30], remains a general challenge; while some authors have begun to tackle this issue from a more general point-of-view [29], more work is called for. The results of the questionnaire also highlight the need for more work on data quality [90] and methods to ensure robustness in the presence of unreliable publishers [53,68]. The results further reveal issues relating to the (perceived) verboseness of the core standards, particularly OWL, perhaps suggesting the need for (further [1,27,62]) work to better understand and address this issue. A more transversal theme is implicit in the responses: the Semantic Web needs more contributors from other research communities and from outside academia.
Among all of the mentioned issues, one that stands out, in particular, relates to the usability of Semantic Web technologies and their accessibility to newcomers. We thus call for more work on this particular topic – work that may take a number of directions. First, we require more work on tools and interfaces that reduce the cognitive load and expertise required for users to benefit from Semantic Web technologies; ideally, the design of such tools and interfaces should be guided by usability studies with target end-users. Second, we require more work on making the Semantic Web standards more accessible and appealing to newcomers; this may involve simplifying standards, creating more lightweight profiles of existing standards, creating interactive primers to motivate and introduce the standards in a more engaging manner, and so forth.
Along similar lines, we further call for more work that bridges the Semantic Web with other technologies having similar goals, particularly those that gain (or have gained) considerable traction. This may take a number of forms. In the case of languages, mappings can be created to make the technologies interoperable, as was done in the case of OBO and OWL [28], SPARQL and SQL [22], and so forth; more work can be done to align new query languages like Cypher [61], Gremlin [70] or GraphQL [33] with SPARQL [4,21,79], thus more closely aligning the graph database/NoSQL/Web developer community with the Semantic Web. A second option is to take existing technologies, and extend them to support Semantic Web concepts; this has worked particularly well for JSON-LD [78], taking a familiar concept for developers – JSON – and adding some additional syntactic sugar to create an RDF-compatible data format.
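To make the JSON-LD point concrete, the following minimal sketch shows how an ordinary JSON object, once given an `@context` that maps its keys to IRIs, can be read as RDF triples. The `example.org` identifiers and the helper `to_triples` are assumptions introduced for illustration; the helper handles only a flat document with simple term definitions and is in no way a conforming JSON-LD processor.

```python
import json

# A plain JSON object plus an "@context" mapping its keys to IRIs.
# The example.org identifiers are hypothetical.
doc = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "knows": {"@id": "http://schema.org/knows", "@type": "@id"}
  },
  "@id": "http://example.org/alice",
  "name": "Alice",
  "knows": "http://example.org/bob"
}
""")


def to_triples(document):
    """Toy conversion of a flat JSON-LD document into N-Triples-like tuples."""
    context = document["@context"]
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        term = context[key]
        if isinstance(term, dict):
            predicate = term["@id"]
            is_iri = term.get("@type") == "@id"  # value names a resource
        else:
            predicate, is_iri = term, False  # value is a plain literal
        obj = f"<{value}>" if is_iri else json.dumps(value)
        triples.append((f"<{subject}>", f"<{predicate}>", obj))
    return triples


for s, p, o in to_triples(doc):
    print(s, p, o, ".")
```

A developer who ignores the `@context` still sees familiar JSON, while an RDF-aware consumer can interpret the same document as a graph.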
The results presented herein highlight that the original vision of the Semantic Web still eludes us, though major strides have been made in recent years. No matter how elusive, however, the Semantic Web vision remains an alluring one (at least to some, including the present author). We are all intimately aware of how the Web has revolutionised society, where the Semantic Web has the potential to further propel the Web to a new stage, marked by unprecedented levels of automation and convenience for users. Unlike twenty years ago, we now have the benefit of many years of experience and research on the topic, as well as established success stories like schema.org, Wikidata, Biomedical Ontologies, etc., to further build upon. Even a partial realisation of the Semantic Web vision will serve (and arguably is serving) as a great boon to society, much like how A.I. is finding more and more applications without ever having surpassed the Turing test. Part of the criticism, perhaps, stems from comparing the Semantic Web with the Web: a technological development to which almost anything else would pale in comparison; while the Semantic Web has not seen the same level of rapid growth and penetration as the Web, this does not devalue the (sometimes quiet) impact that the Semantic Web community can point to, while still hinting at the vast impact it could potentially have. Two decades on, it is thus still a vision that merits patient pursuit, even if – or perhaps even especially given that – there is much work left to be done before the Semantic Web holds the sorts of conclusive answers that might satisfy even its most ardent critics.
Acknowledgements
We thank the respondents to the questionnaire and the reviewers for their helpful comments. This work was funded in part by the Millennium Institute for Foundational Research on Data (IMFD) and Fondecyt, Grant No. 1181896.
