Survey on challenges of Question Answering in the Semantic Web

Abstract

Semantic Question Answering (SQA) removes two major access requirements to the Semantic Web: the mastery of a formal query language like SPARQL and knowledge of a specific vocabulary. Because of the complexity of natural language, SQA presents difficult challenges and many research opportunities. Instead of a shared effort, however, many essential components are redeveloped, which is an inefficient use of researchers’ time and resources. This survey analyzes 62 different SQA systems, which are systematically and manually selected using predefined inclusion and exclusion criteria, leading to 72 selected publications out of 1960 candidates. We identify common challenges, structure solutions, and provide recommendations for future systems. This work is based on publications from the end of 2010 to July 2015 and is also compared to older but similar surveys.

Keywords

Question Answering Semantic Web survey

1. Introduction

Semantic Question Answering (SQA) is defined by users (1) asking questions in natural language (NL) (2) using their own terminology to which they (3) receive a concise answer generated by querying an RDF knowledge base.¹

¹ Definition based on Hirschman and Gaizauskas [80].

Users are thus freed from two major access requirements to the Semantic Web: (1) the mastery of a formal query language like SPARQL and (2) knowledge about the specific vocabularies of the knowledge base they want to query. Since natural language is complex and ambiguous, reliable SQA systems require many different steps. For some of them, like part-of-speech tagging and parsing, mature high-precision solutions exist, but most of the others still present difficult challenges. While the massive research effort has led to major advances, as shown by the yearly Question Answering over Linked Data (QALD) evaluation campaign, it suffers from several problems: Instead of a shared effort, many essential components are redeveloped. While shared practices emerge over time, they are not systematically collected. Furthermore, most systems focus on a specific aspect while the others are quickly implemented, which leads to low benchmark scores and thus undervalues the contribution. This survey aims to alleviate these problems by systematically collecting and structuring methods of dealing with common challenges faced by these approaches. Our contributions are threefold: First, we complement existing work with 72 publications about 62 systems developed from 2010 to 2015. Second, we identify challenges faced by those approaches and collect solutions for them from the 72 publications. Finally, we draw conclusions and make recommendations on how to develop future SQA systems. The structure of the paper is as follows: Section 2 states the methodology used to find and filter surveyed publications. Section 3 compares this work to older, similar surveys as well as evaluation campaigns and work outside the SQA field. Section 4 introduces the surveyed systems. Section 5 identifies challenges faced by SQA approaches and presents approaches that tackle them. Section 6 summarizes the efforts made to face challenges to SQA and their implication for further development in this area.

2. Methodology

This survey follows a strict discovery methodology: objective inclusion and exclusion criteria are used to find and restrict publications on SQA.

Inclusion criteria Candidate articles for inclusion in the survey need to be part of relevant conference proceedings or searchable via Google Scholar (see Table 1). The included papers from the publication search engine Google Scholar are the first 300 results in the chosen timespan (see exclusion criteria) that contain “‘question answering’ AND (‘Semantic Web’ OR ‘data web’)” in the article including title, abstract and text body. Conference candidates are all publications in our examined time frame in the proceedings of the major Semantic Web Conferences ISWC, ESWC, WWW, NLDB, and the proceedings which contain the annual QALD challenge participants.

Exclusion criteria Works published before November 2010² or after July 2015 are excluded, as well as those that are not related to SQA, determined in a manual inspection in the following manner: First, proceeding tracks are excluded that clearly do not contain SQA related publications. Next, publications both from proceedings and from Google Scholar are excluded based on their title and finally on their content.

² The time before is already covered in Cimiano and Minock [37].

Notable exclusions We exclude the following approaches since they do not fit our definition of SQA (see Section 1): Swoogle [51] is independent of any specific knowledge base but instead builds its own index and knowledge base using RDF documents found by multiple web crawlers. Discovered ontologies are ranked based on their usage intensity and RDF documents are ranked using authority scoring. Swoogle can only find single terms and cannot answer natural language queries and is thus not a SQA system. Wolfram|Alpha is a natural language interface based on the computational platform Mathematica [148] and aggregates a large number of structured sources and algorithms. However, it does not support Semantic Web knowledge bases and neither the source code nor the algorithm is published. Thus, we cannot identify whether it corresponds to our definition of a SQA system.

Table 1

Sources of publication candidates along with the number of publications in total, after excluding based on conference tracks (I), based on the title (II), and finally based on the full text (selected). Works that are found both in a conference’s proceedings and in Google Scholar are only counted once, as selected for that conference. The QALD 2 proceedings are included in ILD 2012, QALD 3 [27] and QALD 4 [31] in the CLEF 2013 and 2014 working notes

Venue	All	I	II	Selected
Google Scholar Top 300	300	300	153	39
ISWC 2010 [116,117]	70	70	1	1
ISWC 2011 [9,10]	68	68	4	3
ISWC 2012 [40,41]	66	66	4	2
ISWC 2013 [4,5]	72	72	4	0
ISWC 2014 [99,100]	31	4	2	0
WWW 2011 [135]	81	9	0	0
WWW 2012 [101]	108	6	2	1
WWW 2013 [124]	137	137	2	1
WWW 2014 [35]	84	33	3	0
WWW 2015 [67]	131	131	1	1
ESWC 2011 [7,8]	67	58	3	0
ESWC 2012 [133]	53	43	0	0
ESWC 2013 [38]	42	34	0	0
ESWC 2014 [119]	51	31	2	1
ESWC 2015 [66]	42	42	1	1
NLDB 2011 [106]	21	21	2	2
NLDB 2012 [24]	36	36	0	0
NLDB 2013 [97]	36	36	1	1
NLDB 2014 [98]	39	30	1	2
NLDB 2015 [19]	45	10	2	1
QALD 1 [141]	3	3	3	2
ILD 2012 [143]	9	9	9	3
CLEF 2013 [58]	208	7	6	5
CLEF 2014 [30]	160	24	8	6
Σ(conference)	1660	980	61	33
Σ(all)	1960	1280	214	72

Result The inspection of the titles of the Google Scholar results by two authors of this survey led to 153 publications, 39 of which remained after inspecting the full text (see Table 1). The selected proceedings contain 1660 publications, which were narrowed down to 980 by excluding tracks that have no relation to SQA. Based on their titles, 61 of them were selected and inspected, resulting in 33 publications that were categorized and listed in this survey. Table 1 shows the number of publications in each step for each source. In total, 1960 candidates were found using the inclusion criteria in Google Scholar and conference proceedings and then reduced using track names (conference proceedings only, 1280 remaining), then titles (214) and finally the full text, resulting in 72 publications describing 62 distinct SQA systems.

3. Related work

This section gives an overview of recent QA and SQA surveys (see Table 2) and differences to this work, as well as QA and SQA evaluation campaigns, which quantitatively compare systems.

3.1. Other surveys

QA surveys Cimiano and Minock [37] present a data-driven problem analysis of QA on the Geobase dataset. The authors identify eleven challenges that QA has to solve and which inspired the problem categories of this survey: question types, language “light”,³ lexical ambiguities, syntactic ambiguities, scope ambiguities, spatial prepositions, adjective modifiers and superlatives, aggregation, comparison and negation operators, non-compositionality, and out of scope.⁴ In contrast to our work, they identify challenges by manually inspecting user provided questions instead of existing systems. Mishra and Jain [104] propose eight classification criteria, such as application domain, types of questions and type of data. For each criterion, the different classifications are given along with their advantages, disadvantages and exemplary systems.

³ Semantically weak constructions.

⁴ Cannot be answered as the information required is not contained in the knowledge base.

SQA surveys Athenikos and Han [11] give an overview of domain-specific QA systems for biomedicine. After summarising the state of the art for biomedical QA systems in 2009, the authors describe different approaches from the point of view of medical and biological QA. In contrast to our survey, the authors do not sort the presented approaches by challenges, but by broader terms such as “Non-semantic knowledge base medical QA systems and approaches” or “Inference-based biological QA systems and approaches”. López et al. [93] present an overview similar to Athenikos and Han [11] but with a wider scope. After defining the goals and dimensions of QA and presenting some related and historic work, the authors summarize the achievements of SQA so far and the challenges that are still open. Another related survey from 2012, Freitas et al. [62], gives a broad overview of the challenges involved in constructing effective query mechanisms for Web-scale data. The authors analyze different approaches, such as Treo [61], for five different challenges: usability, query expressivity, vocabulary-level semantic matching, entity recognition and improvement of semantic tractability. The same is done for architectural elements such as user interaction and interfaces and the impact on these challenges is reported. López et al. [94] analyze the SQA systems of the participants of the QALD 1 and 2 evaluation challenge, see Section 3.2. For each participant, problems and their solution strategies are given. While there is an overlap in the surveyed approaches between López et al. [94] and our paper, our survey has a broader scope as it also analyzes approaches that do not take part in the QALD challenges.

Table 2

Other surveys by year of publication. Surveyed years are given except when a dataset is theoretically analyzed. Approaches addressing specific types of data are also indicated

QA Survey	Year	Coverage	Data
Cimiano and Minock [37]	2010	–	geobase
Mishra and Jain [104]	2015	2000–2014	general
SQA Survey	Year	Coverage	Data
Athenikos and Han [11]	2010	2000–2009	biomedical
López et al. [93]	2010	2004–2010	general
Freitas et al. [62]	2012	2004–2011	general
López et al. [94]	2013	2005–2012	general

In contrast to the surveys mentioned above, we do not focus on the overall performance or domain of a system, but on analyzing and categorizing methods that tackle specific problems. Additionally, we build upon the existing surveys and describe the new state-of-the-art systems that were published after the aforementioned surveys, in order to keep track of new research ideas.

3.2. Evaluation campaigns

Contrary to QA surveys, which qualitatively compare systems, there are also evaluation campaigns, which quantitatively compare them using benchmarks. Those campaigns show how different open-domain QA systems perform on realistic questions on real-world knowledge bases. This accelerates the evolution of QA in four different ways: First, new systems do not have to include their own benchmark, shortening system development. Second, standardized evaluation allows for better research resource allocation as it is easier to determine which approaches are worthwhile to develop further. Third, the addition of new challenges to the questions of each new benchmark iteration motivates addressing those challenges. And finally, the competitive pressure to keep pace with the top scoring systems compels emergence and integration of shared best practices. On the other hand, evaluation campaign proceedings do not describe single components of those systems in great detail. By focussing on complete systems, research effort gets spread around multiple components, possibly duplicating existing efforts, instead of being focussed on a single one.

Question Answering over Linked Data (QALD) is the most well-known all-purpose evaluation campaign with its core task of open domain SQA on lexicographic facts of DBpedia [90]. Since its inception in 2011, the yearly benchmark has been made progressively more difficult. Additionally, the general core task has been joined by special tasks providing challenges like multilinguality, hybrid (textual and Linked Data) and its newest addition, SQA on statistical data in the form of RDF Data Cubes [81].

BioASQ [12,13,113,138] is a benchmark challenge which ran until September 2015 and consists of semantic indexing as well as an SQA part on biomedical data. In the SQA part, systems are expected to be hybrids, returning matching triples as well as text snippets but partial evaluation (text or triples only) is possible as well. The introductory task separates the process into annotation which is equivalent to named entity recognition (NER) and disambiguation (NED) as well as the answering itself. The second task combines these two steps.

TREC LiveQA, starting in 2015 [3], gives systems unanswered Yahoo Answers questions intended for other humans. As such, the campaign contains the most realistic questions with the least restrictions, in contrast to the solely factual questions of QALD, BioASQ and TREC’s old QA track [45].

3.3. System frameworks

System frameworks provide an abstraction in which a generic functionality can be selectively changed by additional third-party code. In document retrieval, there are many existing frameworks, such as Lucene,⁵ Solr⁶ and Elastic Search.⁷ For SQA, however, there is still a lack of tools that facilitate the implementation and evaluation of systems.

⁵ https://lucene.apache.org.

⁶ https://solr.apache.org.

⁷ https://www.elastic.co.

Document retrieval frameworks usually split the retrieval process into three steps: (1) query processing, (2) retrieval and (3) ranking. In the (1) query processing step, query analyzers identify documents in the data store. Thereafter, the query is used to (2) retrieve documents that match the query terms resulting from the query processing. Later, the retrieved documents are (3) ranked according to some ranking function, commonly tf-idf [134]. Developing an SQA framework is a hard task because many systems work with a mixture of NL techniques on top of traditional IR systems. Some systems make use of the syntactic graph behind the question [142] to deduce the query intention, whereas others use the knowledge graph [129]. There are hybrid systems that work both on structured and unstructured data [144] or on a combination of systems [71]. Therefore, they contain very peculiar steps. This has led to a new research sub field that focuses on QA frameworks, that is, the design and development of common features for SQA systems.
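The three document retrieval steps described above can be illustrated with a minimal sketch. The toy document store, the function names and the exact tf-idf weighting are assumptions made for illustration only; they do not reproduce the implementation of Lucene, Solr or any other framework.

```python
import math
from collections import Counter

# Hypothetical toy document store, not taken from the surveyed systems.
DOCS = {
    "d1": "semantic web question answering over linked data",
    "d2": "document retrieval with an inverted index",
    "d3": "question answering with document retrieval and ranking",
}

def analyze(text):
    """Step 1, query processing: lower-case the text and split it into terms."""
    return text.lower().split()

def retrieve(terms, docs):
    """Step 2, retrieval: keep documents that share at least one query term."""
    candidates = {}
    for doc_id, text in docs.items():
        tokens = analyze(text)
        if set(terms) & set(tokens):
            candidates[doc_id] = tokens
    return candidates

def rank(terms, candidates, docs):
    """Step 3, ranking: order candidates by the summed tf-idf of the query terms."""
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(analyze(text)))
    scores = {}
    for doc_id, tokens in candidates.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[t] / len(tokens) * math.log(n / df[t]) for t in terms if df[t]
        )
    return sorted(scores, key=scores.get, reverse=True)

def search(query, docs=DOCS):
    terms = analyze(query)
    return rank(terms, retrieve(terms, docs), docs)
```

An SQA framework would replace the first and last step with more general components, which is the direction taken by the QA frameworks discussed in this section.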

openQA [95]⁸ is a modular open-source framework for implementing and instantiating SQA approaches. The framework’s main work-flow consists of four stages (interpretation, retrieval, synthesis, rendering) and adjacent modules (context and service). The adjacent modules are intended to be accessed by any of the components of the main work-flow in order to share common features, e.g., a cache, between the different modules. The framework proposes an answer formulation process similar to traditional document retrieval and replaces the query processing and ranking steps with the more general interpretation and synthesis. The interpretation step comprises all the pre-processing and matching techniques required to deduce the question, whereas the synthesis is the process of ranking, merging and confidence estimation required to produce the answer. The authors claim that openQA enables a unification of different architectures and methods.

⁸ http://openqa.aksw.org.

4. Systems

The 72 surveyed publications describe 62 distinct systems or approaches. The implementation of a SQA system can be very complex and depends on, and thus reuses, several known techniques. SQA systems are typically composed of two stages: (1) the query analyzer and (2) retrieval. The query analyzer generates or formats the query that will be used to recover the answer at the retrieval stage. There is a wide variety of techniques that can be applied at the analyzer stage, such as tokenisation, disambiguation, internationalization, logical forms, semantic role labels, question reformulation, coreference resolution, relation extraction and named entity recognition amongst others. For some of those techniques, such as natural language (NL) parsing and part-of-speech (POS) tagging, mature all-purpose methods are available and commonly reused. Other techniques, such as the disambiguation between multiple possible answer candidates, are not readily available in a domain-independent fashion. Thus, high quality solutions can only be obtained by the development of new components. This section exemplifies some of the reviewed systems and their novelties to highlight current research questions, while the next section presents the contributions of all analyzed papers to specific challenges.

Hakimov et al. [72] propose a SQA system using syntactic dependency trees of input questions. The method consists of three main steps: (1) Triple patterns are extracted using the dependency tree and POS tags of the questions. (2) Entities, properties and classes are extracted and mapped to the underlying knowledge base. Recognized entities are disambiguated using page links between all spotted named entities as well as string similarity. Properties are disambiguated by using relational linguistic patterns from PATTY [107], which allows a more flexible mapping, such as “die” to dbo:deathPlace (see Table 3). Finally, (3) question words are matched to the respective answer type, such as “who” to person, organization or company and “where” to place. The results are then ranked and the best ranked result is returned as the answer.

Table 3
URL prefixes used throughout this work

dbo http://dbpedia.org/ontology/

dbr http://dbpedia.org/resource/

owl http://www.w3.org/2002/07/owl#

PARALEX [54] only answers questions for subjects or objects of property-object or subject-property pairs, respectively. It contains phrase to concept mappings in a lexicon that is trained from a corpus of paraphrases, which is constructed from the question-answer site WikiAnswers.⁹ If one of the paraphrases can be mapped to a query, this query is the correct answer for the paraphrases as well. By mapping phrases between those paraphrases, the linguistic patterns are extended. For example, “what is the r of e” leads to “how r is e”, so that “What is the population of New York” can be mapped to “How big is NYC”. There is a variety of other systems, such as Bordes et al. [21], that make use of paraphrase learning methods and integrate linguistic generalization with knowledge graph biases. They are however not included here as they do not query RDF knowledge bases and thus do not fit the inclusion criteria.

⁹ http://wiki.answers.com/.
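The effect of such paraphrase patterns can be sketched as follows. The two templates are the ones quoted above; the lexicon entries, the identifiers and the function name are hypothetical and hand-written here, whereas PARALEX learns its lexicon from the paraphrase corpus.

```python
import re

# The two question templates quoted in the text, with slots r (relation) and e (entity).
TEMPLATES = [
    re.compile(r"^what is the (?P<r>.+) of (?P<e>.+?)\??$", re.IGNORECASE),
    re.compile(r"^how (?P<r>.+) is (?P<e>.+?)\??$", re.IGNORECASE),
]

# Hypothetical lexicon entries mapping phrases to concepts (assumed, not learned).
RELATION_LEXICON = {"population": "population", "big": "population"}
ENTITY_LEXICON = {"new york": "new_york", "nyc": "new_york"}

def parse(question):
    """Return a (relation, entity) pair if a template and both lexicon lookups match."""
    for template in TEMPLATES:
        match = template.match(question.strip())
        if not match:
            continue
        relation = RELATION_LEXICON.get(match.group("r").lower())
        entity = ENTITY_LEXICON.get(match.group("e").lower())
        if relation and entity:
            return (relation, entity)
    return None
```

Both paraphrases from the example resolve to the same pair, so a query that answers one of them also answers the other.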

Xser [149] is based on the observation that SQA contains two independent steps. First, Xser determines the question structure solely based on a phrase level dependency graph and second uses the target knowledge base to instantiate the generated template. For instance, moving to another domain based on a different knowledge base thus only affects parts of the approach so that the conversion effort is lessened.

QuASE [136] is a three stage open domain approach based on web search and the Freebase knowledge base.¹⁰ First, QuASE uses entity linking, semantic feature construction and candidate ranking on the input question. Then, it selects the documents and corresponding sentences from a web search with a high probability to match the question and presents them as answers to the user.

¹⁰ https://www.freebase.com/.

DEV-NLQ [63] is based on lambda calculus and an event-based triple store¹¹ using only triple based retrieval operations. DEV-NLQ claims to be the only QA system able to solve chained, arbitrarily-nested, complex, prepositional phrases.

¹¹ http://www.w3.org/wiki/LargeTripleStores.

CubeQA [81,82] is a novel approach of SQA over multi-dimensional statistical Linked Data using the RDF Data Cube Vocabulary,¹² which existing approaches cannot process. Using a corpus of questions with open domain statistical information needs, the authors analyze how those questions differ from others, which additional verbalizations are commonly used and how this influences design decisions for SQA on statistical data.

¹² http://www.w3.org/TR/vocab-data-cube/.

QAKiS [26,28,39] queries several multilingual versions of DBpedia at the same time by filling the produced SPARQL query with the corresponding language-dependent properties and classes. Thus, it can retrieve correct answers even in cases of missing information in the language-dependent knowledge base.

Freitas and Curry [59] evaluate a distributional-compositional semantics approach that is independent from manually created dictionaries but instead relies on co-occurring words in text corpora. The vector space over the set of terms in the corpus is used to create a distributional vector space based on the weighted term vectors for each concept. An inverted Lucene index is adapted to the chosen model.

Instead of querying a specific knowledge base, Sun et al. [136] use web search engines to extract relevant text snippets, which are then linked to Freebase, where a ranking function is applied and the highest ranked entity is returned as the answer.

HAWK [144] is the first hybrid source SQA system which processes Linked Data as well as textual information to answer one input query. HAWK uses an eight-fold pipeline comprising part-of-speech tagging, entity annotation, dependency parsing, linguistic pruning heuristics for an in-depth analysis of the natural language input, semantic annotation of properties and classes, the generation of basic triple patterns for each component of the input query as well as discarding queries containing disconnected query graphs and ranking the remaining ones afterwards.

SWIP (Semantic Web Interface using Patterns) [118] generates a pivot query, a hybrid structure between the natural language question and the formal SPARQL target query. Generating the pivot queries consists of three main steps: (1) named entity identification, (2) query focus identification and (3) sub query generation. To formalize the pivot queries, the query is mapped to linguistic patterns, which are created by hand by domain experts. If there are multiple applicable linguistic patterns for a pivot query, the user chooses between them.

Hakimov et al. [73] adapt a semantic parsing algorithm to SQA, which achieves a high performance but relies on large amounts of training data, which is not practical when the domain is large or unspecified.

Several industry-driven SQA-related projects have emerged over the last years, for example DeepQA of IBM Watson [71], which was able to win the Jeopardy! challenge against human experts.

YodaQA [15] is a modular open source hybrid approach built on top of the Apache UIMA framework¹³ that is part of the Brmson platform and is inspired by DeepQA. YodaQA allows easy parallelization and leverage of pre-existing NLP UIMA components by representing each artifact (question, search result, passage, candidate answer) as a separate UIMA CAS. The YodaQA pipeline is divided into five different stages: (1) Question Analysis, (2) Answer Production, (3) Answer Analysis, (4) Answer Merging and Scoring as well as (5) Successive Refining.

¹³ https://uima.apache.org/.

Further, KAIST’s Exobrain¹⁴ project aims to learn from large amounts of data while ensuring a natural interaction with end users. However, it is limited to Korean.

¹⁴ http://exobrain.kr/.

Answer presentation Another important part of SQA systems outside the SQA research challenges is result presentation. Verbose descriptions or plain URIs are uncomfortable for human reading. Entity summarization deals with different types and levels of abstractions.

Cheng et al. [34] propose a random surfer model extended by a notion of centrality, i.e., a computation of the central elements involving similarity (or relatedness) between them as well as their informativeness. The similarity is given by a combination of the relatedness between their properties and their values.

Ngomo et al. [111] present another approach that automatically generates natural language descriptions of resources using their attributes. The rationale behind SPARQL2NL is to verbalize¹⁵ RDF data by applying templates together with the metadata of the schema itself (label, description, type). Entities can have multiple types as well as different levels of hierarchy which can lead to different levels of abstractions. The verbalization of the DBpedia entity dbr:Microsoft can vary depending on the type dbo:Agent rather than dbo:Company.

¹⁵ For example, "123"^^<http://dbpedia.org/datatype/squareKilometre> can be verbalized as 123 square kilometres.
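The verbalization of a typed literal, as in the square kilometre example, can be sketched in a few lines. The function name, the camel-case splitting and the naive plural rule are assumptions for illustration and are not taken from SPARQL2NL.

```python
import re

def verbalize_literal(literal):
    """Turn a typed literal such as "123"^^<.../squareKilometre> into readable text."""
    match = re.fullmatch(r'"(?P<value>[^"]*)"\^\^<(?P<datatype>[^>]+)>', literal)
    if not match:
        return literal  # not a typed literal: leave it unchanged
    value = match.group("value")
    # Use the local name of the datatype URI, i.e. the part after the last / or #.
    local_name = re.split(r"[/#]", match.group("datatype"))[-1]
    # Split camel case ("squareKilometre" -> "square kilometre").
    unit = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local_name).lower()
    # Naive English plural, an assumption that fails for irregular unit names.
    if value != "1" and not unit.endswith("s"):
        unit += "s"
    return f"{value} {unit}"
```

A template-based verbalizer would additionally consult schema metadata such as labels, which this sketch leaves out.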

5. Challenges

In this section, we address seven challenges that have to be faced by state-of-the-art SQA systems. All mentioned challenges are currently open research fields. For each challenge, we describe efforts mentioned in the 72 selected publications. Challenges that affect SQA, but that are not to be solved by SQA systems, such as speech interfaces, data quality and system interoperability, are analyzed in Shekarpour et al. [130].

5.1. Lexical gap

In a natural language, the same meaning can be expressed in different ways. Natural language descriptions of RDF resources are provided by values of the rdfs:label property (label in the following). While synonyms for the same RDF resource can be modeled using multiple labels for that resource, knowledge bases typically do not contain all the different terms that can refer to a certain entity. If the vocabulary used in a question is different from the one used in the labels of the knowledge base, we call this the lexical gap¹⁶ [73]. Because a question can usually only be answered if every referred concept is identified, bridging this gap significantly increases the proportion of questions that can be answered by a system. Table 4 shows the methods employed by the 72 selected publications for bridging the lexical gap along with examples. As an example of how the lexical gap is bridged outside of SQA, see Lee et al. [88].

¹⁶ In linguistics, the term lexical gap has a different meaning, referring to a word that has no equivalent in another language.

String normalization and similarity functions Normalizations, such as conversion to lower case or to base forms, such as “é” to “e”, allow matching of slightly different forms and some simple mistakes, such as “Deja Vu” for “déjà vu”, and are quickly implemented and executed. More elaborate normalizations use natural language processing (NLP) techniques for stemming and lemmatization (both “running” and “ran” to “run”).
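A minimal sketch of such a normalization, using only the Python standard library, is given below. The function name and the choice of Unicode decomposition are assumptions; stemming and lemmatization are not covered.

```python
import unicodedata

def normalize(label):
    """Lower-case a label, strip diacritics and collapse whitespace."""
    # NFKD decomposition separates base characters from combining accents.
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())
```

After normalization, “Deja Vu” and “déjà vu” map to the same string and can be compared with an exact match.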

If normalizations are not enough, the distance – and its complementary concept, similarity – can be quantified using a similarity function and a threshold. Common examples of similarity functions are Jaro-Winkler, an edit-distance that measures transpositions, and n-grams, which compares sets of substrings of length n of two strings. Also, one of the surveyed publications, Zhang et al. [155], uses the largest common substring, both between Japanese and translated English words. However, applying such similarity functions can carry harsh performance penalties. While an exact string match can be efficiently executed in a SPARQL triple pattern, similarity scores generally need to be calculated between a phrase and every entity label, which is infeasible on large knowledge bases [144]. There are however efficient indexes for some similarity functions. For instance, matches within an edit distance of two characters or less can be found efficiently by using the fuzzy query implementation of a Lucene Index¹⁷ that implements a Levenshtein Automaton [123]. Furthermore, Ngomo [109] provides a different approach to efficiently calculating similarity scores that could be applied to QA. It uses similarity metrics where a triangle inequality holds that allows for a large portion of potential matches to be discarded early in the process. This solution is not as fast as using a Levenshtein Automaton but does not place such a tight limit on the maximum edit distance.

¹⁷ http://lucene.apache.org.
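The following sketch illustrates two of the similarity functions named above and the idea of pruning with the triangle inequality. It is a plain reference implementation under stated assumptions: the function names and the single-pivot strategy are illustrative and do not reproduce the Lucene fuzzy query or the algorithm of Ngomo [109].

```python
def levenshtein(a, b):
    """Edit distance with unit costs for insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]

def ngrams(text, n=3):
    """Set of substrings of length n (the whole string if it is shorter)."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard overlap of the n-gram sets of two strings."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)

def match_labels(phrase, labels, max_distance=2):
    """Naive matching: one distance computation per label."""
    return [label for label in labels if levenshtein(phrase, label) <= max_distance]

def match_with_pivot(phrase, labels, pivot, max_distance=2):
    """Skip labels that the triangle inequality already rules out.

    The distances between the labels and the pivot do not depend on the
    question, so they could be computed once in advance.
    """
    pivot_distance = {label: levenshtein(label, pivot) for label in labels}
    phrase_to_pivot = levenshtein(phrase, pivot)
    matches = []
    for label in labels:
        # d(phrase, label) >= |d(phrase, pivot) - d(label, pivot)|
        if abs(phrase_to_pivot - pivot_distance[label]) > max_distance:
            continue
        if levenshtein(phrase, label) <= max_distance:
            matches.append(label)
    return matches
```

Both matching functions return the same labels; the pivot variant only saves distance computations for labels whose lower bound already exceeds the threshold.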

Table 4

Different techniques for bridging the lexical gap along with examples of deviations of the word “running” that these techniques cover

Identity	running
Similarity Measure	runnign
Stemming/Lemmatizing	run
AQE – Synonyms	sprint
Pattern libraries	X made a break for Y

Automatic query expansion While normalization and string similarity methods match different forms of the same word, they do not recognize synonyms. Synonyms, like design and plan, are pairs of words that, either always or only in a specific context, have the same meaning. In hyper-hyponym-pairs, like chemical process and photosynthesis, the first word is less specific than the second one. These word pairs, taken from lexical databases such as WordNet [102], are used as additional labels in Automatic query expansion (AQE). AQE is commonly used in information retrieval and traditional search engines, as summarized in Carpineto and Romano [32]. These additional surface forms allow for more matches and thus increase recall but lead to mismatches between related words and thus can decrease the precision.

In traditional document-based search engines with high recall and low precision, this trade-off is more common than in SQA. SQA is typically optimized for concise answers and a high precision, since a SPARQL query with an incorrectly identified concept mostly results in a wrong set of answer resources. However, AQE can be used as a backup method in case there is no direct match. One of the surveyed publications is an experimental study [128] that evaluates the impact of AQE on SQA. It has analyzed different lexical¹⁸ and semantic¹⁹ expansion features and used machine learning to optimize weightings for combinations of them. Both lexical and semantic features were shown to be beneficial on a benchmark dataset consisting only of sentences where direct matching is not sufficient.

¹⁸ Lexical features include synonyms, hyper- and hyponyms.

¹⁹ Semantic features make use of RDF graphs and the RDFS vocabulary, such as equivalent, sub- and superclasses.
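Using AQE as a backup method can be sketched as follows. The small synonym table stands in for a lexical database such as WordNet, and the label index, the example URI and the function names are hypothetical.

```python
# Hypothetical synonym table standing in for a lexical database such as WordNet.
SYNONYMS = {
    "design": {"plan"},
    "plan": {"design"},
}

def expand(term):
    """Return the term together with its known synonyms."""
    return {term} | SYNONYMS.get(term, set())

def lookup(term, label_index):
    """Prefer a direct label match and fall back to expanded forms only if needed."""
    if term in label_index:
        return label_index[term]
    for alternative in sorted(expand(term) - {term}):
        if alternative in label_index:
            return label_index[alternative]
    return None
```

Because the expansion is consulted only when the direct match fails, it can raise recall without lowering the precision of questions that already match.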

Pattern libraries RDF individuals can be matched from a phrase to a resource with high accuracy using similarity functions and normalization alone. Properties however require further treatment, as (1) they determine the subject and object, which can be in different positions²⁰ and (2) a single property can be expressed in many different ways, both as a noun and as a verb phrase which may not even be a continuous substring²¹ of the question. Because of the complex and varying structure of those linguistic patterns and the required reasoning and knowledge,²² libraries to overcome these issues have been developed.

²⁰ E.g., “X wrote Y” and “Y is written by X”.

²¹ E.g., “X wrote Y together with Z” for “X is a coauthor of Y”.

²² E.g., “if X writes a book, X is called the author of it.”

PATTY [107] detects entities in sentences of a corpus and determines the shortest path between the entities. The path is then expanded with occurring modifiers and stored as a pattern. Thus, PATTY is able to build up a pattern library on any knowledge base with an accompanying corpus.

BOA [69] generates linguistic patterns using a corpus and a knowledge base. For each property in the knowledge base, sentences from a corpus are chosen containing examples of subjects and objects for this particular property. BOA assumes that each resource pair that is connected in a sentence exemplifies another label for this relation and thus generates a pattern from each occurrence of that word pair in the corpus.
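
The pattern generation idea can be illustrated with a minimal sketch. It assumes a toy corpus and a handful of known label pairs for a single property; the function name `extract_patterns` and the placeholders `?s` and `?o` are hypothetical choices for this illustration and do not reproduce BOA's actual implementation or scoring.

```python
import re
from collections import Counter


def extract_patterns(sentences, pairs):
    """Count the text between two known labels as a candidate pattern."""
    patterns = Counter()
    for sentence in sentences:
        for subj, obj in pairs:
            # Try both orders, since the two labels can occur in either position.
            for first, second, template in (
                (subj, obj, "?s {} ?o"),
                (obj, subj, "?o {} ?s"),
            ):
                match = re.search(
                    re.escape(first) + r"\s+(.+?)\s+" + re.escape(second),
                    sentence,
                )
                if match:
                    patterns[template.format(match.group(1))] += 1
    return patterns


# Toy corpus and label pairs, invented for illustration only.
corpus = [
    "Dan Brown wrote Inferno in 2013.",
    "Inferno is written by Dan Brown.",
    "Umberto Eco wrote Foucault's Pendulum.",
]
label_pairs = [("Dan Brown", "Inferno"), ("Umberto Eco", "Foucault's Pendulum")]
patterns = extract_patterns(corpus, label_pairs)
```

In this toy setting, the active and the passive formulation end up as two distinct patterns with swapped argument positions, which is the variation that a pattern library has to capture.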

PARALEX [54] contains phrase to concept mappings in a lexicon that is trained from a corpus of paraphrases from the QA site WikiAnswers. The advantage is that no manual templates have to be created as they are automatically learned from the paraphrases.

Entailment A corpus of already answered questions or linguistic question patterns can be used to infer the answer for new questions. A phrase A is said to entail a phrase B, if B follows from A. Thus, entailment is directional: Synonyms entail each other, whereas hyper- and hyponyms entail in one direction only: “birds fly” entails “sparrows fly”, but not the other way around. Ou and Zhu [112] generate possible questions for an ontology in advance and identify the most similar match to a user question based on a syntactic and semantic similarity score. The syntactic score is the cosine similarity of the questions using bag-of-words. The semantic score also includes hypernyms, hyponyms and denominalizations based on WordNet [102]. While the preprocessing is algorithmically simple compared to the complex pipeline of NLP tools, the number of possible questions is expected to grow superlinearly with the size of the ontology, so the approach is more suited to specific domain ontologies. Furthermore, the range of possible questions is quite limited, which the authors aim to partially alleviate in future work by combining multiple basic questions into a complex question.
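
The syntactic score can be sketched as follows. This is a minimal illustration of cosine similarity over bag-of-words vectors with whitespace tokenization; the helper names and the example questions are assumptions, and the semantic score based on WordNet is left out.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between the bag-of-words vectors of two questions."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def most_similar(user_question, generated_questions):
    """Pick the pregenerated question closest to the user question."""
    return max(generated_questions, key=lambda q: cosine_similarity(user_question, q))


# Hypothetical pregenerated questions for a small ontology.
generated = ["who wrote the book", "where is the city located"]
best = most_similar("who wrote this book", generated)
```

Because every user question is compared against every pregenerated question, the cost of this lookup grows with the number of generated questions, which is one reason the approach fits small domain ontologies better.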

Document retrieval models Blanco et al. [20] adapt entity ranking models from traditional document retrieval algorithms to RDF data. The authors apply BM25 as well as the tf-idf ranking function to an index structure with different text fields constructed from the title, object URIs, property values and RDF inlinks. The proposed adaptation is shown to be both time efficient and qualitatively superior to other state-of-the-art methods in ranking RDF resources.
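
For orientation, the following sketch scores textual descriptions of resources with the textbook Okapi BM25 function. It is not the fielded adaptation of Blanco et al.: each resource is assumed to be flattened into one string, the idf variant with the added constant 1 is one common non-negative form, and the parameters `k1` and `b` use conventional default values.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (one string per resource) against the query.

    Assumes a non-empty list of non-empty documents.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for d in tokenized for t in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Invented resource descriptions, e.g. concatenated labels and property values.
docs = ["dan brown american author", "brown bear animal", "berlin capital germany"]
scores = bm25_scores("dan brown", docs)
```

A fielded variant would weight matches in, for example, the title field differently from matches in property values; that weighting is omitted here.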

Composite approaches Elaborate approaches to bridging the lexical gap can have a high impact on the overall runtime performance of an SQA system. This can be partially mitigated by composing methods and executing each following step only if the one before did not return the expected results.

BELA [146] implements four layers. First, the question is mapped directly to the concept of the ontology using the index lookup. Second, the question is mapped to the ontology based on Levenshtein distance, if the Levenshtein distance of a word from the question and a property from an ontology exceeds a certain threshold. Third, WordNet is used to find synonyms for a given word. Finally, BELA uses explicit semantic analysis (ESA) by Gabrilovich and Markovitch [65]. The evaluation is carried out on the QALD 2 [143] test dataset and shows that the simpler steps, like index lookup and Levenshtein distance, had the most positive influence on answering questions, so that many questions can be answered with simple mechanisms.
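
A layered fallback of this kind can be sketched as below. The sketch is not BELA's code: the label index and synonym table are toy dictionaries, the threshold is interpreted as a lower bound on a normalized Levenshtein similarity (an assumption), and the ESA layer is omitted.

```python
def levenshtein(a, b):
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def map_phrase(phrase, labels, synonyms, threshold=0.8):
    """Return (resource, layer) from the first layer that finds a match.

    Assumes a non-empty label index.
    """
    phrase = phrase.lower()
    # Layer 1: exact index lookup.
    if phrase in labels:
        return labels[phrase], "index"
    # Layer 2: fuzzy match, only accepted above the similarity threshold.
    best = max(labels, key=lambda label: similarity(phrase, label))
    if similarity(phrase, best) > threshold:
        return labels[best], "levenshtein"
    # Layer 3: synonym lookup (a stand-in for WordNet).
    for synonym in synonyms.get(phrase, []):
        if synonym in labels:
            return labels[synonym], "synonym"
    return None, "unmatched"


# Toy index and synonym table, invented for illustration.
labels = {"author": "dbo:author", "birth place": "dbo:birthPlace"}
synonyms = {"writer": ["author"]}
```

Each layer runs only when the cheaper layers before it have failed, so the expensive steps are skipped for the many phrases that already match exactly or almost exactly.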

Park et al. [115] answer natural language questions via regular expressions and keyword queries with a Lucene-based index. Furthermore, the approach uses DBpedia [92] as well as their own triple extraction method on the English Wikipedia.

5.2. Ambiguity

Ambiguity is the phenomenon of the same phrase having different meanings; this can be structural and syntactic (like “flying planes”) or lexical and semantic (like “bank”). We distinguish between homonymy, where the same string accidentally refers to different concepts (as in money bank vs. river bank) and polysemy, where the same string refers to different but related concepts (as in bank as a company vs. bank as a building). We distinguish between synonymy and taxonomic relations such as metonymy and hypernymy. In contrast to the lexical gap, which impedes the recall of an SQA system, ambiguity negatively affects its precision. Ambiguity is the flipside of the lexical gap.

This problem is aggravated by the very methods used for overcoming the lexical gap. The looser the matching criteria become (increase in recall), the more candidates are found, which are generally less likely to be correct than closer ones. Disambiguation is the process of selecting one of multiple candidate concepts for an ambiguous phrase. We differentiate between two types of disambiguation based on the source and type of information used to solve this mapping:

Corpus-based methods are traditionally used and rely on counts, often used as probabilities, from unstructured text corpora. Such statistical approaches [132] are based on the distributional hypothesis, which states that “difference of meaning correlates with difference of [contextual] distribution” [76]. The context of a phrase is identified here as its central characteristic [103]. Common context features used are word co-occurrences, such as left or right neighbours, but also synonyms, hyponyms, POS-tags and the parse tree structure. More elaborate approaches also take advantage of the context outside of the question, such as past queries of the user [131].

In SQA, resource-based methods exploit the fact that the candidate concepts are RDF resources. Resources are compared using different scoring schemes based on their properties and the connections between them. The assumption is that high scores between all the resources chosen in the mapping imply a higher probability of those resources being related, and that this implies a higher probability of those resources being correctly chosen.

RVT [70] uses Hidden Markov Models (HMM) to select the proper ontological triples according to the graph nature of DBpedia.

CASIA [78] employs Markov Logic Networks (MLN): First-order logic statements are assigned a numerical penalty, which is used to define hard constraints, like “each phrase can map to only one resource”, alongside soft constraints, like “the larger the semantic similarity is between two resources, the higher the chance is that they are connected by a relation in the question”.

Underspecification [139] discards certain combinations of possible meanings before the time-consuming querying step, by combining restrictions for each meaning. Each term is mapped to a Dependency-based Underspecified Discourse REpresentation Structure (DUDE [36]), which captures its possible meanings along with their class restrictions.

Treo [60,61] performs entity recognition and disambiguation using Wikipedia-based semantic relatedness and spreading activation. Semantic relatedness calculates similarity values between pairs of RDF resources. Determining semantic relatedness between entity candidates associated with words in a sentence allows finding the most probable entity by maximizing the total relatedness.

EasyESA [33] is based on distributional semantic models, which allow representing an entity by a vector of target words and thus compress its representation.
The distributional semantic models allow bridging the lexical gap and resolving ambiguity by avoiding the explicit structures of RDF-based entity descriptions for entity linking and relatedness.

gAnswer [84] tackles ambiguity with RDF fragments, i.e., star-like RDF subgraphs. The number of connections between the fragments of the resource candidates is then used to score and select them.

Wikimantic [22] can be used to disambiguate short questions or even sentences. It uses Wikipedia article interlinks for a generative model, where the probability of an article generating a term is set to the term’s relative occurrence in the article. Disambiguation is then an optimization problem to locally maximize each article’s (and thus DBpedia resource’s) term probability along with a global ranking method.

Shekarpour et al. [125,126] disambiguate resource candidates using segments consisting of one or more words from a keyword query. The aim is to maximize the textual similarity of keywords to resources along with relatedness between the resources (classes, properties and entities). The problem is cast as a Hidden Markov Model (HMM) with the states representing the set of candidate resources extended by OWL reasoning. The transition probabilities are based on the shortest path between the resources. The Viterbi algorithm generates an optimal path through the HMM that is used for disambiguation.

DEANNA [150,151] manages phrase detection, entity recognition and entity disambiguation by formulating the SQA task as an integer linear programming (ILP) problem. It employs semantic coherence, which measures the co-occurrence of resources in the same context. DEANNA constructs a disambiguation graph, which encodes the selection of candidates for resources and properties. The chosen objective function maximizes the combined similarity while constraints guarantee that the selections are valid.
The resulting problem is NP-hard, but it can be solved efficiently in approximation by existing ILP solvers. The follow-up approach [152] uses DBpedia and Yago with a mapping of input queries to semantic relations based on text search. At QALD 2, it outperformed almost every other system on factoid questions and every other system on list questions. However, the approach requires detailed textual descriptions of entities and only creates basic graph pattern queries.

LOD-Query [127] is a keyword-based SQA system that tackles both ambiguity and the lexical gap by selecting candidate concepts based on a combination of a string similarity score and the connectivity degree. The string similarity is the normalized edit distance between the labels and a keyword. The connectivity degree of a concept is approximated by the occurrence of that concept in all the triples of the knowledge base.

Pomelo [74] answers biomedical questions on the combination of Drugbank, Diseasome and Sider using owl:sameAs links between them. Properties are disambiguated using predefined rewriting rules which are categorized by context.

Rani et al. [121] use fuzzy logic co-clustering algorithms to retrieve documents based on their ontology similarity. Possible senses for a word are assigned a probability depending on the context.

Zhang et al. [155] translate RDF resources to the English DBpedia. They use feedback learning in the disambiguation step to refine the resource mapping.
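
The HMM-based disambiguation described above for Shekarpour et al. relies on the Viterbi algorithm, which can be sketched generically as follows. The states, observations and all probabilities are invented for illustration: emission values stand in for string similarity between a keyword and a resource label, and transition values stand in for relatedness derived from shortest paths. None of this reproduces the authors' actual model parameters.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, predecessor)
    best = [
        {s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}
    ]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p].get(s, 0.0), p) for p in states
            )
            column[s] = (prob * emit_p[s].get(obs, 0.0), prev)
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


# Hypothetical candidates for the keyword query "brown books".
states = ["dbr:Dan_Brown", "dbr:Brown_bear", "dbo:Book"]
start_p = {s: 1 / 3 for s in states}
emit_p = {
    "dbr:Dan_Brown": {"brown": 0.5},
    "dbr:Brown_bear": {"brown": 0.5},
    "dbo:Book": {"books": 0.9},
}
trans_p = {
    "dbr:Dan_Brown": {"dbo:Book": 0.8},   # closely connected in the graph
    "dbr:Brown_bear": {"dbo:Book": 0.1},  # only loosely connected
    "dbo:Book": {},
}
path = viterbi(["brown", "books"], states, start_p, trans_p, emit_p)
```

Both candidates for “brown” match the keyword equally well on the string level; the transition probabilities decide in favour of the resource that is better connected to the candidate chosen for “books”.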

Instead of trying to resolve ambiguity automatically, some approaches let the user clarify the exact intent, either in all cases or only for ambiguous phrases.

SQUALL [56,57] defines a controlled, English-based vocabulary that is enhanced with knowledge from a given triple store. While this ideally results in a high performance, it moves the problem of the lexical gap and disambiguation fully to the user. As such, it covers a middle ground between SPARQL and full-fledged SQA with the author’s intent that learning the grammatical structure of this proposed language is easier for a non-expert than learning SPARQL.

A cooperative approach that places less of a burden on the user is proposed in [96], which transforms the question into a discourse representation structure and starts a dialogue with the user for all occurring ambiguities.

CrowdQ [48] is an SQA system that decomposes complex queries into simple parts (keyword queries) and uses crowdsourcing for disambiguation. It avoids excessive usage of crowd resources by creating general templates as an intermediate step.

FREyA (Feedback, Refinement and Extended VocabularY Aggregation) [42] represents phrases as potential ontology concepts which are identified by heuristics on the syntactic parse tree. Ontology concepts are identified by matching their labels with phrases from the question without regarding its structure. A consolidation algorithm then matches both potential and ontology concepts. In case of ambiguities, the user is asked for feedback. Disambiguation candidates are created using string similarity in combination with WordNet synonym detection. The system learns from the user selections, thereby improving the precision over time.

TBSL [142] uses both a domain-independent and a domain-dependent lexicon so that it performs well on a specific domain but is still adaptable to another one. It uses AutoSPARQL [89] to refine the learned SPARQL query using the QTL algorithm for supervised machine learning.
The user marks certain answers as correct or incorrect and triggers a refinement. This is repeated until the user is satisfied with the result. An extension of TBSL is DEQA [91], which combines Web extraction with OXPath [64], interlinking with LIMES [110] and SQA with TBSL. It can thus answer complex questions about objects which are only available as HTML. Another extension of TBSL is ISOFT [114], which uses explicit semantic analysis to help bridge the lexical gap.

NL-Graphs [53] combines SQA with an interactive visualization of the graph of triple patterns in the query, which is close to the SPARQL query structure yet still intuitive to the user. Users that find errors in the query structure can either reformulate the query or modify the query graph.

KOIOS [18] answers queries on natural environment indicators and allows the user to refine the answer to a keyword query by faceted search. Instead of relying on a given ontology, a schema index is generated from the triples and then connected with the keywords of the query. Ambiguity is resolved by user feedback on the top ranked results.

A different way to restrict the set of answer candidates and thus handle ambiguity is to determine the expected answer type of a factual question. The standard approach to determine this type is to identify the focus of the question and to map this type to an ontology class. In the example “Which books are written by Dan Brown?”, the focus is “books”, which is mapped to dbo:Book. There is however a long tail of rare answer types that are not as easily alignable to an ontology, which, for instance, Watson [71] tackles using the TyCor [87] framework for type coercion. Instead of the standard approach, candidates are first generated using multiple interpretations and then selected based on a combination of scores. Besides trying to align the answer type directly, it is coerced into other types by calculating the probability of an entity of class A to also be in class B. DBpedia, Wikipedia and WordNet are used to determine link anchors, list memberships, synonyms, hyper- and hyponyms. The follow-up [147] compares two different approaches for answer typing. Type-and-Generate (TaG) approaches restrict candidate answers to the expected answer types using predictive annotation, which requires manual analysis of a domain. TyCor on the other hand employs multiple strategies using generate-and-type (GaT), which generates all answers regardless of answer type and tries to coerce them into the expected answer type. Experimental results hint that GaT outperforms TaG when accuracy is higher than 50%. The significantly higher performance of TyCor when using GaT is explained by its robustness to incorrect candidates while there is no recovery from excluded answers from TaG.
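
The difference between the two answer typing strategies can be made concrete with a small sketch. Everything in it is a simplifying assumption: candidates carry a generic score and a set of asserted types, the coercion probabilities are invented numbers, and multiplying score and probability is only one conceivable combination, not TyCor's actual scoring.

```python
def type_and_generate(candidates, expected_type):
    """TaG: discard every candidate not asserted to have the expected type."""
    return [c["name"] for c in candidates if expected_type in c["types"]]


def generate_and_type(candidates, expected_type, coercion_prob):
    """GaT: keep all candidates and rank them by score times coercion probability."""

    def typed_score(candidate):
        # Best probability that an asserted type can be coerced into the expected type.
        best = max(
            (coercion_prob.get((t, expected_type), 0.0) for t in candidate["types"]),
            default=0.0,
        )
        return candidate["score"] * best

    return [c["name"] for c in sorted(candidates, key=typed_score, reverse=True)]


# Hypothetical candidates for "Which books are written by Dan Brown?".
candidates = [
    {"name": "Inferno", "types": {"Novel"}, "score": 0.9},
    {"name": "Dan Brown", "types": {"Person"}, "score": 0.8},
]
coercion_prob = {("Novel", "Book"): 0.95, ("Person", "Book"): 0.01}

tag_result = type_and_generate(candidates, "Book")
gat_result = generate_and_type(candidates, "Book", coercion_prob)
```

In this toy case the hard filter loses the correct answer because it is only typed as a novel, whereas the soft ranking keeps it at the top, which mirrors the missing recovery from excluded answers noted above.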

5.3. Multilingualism

Knowledge on the Web is expressed in various languages. While RDF resources can be described in multiple languages at once using language tags, there is not a single language that is always used in Web documents. Additionally, users have different native languages. A more flexible approach is thus to have SQA systems that can handle multiple input languages, which may even differ from the language used to encode the knowledge. Deines and Krechel [46] use GermaNet [75] which is integrated into the multilingual knowledge base EuroWordNet [145] together with lemon-LexInfo [25], to answer German questions. Aggarwal et al. [2] only need to successfully translate part of the query, after which the recognition of the other entities is aided using semantic similarity and relatedness measures between resources connected to the initial ones in the knowledge base. QAKiS (Question Answering wiKiframework-based system) [39] automatically extends existing mappings between different language versions of Wikipedia, which is carried over to DBpedia.

5.4. Complex queries

Simple questions can most often be answered by translation into a set of simple triple patterns. Problems arise when several facts have to be found out, connected and then combined. Queries may also request a specific result order or results that are aggregated or filtered.

YAGO-QA [1] allows nested queries when the subquery has already been answered, for example “Who is the governor of the state of New York?” after “What is the state of New York?” YAGO-QA extracts facts from Wikipedia (categories and infoboxes), WordNet and GeoNames. It contains different surface forms such as abbreviations and paraphrases for named entities.

PYTHIA [140] is an ontology-based SQA system with an automatically built ontology-specific lexicon. Due to the linguistic representation, the system is able to answer natural language questions with linguistically more complex queries, involving quantifiers, numerals, comparisons and superlatives, negations and so on.

IBM Watson [71] handles complex questions by first determining the focus element, which represents the searched entity. The information about the focus element is used to predict the lexical answer type and thus restrict the range of possible answers. This approach allows for indirect questions and multiple sentences.

Shekarpour et al. [125,126], as mentioned in Section 5.2, propose a model that uses a combination of knowledge base concepts with an HMM to handle complex queries.

Intui2 [49] is an SQA system for DBpedia based on synfragments, which map to a subtree of the syntactic parse tree. Semantically, a synfragment is a minimal span of text that can be interpreted as an RDF triple or complex RDF query. Synfragments interoperate with their parent synfragment by combining all combinations of child synfragments, ordered by syntactic and semantic characteristics. The authors assume that an interpretation of a question in any RDF query language can be obtained by the recursive interpretation of its synfragments. Intui3 [50] replaces self-made components with robust libraries such as the neural networks-based NLP toolkit SENNA and the DBpedia Lookup service. It drops the parser-determined interpretation combination method of its predecessor, which suffered from bad sentence parses, and instead uses a fixed-order right-to-left combination.

GETARUNS [47] first creates a logical form out of a query which consists of a focus, a predicate and arguments. The focus element identifies the expected answer type. For example, the focus of “Who is the mayor of New York?” is “person”, the predicate “be” and the arguments “mayor of New York”. If no focus element is detected, a yes/no question is assumed. In the second step, the logical form is converted to a SPARQL query by mapping elements to resources via label matching. The resulting triple patterns are then split up again as properties are referenced by unions over both possible directions, as in ({?x ?p ?o} UNION {?o ?p ?x}) because the direction is not known beforehand. Additionally, there are filters to handle further restrictions which cannot be expressed in a SPARQL query, such as “Who has been the 5th president of the USA”.
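
The union over both directions can be illustrated by a small query builder. The function name and the example property and resource are assumptions for this sketch; it only shows the shape of such a query and is not taken from GETARUNS.

```python
def both_directions(prop, resource):
    """Build a SPARQL query that matches the property in either direction."""
    return (
        "SELECT DISTINCT ?x WHERE {\n"
        f"  {{ ?x {prop} {resource} }} UNION {{ {resource} {prop} ?x }}\n"
        "}"
    )


# Hypothetical mapping result for a question about Dan Brown's books.
query = both_directions("dbo:author", "dbr:Dan_Brown")
print(query)
```

Only one of the two branches is expected to return results for a given property, so the union trades a slightly more expensive query for not having to know the direction in advance.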

5.5. Distributed knowledge

If concept information – which is referred to in a query – is represented by distributed RDF resources, information needed for answering it may be missing if only a single one or not all of the knowledge bases are found. In single datasets with a single source, such as DBpedia, however, most of the concepts have at most one corresponding resource. In case of combined datasets, this problem can be dealt with by creating sameAs, equivalentClass or equivalentProperty links, respectively. However, interlinking while answering a semantic query is a separate research area and thus not covered here.

Some questions are only answerable with multiple knowledge bases and we assume already created links for the sake of this survey. The ALOQUS [86] system tackles this problem by using the PROTON [43] upper level ontology first to phrase the queries. The ontology is then aligned to those of other knowledge bases using the BLOOMS [85] system. Complex queries are decomposed into separately handled subqueries after coreferences23

²³
Such as “List the Semantic Web people and their affiliation.”, where the coreferent their refers to the entity people.

are extracted and substituted. Finally, these alignments are used to execute the query on the target systems. In order to improve the speed and quality of the results, the alignments are filtered using a threshold on the confidence measure.

Herzig et al. [79] search for entities and consolidate results from multiple knowledge bases. Similarity metrics are used both to determine and rank result candidates of each datasource and to identify matches between entities from different datasources.

5.6. Procedural, temporal and spatial questions

Procedural questions Factual, list and yes-no questions are easiest to answer as they conform directly to SPARQL queries using SELECT and ASK. Others, such as why (causal) or how (procedural) questions, require additional processing. Procedural QA can currently not be solved by SQA, since, to the best of our knowledge, there are no existing knowledge bases that contain procedural knowledge. While it is not an SQA system, we describe the document-retrieval based KOMODO [29] to motivate further research in this area. Instead of an answer sentence, KOMODO returns a Web page with step-by-step instructions on how to reach the goal specified by the user. This reduces the problem difficulty as it is much easier to find a Web page which contains instructions on how to, for example, assemble an “Ikea Billy bookcase” than it would be to extract, parse and present the required steps to the user. Additionally, there are arguments explaining reasons for taking a step and warnings against deviation. Instead of extracting the sense of the question using an RDF knowledge base, KOMODO submits the question to a traditional search engine. The highest ranked returned pages are then cleaned and procedural text is identified using statistical distributions of certain POS tags.

In basic RDF, each fact, which is expressed by a triple, is assumed to be true, regardless of circumstances. In the real world and in natural language however, the truth value of many statements is not a constant but a function of the location, the time or both.

Temporal questions Tao et al. [137] answer temporal questions on clinical narratives. They introduce the Clinical Narrative Temporal Relation Ontology (CNTRO), which is based on Allen’s Interval Based Temporal Logic [6] but allows usage of time instants as well as intervals. This allows inferring the temporal relation of events from those of others, for example by using the transitivity of before and after. In CNTRO, measurements, results or actions done on patients are modeled as events whose time is either absolutely specified in date and optionally time of day or alternatively in relations to other events and times. The framework also includes an SWRL [83] based reasoner that can deduce additional time information. This allows the detection of possible causalities, such as between a therapy for a disease and its cure in a patient.
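
The inference via transitivity can be sketched in a few lines. The event names are invented and the function computes a plain transitive closure over “before” facts; it is an illustration of the principle only and unrelated to the SWRL-based reasoner of CNTRO.

```python
def infer_before(facts):
    """Return the transitive closure of a set of (earlier, later) event pairs."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a before b and b before d implies a before d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical events from a clinical narrative.
facts = {("diagnosis", "therapy"), ("therapy", "recovery")}
inferred = infer_before(facts)
```

The pair linking the diagnosis to the recovery is never stated explicitly but follows from the two given facts; the relation “after” can be handled by swapping the arguments.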

Melo et al. [96] propose to include the implicit temporal and spatial context of the user in a dialog in order to resolve ambiguities. Their approach also includes spatial, temporal and other implicit information.

QALL-ME [55] is a multilingual framework based on description logics and uses the spatial and temporal context of the question. If this context is not explicitly given, the location and time of the user posing the question are added to the query. This context is also used to determine the language used for the answer, which can differ from the language of the question.

Spatial questions In RDF, a location can be expressed as 2-dimensional geocoordinates with latitude and longitude, while three-dimensional representations (e.g. with additional height) are not supported by the most often used schema.24

²⁴
See http://www.w3.org/2003/01/geo/wgs84_pos at http://lodstats.aksw.org.

Alternatively, spatial relationships can be modeled which are easier to answer as users typically ask for relationships and not exact geocoordinates.

Younis et al. [154] employ an inverted index for named entity recognition that enriches semantic data with spatial relationships such as crossing, inclusion and nearness. This information is then made available for SPARQL queries.
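
As an illustration of how a nearness relationship could be derived from latitude and longitude, the following sketch uses the haversine great-circle distance. This is not the method of Younis et al.; the 50 km threshold, the mean Earth radius of 6371 km and the rounded city coordinates are assumptions made for the example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 coordinates."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def near(a, b, max_km=50.0):
    """Treat two (lat, lon) points as near if they are at most max_km apart."""
    return haversine_km(*a, *b) <= max_km


# Approximate coordinates, rounded for illustration.
leipzig = (51.34, 12.37)
halle = (51.48, 11.97)
berlin = (52.52, 13.40)
```

Materializing such a relationship once lets a query ask for nearby resources directly instead of comparing coordinates at query time.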

5.7. Templates

For complex questions, where the resulting SPARQL query contains more than one basic graph pattern, sophisticated approaches are required to capture the structure of the underlying query. Current research follows two paths, namely (1) template based approaches, which map input questions to either manually or automatically created SPARQL query templates or (2) template-free approaches that try to build SPARQL queries based on the given syntactic structure of the input question.

For the first solution, many (1) template-driven approaches have been proposed, like TBSL [142] or SINA [125,126]. Furthermore, CASIA [77] generates the graph pattern templates by using the question type, named entities and POS tags. The generated graph patterns are then mapped to resources using WordNet, PATTY and similarity measures. Finally, the possible graph pattern combinations are used to build SPARQL queries. The system focuses on the generation of SPARQL queries that do not need filter conditions, aggregations and superlatives.

Ben Abacha and Zweigenbaum [16] focus on a narrow medical patients-treatment domain and use manually created templates alongside machine learning.

Damova et al. [44] return well formulated natural language sentences that are created using a template with optional parameters for the domain of paintings. Between the input query and the SPARQL query, the system places the intermediate step of a multilingual description using the Grammatical Framework [122], which enables the system to support 15 languages.

Rahoman and Ichise [120] propose a template based approach using keywords as input. Templates are automatically constructed from the knowledge base.
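
The general idea of filling a SPARQL template can be sketched as follows. The two templates, their names and the slot names are hypothetical and much simpler than the templates of the systems above; the sketch only shows how slots are replaced by the resources found during mapping.

```python
from string import Template

# Hypothetical templates with slots for a class, a property and an entity.
TEMPLATES = {
    "list_by_property": Template(
        "SELECT DISTINCT ?x WHERE { ?x a $cls . ?x $prop $entity . }"
    ),
    "count_by_property": Template(
        "SELECT (COUNT(DISTINCT ?x) AS ?n) WHERE { ?x a $cls . ?x $prop $entity . }"
    ),
}


def instantiate(name, **slots):
    """Fill all slots of the named template; raises KeyError if a slot is missing."""
    return TEMPLATES[name].substitute(**slots)


# Slot fillers as they might result from mapping "Which books are written by Dan Brown?".
query = instantiate(
    "list_by_property", cls="dbo:Book", prop="dbo:author", entity="dbr:Dan_Brown"
)
```

The structure of the query is fixed by the template, so the system only has to select a template and disambiguate the slot fillers; questions whose structure is not covered by any template cannot be answered this way.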

However, (2) template-free approaches require additional effort of making sure to cover every possible basic graph pattern [144]. Thus, only a few SQA systems tackle this approach so far.

Xser [149] first assigns semantic labels, i.e., variables, entities, relations and categories, to phrases by casting them to a sequence labelling pattern recognition problem which is then solved by a structured perceptron. The perceptron is trained using features including n-grams of POS tags, NER tags and words. Thus, Xser is capable of covering any complex basic graph pattern.

Going beyond SPARQL queries is TPSM, the open domain Three-Phases Semantic Mapping [68] framework. It maps natural language questions to OWL queries using Fuzzy Constraint Satisfaction Problems. Constraints include surface text matching, preference of POS tags and the similarity degree of surface forms. The set of correct mapping elements acquired using the FCSP-SM algorithm is combined into a model using predefined templates.

An extension of gAnswer [156] (see Section 5.2) is based on question understanding and query evaluation. First, their approach uses a relation mining algorithm to find triple patterns in queries as well as relation extraction, POS-tagging and dependency parsing. Second, the approach tries to find a matching subgraph for the extracted triples and scores them based on a confidence score. Finally, the top-k subgraph matches are returned. Their evaluation on QALD 3 shows that mapping NL questions to graph patterns is not as powerful as generating SPARQL (template) queries with respect to aggregation and filter functions needed to answer several benchmark input questions.

6. Conclusion

In this survey, we analyzed 62 systems and their contributions to seven challenges for SQA systems. SQA is an active research field with many existing and diverse approaches covering a multitude of research challenges, domains and knowledge bases.

We only cover QA on the Semantic Web, that is, approaches that retrieve resources as Linked Data from RDF knowledge bases. As similar challenges are faced by QA unrelated to the Semantic Web, we refer to Section 3. We choose to not go into detail for approaches that do not retrieve resources from RDF knowledge bases. Moreover, our consensus can be found in Table 6 for best practices. The upcoming HOBBIT25

²⁵

http://project-hobbit.eu/.

project will clarify which modules can be aligned with state-of-the-art performance and will quantify the impact of those modules. To cover the field of SQA in depth, we excluded works solely about similarity [14] or paraphrases [17]. The existence of common SQA challenges implies that a unifying architecture can improve the precision as well as increase the number of answered questions [95]. Research into such architectures includes openQA [95], OAQA [153], QALL-ME [55] and QANUS [108] (see Section 3.3). Our goal, however, is not to quantify submodule performance or interplay. That will be the task of upcoming projects of large consortiums. A new community26

²⁶

https://www.w3.org/community/nli/.

is forming in that field and has not found a satisfying solution yet.27

²⁷

http://eis.iai.uni-bonn.de/blog/2015/11/.

In this section, we discuss each of the seven research challenges and give a short overview of already established as well as future research directions per challenge, see Table 6.

Table 5

Number of publications per year per addressed challenge. Percentages are given for the fully covered years 2011–2014 separately and for the whole covered timespan, with 1 decimal place. For a full list, see Table 7

Year	Total	Lexical Gap	Ambiguity	Multilingualism	Complex Operators	Distributed Knowledge	Procedural, Temporal or Spatial	Templates
		absolute
2010	1	0	0	0	0	0	1	0
2011	16	11	12	1	3	1	2	2
2012	14	6	7	1	2	1	1	4
2013	20	18	12	2	5	1	1	5
2014	13	7	8	1	2	0	1	0
2015	6	5	3	1	0	1	0	0
all	70	46	42	6	12	4	6	11
		percentage
2011		68.8	75.0	6.3	18.8	6.3	12.5	12.5
2012		42.9	50.0	7.1	14.3	7.1	7.1	28.6
2013		85.0	60.0	10.0	25.0	5.0	5.0	25.0
2014		53.8	61.5	7.7	15.4	7.7	7.7	0.0
all		65.7	60.0	8.6	17.1	5.7	8.6	15.7

Overall, the authors of this survey cannot observe a research drift to any of the challenges. The number of publications in a certain research challenge does not decrease significantly, which can be seen as an indicator that none of the challenges is solved yet – see Table 5. Naturally, since only a small number of publications addressed each challenge in a given year, one cannot draw statistically valid conclusions. The challenges proposed by Cimiano and Minock [37] and reduced within this survey appear to be still valid.

Table 6

Established and actively researched as well as envisioned techniques for solving each challenge

Challenge	Established	Future
Lexical Gap	stemming, lemmatization, string similarity, synonyms, vector space model, indexing, pattern libraries, explicit semantic analysis	combined efforts, reuse of libraries
Ambiguity	user information (history, time, location), underspecification, machine learning, spreading activation, semantic similarity, crowdsourcing, Markov Logic Network	holistic, knowledge-base aware systems
Multilingualism	translation to core language, language-dependent grammar	usage of multilingual knowledge bases
Complex Operators	reuse of former answers, syntactic tree-based formulation, answer type orientation, HMM, logic	non-factual questions, domain-independence
Distributed Knowledge and Procedural, Temporal, Spatial	temporal logic	domain specific adaptors, procedural SQA
Templates	fixed SPARQL templates, template generation, syntactic tree based generation	complex questions

Bridging the (1) lexical gap has to be tackled by every SQA system in order to retrieve results with a high recall. For named entities, this is commonly achieved using a combination of the reliable and mature natural language processing algorithms for string similarity and either stemming or lemmatization (see Table 6). Automatic Query Expansion (AQE), for example with WordNet synonyms, is prevalent in information retrieval but only rarely used in SQA. Despite its potential negative effects on precision,²⁸ we consider it a net benefit to SQA systems. Current SQA systems duplicate already existing efforts or fail to decide on the right technique. Thus, reusable libraries that lower the entry barrier to SQA systems are needed. Mapping to RDF properties from verb phrases is much harder, as they show more variation and often occur at multiple places of a question. Pattern libraries, such as BOA [69], can improve property identification; however, they are still an active research topic and are specific to a knowledge base.

²⁸

Synonyms and other related words almost never have exactly the same meaning.
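As an illustration of how string similarity and stemming can be combined for named entities, consider the following minimal sketch. It is not taken from any surveyed system: the suffix-stripping stemmer is a deliberately naive stand-in for a real stemmer or lemmatizer, the threshold of 0.8 is an arbitrary assumption, and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def naive_stem(token):
    """Strip a few English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(phrase):
    """Lowercase a phrase and stem each whitespace-separated token."""
    return " ".join(naive_stem(token) for token in phrase.lower().split())


def best_match(phrase, labels, threshold=0.8):
    """Return the label most similar to the phrase, or None below the threshold."""
    query = normalize(phrase)
    scored = [
        (SequenceMatcher(None, query, normalize(label)).ratio(), label)
        for label in labels
    ]
    score, label = max(scored)
    return label if score >= threshold else None


if __name__ == "__main__":
    labels = ["Berlin", "Bern", "Barcelona"]
    print(best_match("berlins", labels))    # stemming removes the plural "s"
    print(best_match("Barcelone", labels))  # string similarity absorbs the typo
    print(best_match("Tokyo", labels))      # nothing similar enough
```

Lowering the threshold raises recall at the cost of precision, which mirrors the trade-off discussed for query expansion above.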

Table 7

Surveyed publications from November 2010 to July 2015, inclusive, along with the challenges they explicitly address and the approach or system they belong to. Additionally annotated is the use of light expressions as well as the use of intermediate templates. In case the system or approach is not named in the publication, a name is generated using the last name of the first author and the year of the first included publication.

Publication	System or approach	Year	Lexical Gap	Ambiguity	Multilingualism	Complex Operators	Distributed Knowledge	Procedural, Temporal or Spatial	Templates	QALD 1	QALD 2	QALD 3	QALD 4
Tao et al. [137]	Tao10	2010						✓
Adolphs et al. [1]	YAGO-QA	2011	✓	✓
Blanco et al. [20]	Blanco11	2011	✓
Canitrot et al. [29]	KOMODO	2011	✓	✓
Damljanovic et al. [42]	FREyA	2011	✓	✓						✓
Ferrandez et al. [55]	QALL-ME	2011			✓			✓
Freitas et al. [61]	Treo	2011	✓	✓		✓	✓
Gao et al. [68]	TPSM	2011	✓	✓
Kalyanpur et al. [87]	Watson	2011		✓
Melo et al. [96]	Melo11	2011	✓	✓
Moussa and Abdel-Kader [105]	QASYO	2011	✓						✓
Ou and Zhu [112]	Ou11	2011	✓	✓		✓			✓
Shen et al. [131]	Shen11	2011		✓
Unger and Cimiano [139]	Pythia	2011		✓
Unger and Cimiano [140]	Pythia	2011	✓	✓		✓		✓
Bicer et al. [18]	KOIOS	2011	✓	✓
Freitas et al. [60]	Treo	2011								✓
Ben Abacha and Zweigenbaum [16]	MM+/BIO-CRF-H	2012							✓
Boston et al. [22]	Wikimantic	2012		✓
Gliozzo and Kalyanpur [71]	Watson	2012				✓
Joshi et al. [86]	ALOQUS	2012				✓	✓
Lehmann et al. [91]	DEQA	2012	✓	✓					✓
Yahya et al. [150]	DEANNA	2012
Yahya et al. [151]	DEANNA	2012	✓	✓
Shekarpour et al. [126]	SINA	2012	✓	✓
Unger et al. [142]	TBSL	2012	✓	✓					✓
Walter et al. [146]	BELA	2012	✓	✓	✓				✓
Younis et al. [154]	Younis12	2012	✓					✓
Welty et al. [147]	Watson	2012		✓
Elbedweihy et al. [52]	Elbedweihy12	2012									✓
Cabrio et al. [26]	QAKiS	2012									✓
Demartini et al. [48]	CrowdQ	2013		✓		✓
Aggarwal et al. [2]	Aggarwal12	2013	✓	✓	✓
Deines and Krechel [46]	GermanNLI	2013	✓
Dima [49]	Intui2	2013	✓			✓
Fader et al. [54]	PARALEX	2013	✓	✓				✓	✓
Ferré [56]	SQUALL2SPARQL	2013	✓			✓						✓
Giannone et al. [70]	RTV	2013	✓									✓
Hakimov et al. [72]	Hakimov13	2013	✓	✓							✓
He et al. [77]	CASIA	2013	✓	✓		✓			✓			✓
Herzig et al. [79]	CRM	2013	✓	✓			✓
Huang and Zou [84]	gAnswer	2013	✓	✓
Pradel et al. [118]	SWIP	2013	✓									✓
Rahoman and Ichise [120]	Rahoman13	2013							✓
Shekarpour et al. [125]	SINA	2013	✓	✓					✓
Shekarpour et al. [128]	SINA	2013	✓
Shekarpour et al. [127]	SINA	2013	✓	✓					✓
Delmonte [47]	GETARUNS	2013	✓	✓		✓
Cojan et al. [39]	QAKiS	2013			✓
Yahya et al. [152]	SPOX	2013	✓	✓
Zhang et al. [155]	Kawamura13	2013	✓	✓
Carvalho et al. [33]	EasyESA	2014	✓	✓
Rani et al. [121]	Rani14	2014		✓
Zou et al. [156]	Zhou14	2014		✓
Frost et al. [63]	DEV-NLQ	2014				✓
Höffner and Lehmann [81]	CubeQA	2014	✓	✓				✓
Cabrio et al. [28]	QAKiS	2014			✓
Freitas and Curry [59]	Freitas14	2014	✓	✓
Dima [50]	Intui3	2014	✓			✓							✓
Hamon et al. [74]	POMELO	2014		✓									✓
Park et al. [114]	ISOFT	2014	✓										✓
He et al. [78]	CASIA	2014	✓	✓									✓
Xu et al. [149]	Xser	2014	✓										✓
Elbedweihy et al. [53]	NL-Graphs	2014		✓
Sun et al. [136]	QuASE	2015	✓	✓
Park et al. [115]	Park15	2015	✓	✓
Damova et al. [44]	MOLTO	2015			✓
Sun et al. [136]	QuASE	2015	✓	✓
Usbeck et al. [144]	HAWK	2015	✓				✓						✓
Hakimov et al. [73]	Hakimov15	2015	✓

The next challenge, (2) ambiguity, is addressed by the majority of the publications, but the percentage does not increase over time, presumably because of use cases with small knowledge bases, where its impact is minuscule. For systems intended for long-term use by the same persons, we regard the integration of previous questions, time and location as promising, as is already common in Web of Documents search engines. There is a variety of established disambiguation methods that use the context of a phrase to determine the most likely RDF resource; some are based on unstructured text collections and others on RDF resources. As we could make out no clear winner, we recommend that system developers base their decisions on the resources (such as query logs, ontologies, thesauri) available to them. Many approaches reinvent disambiguation efforts and thus, as for the lexical gap, holistic, knowledge-base aware, reusable systems are needed to facilitate faster research.
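To make the idea of context-based disambiguation concrete, the following minimal sketch picks the candidate resource whose textual description shares the most words with the question. It is a toy illustration, not the method of any surveyed system: the `ex:` resource names, their descriptions and the stopword list are invented for this example.

```python
# Tiny, hand-picked stopword list (an assumption for this example only).
STOPWORDS = {"a", "an", "as", "does", "in", "is", "of", "the", "which", "who"}


def tokenize(text):
    """Lowercase, strip simple punctuation and drop stopwords."""
    tokens = (token.strip(".,?!").lower() for token in text.split())
    return {token for token in tokens if token and token not in STOPWORDS}


def disambiguate(question, candidates):
    """Return the candidate whose description overlaps most with the question.

    `candidates` maps a resource identifier to a descriptive text, for example
    an abstract from an unstructured text collection or concatenated labels.
    """
    context = tokenize(question)

    def overlap(resource):
        return len(context & tokenize(candidates[resource]))

    # Sorting makes ties deterministic: the alphabetically first resource wins.
    return max(sorted(candidates), key=overlap)


# Hypothetical candidates for the ambiguous phrase "jaguar".
CANDIDATES = {
    "ex:Jaguar_Cars": "British car manufacturer",
    "ex:Jaguar_(animal)": "large wild cat species of the Americas",
}

if __name__ == "__main__":
    print(disambiguate("Who designed the first Jaguar car?", CANDIDATES))
    print(disambiguate("In which countries does the jaguar live as a wild cat?", CANDIDATES))
```

The description texts stand in for whichever resource is available, which is why the choice of method depends on the available resources.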

Although multilingual questions have been part of QALD since QALD 3, publications dealing with (3) multilingualism remain a small minority. Automatic translation of parts of or the whole query requires the least development effort, but suffers from imperfect translations. Higher quality can be achieved by using components, such as parsers and synonym libraries, for multiple languages. A possible future research direction is to use several language versions at once and thus exploit the power of a unified graph [39]. For instance, DBpedia [92] provides a knowledge base in more than 100 languages, which could form the basis of a multilingual SQA system.

Complex operators (4) seem to be used only in specific tasks or factual questions. Most systems either use the syntactic structure of the question or some form of knowledge-base aware logic. Future research will be directed towards domain-independence as well as non-factual queries.

Approaches using (5) distributed knowledge as well as those incorporating (6) procedural, temporal and spatial data remain niches. Procedural SQA does not exist yet, as present approaches return unstructured text in the form of already written step-by-step instructions. While we consider the future development of procedural SQA feasible with existing techniques, as far as we know there is neither an RDF vocabulary for procedural knowledge nor a knowledge base containing it yet.

The (7) templates challenge, which subsumes the question of mapping a question to a query structure, is still unsolved. The development of template-based approaches seems to have decreased in 2014, presumably because of their low flexibility on open-domain tasks. Nevertheless, templates still present the fastest way to develop a novel SQA system, but the limitation to simple query structures has yet to be overcome.
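The following minimal sketch illustrates what a fixed SPARQL template amounts to: a query skeleton with slots that are filled once the entity and the property of a question have been identified. The template, the function name and the character check are our own assumptions and not taken from a surveyed system; the two DBpedia IRIs serve only as example values.

```python
from string import Template

# One fixed template with two slots: a single triple pattern asking for the
# object of a given entity and property.
SIMPLE_FACT_TEMPLATE = Template(
    "SELECT DISTINCT ?answer WHERE { <$entity> <$property> ?answer . }"
)

# Characters that must not occur inside an IRI reference written as <...>.
FORBIDDEN_IN_IRI = set('<>"{}|^`\\ ')


def fill_template(entity_iri, property_iri):
    """Fill the fixed template with two IRIs and return the SPARQL query string."""
    for iri in (entity_iri, property_iri):
        if not iri or FORBIDDEN_IN_IRI & set(iri):
            raise ValueError(f"not usable as an IRI: {iri!r}")
    return SIMPLE_FACT_TEMPLATE.substitute(entity=entity_iri, property=property_iri)


if __name__ == "__main__":
    # "How many people live in Berlin?" after entity and property identification.
    print(fill_template(
        "http://dbpedia.org/resource/Berlin",
        "http://dbpedia.org/ontology/populationTotal",
    ))
```

The sketch also shows the limitation named above: a question that needs aggregation, a filter or more than one triple pattern does not fit this template and requires either a further template or template generation.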

Future research should be directed at more modularization, automatic reuse, self-wiring and encapsulated modules with their own benchmarks and evaluations. Thus, novel research fields can be tackled by reusing already existing parts and focusing on the core research problem itself. A step in this direction is QANARY [23], which describes how to modularize QA systems by providing a core QA vocabulary against which existing vocabularies are bound. Another research direction is SQA systems that act as aggregators of or frameworks for other systems or algorithms and thus benefit from the set of existing approaches. Furthermore, benchmarking will move to single algorithmic modules instead of benchmarking a system as a whole. Local optimization targets benchmarking a process at its individual steps, but global benchmarking is still needed to measure the impact of error propagation across the chain. A Turing test-like spirit would suggest that the latter is more important, as local measures are never fully representative. Additionally, we foresee a move from factual benchmarks over common sense knowledge to more domain-specific questions without purely factual answers. Thus, there is a movement towards multilingual, multi-knowledge-source SQA systems that are capable of understanding noisy, human natural language input.
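A small worked example may clarify why local scores alone are not representative. Under the simplifying assumption that the steps of a pipeline fail independently and that a question is answered correctly only if every step succeeds, the end-to-end accuracy is the product of the local accuracies. The step names and scores below are hypothetical.

```python
from math import prod


def end_to_end_accuracy(step_accuracies):
    """Probability that all steps succeed, assuming independent failures."""
    scores = list(step_accuracies)
    if not all(0.0 <= score <= 1.0 for score in scores):
        raise ValueError("accuracies must lie in [0, 1]")
    return prod(scores)


# Hypothetical local benchmark results for three pipeline modules.
LOCAL_SCORES = {
    "entity linking": 0.9,
    "property mapping": 0.8,
    "query construction": 0.95,
}

if __name__ == "__main__":
    print(f"{end_to_end_accuracy(LOCAL_SCORES.values()):.3f}")
```

In this example, every module scores at least 0.8 locally, yet the estimated end-to-end accuracy is lower than that of any single module. In a real system, errors are rarely independent, so only a global benchmark shows the actual effect.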

Acknowledgements

This work has been supported by the FP7 project GeoKnow (GA No. 318159), the BMWI project SAKE (Project No. 01MD15006E) and by the Eurostars projects DIESEL (E!9367) and QAMEL (E!9725) as well as the European Union’s H2020 research and innovation action HOBBIT (GA 688227).

References

1. Adolphs, Theobald, Schäfer, Uszkoreit and Weikum, YAGO-QA: Answering questions by structured knowledge queries, in: Proceedings of the 5th IEEE International Conference on Semantic Computing (ICSC 2011), Palo Alto, CA, USA, September 18–21, 2011, IEEE Computer Society, Los Alamitos, USA, 2011, pp. 158–161. doi:10.1109/ICSC.2011.30.

2. Aggarwal, Polajnar and Buitelaar, Cross-lingual natural language querying over the Web of Data, in: Natural Language Processing and Information Systems – 18th International Conference on Applications of Natural Language to Information Systems, NLDB 2013, Proceedings, Salford, UK, June 19–21, 2013, Métais, Meziane, Saraee, Sugumaran and Vadera, eds, Lecture Notes in Computer Science, Vol. 7934, Springer, Berlin, Heidelberg, Germany, 2013, pp. 152–163. doi:10.1007/978-3-642-38824-8_13.

3. Agichtein, Carmel, Pelleg, Pinter and Harman, Overview of the TREC 2015 LiveQA track, in: Proceedings of the Twenty-Fourth Text REtrieval Conference, TREC 2015, Gaithersburg, Maryland, USA, November 17–20, 2015, E.M. Voorhees and Ellis, eds, Special Publication volume 500-319, National Institute of Standards and Technology (NIST), 2015.

4. Alani, Kagal, Fokoue, P.T. Groth, Biemann, J.X. Parreira, Aroyo, N.F. Noy, Welty and Janowicz (eds), The Semantic Web – ISWC 2013 – 12th International Semantic Web Conference, Proceedings, Part I, Sydney, NSW, Australia, October 21–25, 2013, Lecture Notes in Computer Science, Vol. 8218, Springer, 2013. doi:10.1007/978-3-642-41335-3.

5. Alani, Kagal, Fokoue, P.T. Groth, Biemann, J.X. Parreira, Aroyo, N.F. Noy, Welty and Janowicz (eds), The Semantic Web – ISWC 2013 – 12th International Semantic Web Conference, Proceedings, Part II, Sydney, NSW, Australia, October 21–25, 2013, Lecture Notes in Computer Science, Vol. 8219, Springer, 2013. doi:10.1007/978-3-642-41338-4.

6. J.F. Allen, Maintaining knowledge about temporal intervals, Communications of the ACM 26 (1983), 832–843. doi:10.1145/182.358434.

7. Antoniou, Grobelnik, E.P.B. Simperl, Parsia, Plexousakis, P.D. Leenheer and J.Z. Pan (eds), The Semantic Web: Research and Applications – 8th Extended Semantic Web Conference, ESWC 2011, Proceedings, Part I, Heraklion, Crete, Greece, May 29–June 2, 2011, Lecture Notes in Computer Science, Vol. 6643, Springer, 2011. doi:10.1007/978-3-642-21034-1.

8. Antoniou, Grobelnik, E.P.B. Simperl, Parsia, Plexousakis, P.D. Leenheer and J.Z. Pan (eds), The Semantic Web: Research and Applications – 8th Extended Semantic Web Conference, ESWC 2011, Proceedings, Part II, Heraklion, Crete, Greece, May 29–June 2, 2011, Lecture Notes in Computer Science, Vol. 6644, Springer, 2011. doi:10.1007/978-3-642-21064-8.

9. Aroyo, Welty, Alani, Taylor, Bernstein, Kagal, N.F. Noy and Blomqvist (eds), The Semantic Web – ISWC 2011 – 10th International Semantic Web Conference, Proceedings, Part I, Bonn, Germany, October 23–27, 2011, Lecture Notes in Computer Science, Vol. 7031, Springer, 2011. doi:10.1007/978-3-642-25073-6.

10. Aroyo, Welty, Alani, Taylor, Bernstein, Kagal, N.F. Noy and Blomqvist (eds), The Semantic Web – ISWC 2011 – 10th International Semantic Web Conference, Proceedings, Part II, Bonn, Germany, October 23–27, 2011, Lecture Notes in Computer Science, Vol. 7032, Springer, 2011. doi:10.1007/978-3-642-25093-4.

11. Athenikos and Han, Biomedical Question Answering: A survey, Computer Methods and Programs in Biomedicine 99 (2010), 1–24. doi:10.1016/j.cmpb.2009.10.003.

12. Balikas, Partalas, A.N. Ngomo, Krithara and Paliouras, Results of the BioASQ track of the Question Answering lab at CLEF 2014, in: Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15–18, 2014, Cappellato, Ferro, Halvey and Kraaij, eds, CEUR Workshop Proceedings, Vol. 1180, CEUR-WS.org, 2014, pp. 1181–1193, http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-BalikasEt2014.pdf.

13. Balikas, Kosmopoulos, Krithara, Paliouras and I.A. Kakadiaris, Results of the BioASQ tasks of the Question Answering lab at CLEF 2015, in: Working Notes of CLEF 2015 – Conference and Labs of the Evaluation Forum, Toulouse, France, September 8–11, 2015, Cappellato, Ferro, G.J.F. Jones and SanJuan, eds, CEUR Workshop Proceedings, Vol. 1391, CEUR-WS.org, 2015, http://ceur-ws.org/Vol-1391/inv-pap7-CR.pdf.

14. Bär, Biemann, Gurevych and Zesch, Ukp: Computing semantic textual similarity by combining multiple content similarity measures, in: Proceedings of the First Joint Conference on Lexical and Computational Semantics, *SEM 2012, June 7–8, 2012, Montréal, Canada, Agirre, Bos and M.T. Diab, eds, Association for Computational Linguistics, 2012, pp. 435–440, http://aclweb.org/anthology/S/S12/S12-1059.pdf.

15. Baudiš and Šedivý, Modeling of the Question Answering task in the YodaQA system, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction – 6th International Conference of the CLEF Association, CLEF 2015, Proceedings, Toulouse, France, September 8–11, 2015, Mothe, Savoy, Kamps, Pinel-Sauvagnat, G.J.F. Jones, SanJuan, Cappellato and Ferro, eds, Lecture Notes in Computer Science, Vol. 9283, Springer, 2015, pp. 222–228. doi:10.1007/978-3-319-24027-5_20.

16. Ben Abacha and Zweigenbaum, Medical Question Answering: Translating medical questions into SPARQL queries, in: ACM International Health Informatics Symposium, IHI’12, Miami, FL, USA, January 28–30, 2012, Luo, Liu and C.C. Yang, eds, ACM, 2012, pp. 41–50. doi:10.1145/2110363.2110372.

17. Berant and Liang, Semantic parsing via paraphrasing, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, Volume 1: Long Papers, Baltimore, MD, USA, June 22–27, 2014, The Association for Computer Linguistics, 2014, pp. 1415–1425. doi:10.3115/v1/P14-1133.

18. Bicer, Tran, Abecker and Nedkov, KOIOS: Utilizing semantic search for easy-access and visualization of structured environmental data, in: The Semantic Web – ISWC 2011 – 10th International Semantic Web Conference, Proceedings, Part II, Bonn, Germany, October 23–27, 2011, Aroyo, Welty, Alani, Taylor, Bernstein, Kagal, N.F. Noy and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7032, Springer, 2011, pp. 1–16. doi:10.1007/978-3-642-25093-4_1.

19. Biemann, Handschuh, Freitas, Meziane and Métais, Natural Language Processing and Information Systems: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Proceedings, Passau, Germany, June 17–19, 2015, Lecture Notes in Computer Science, Vol. 9103, Springer Publishing, New York, USA, 2015. doi:10.1007/978-3-319-19581-0.

20. Blanco, Mika and Vigna, Effective and efficient entity search in RDF data, in: The Semantic Web – ISWC 2011 – 10th International Semantic Web Conference, Proceedings, Part I, Bonn, Germany, October 23–27, 2011, Aroyo, Welty, Alani, Taylor, Bernstein, Kagal, N.F. Noy and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7031, Springer, 2011, pp. 83–97. doi:10.1007/978-3-642-25073-6_6.

21. Bordes, Weston and Usunier, Open Question Answering with weakly supervised embedding models, in: Machine Learning and Knowledge Discovery in Databases – European Conference, ECML PKDD 2014, Proceedings, Part I, Nancy, France, September 15–19, 2014, Calders, Esposito, Hüllermeier and Meo, eds, Lecture Notes in Computer Science, Vol. 8724, Springer, 2014, pp. 165–180. doi:10.1007/978-3-662-44848-9_11.

22. Boston, Carberry and Fang, Wikimantic: Disambiguation for short queries, in: Natural Language Processing and Information Systems – 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012, Proceedings, Groningen, The Netherlands, June 26–28, 2012, Bouma, Ittoo, Métais and Wortmann, eds, Lecture Notes in Computer Science, Vol. 7337, Springer, 2012, pp. 140–151. doi:10.1007/978-3-642-31178-9_13.

23. Both, Diefenbach, Singh, Shekarpour, Cherix and Lange, Qanary – A methodology for vocabulary-driven open Question Answering systems, in: The Semantic Web. Latest Advances and New Domains – 13th International Conference, ESWC 2016, Proceedings, Heraklion, Crete, Greece, May 29–June 2, 2016, Sack, Blomqvist, d’Aquin, Ghidini, S.P. Ponzetto and Lange, eds, Lecture Notes in Computer Science, Vol. 9678, Springer, 2016, pp. 625–641. doi:10.1007/978-3-319-34129-3_38.

24. Bouma, Ittoo, Métais and Wortmann (eds), Natural Language Processing and Information Systems: 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012, Proceedings, Groningen, The Netherlands, June 26–28, 2012, Lecture Notes in Computer Science, Vol. 7337, Springer, Berlin, Heidelberg, Germany, 2012. doi:10.1007/978-3-642-31178-9.

25. Buitelaar, Cimiano, Haase and Sintek, Towards linguistically grounded ontologies, in: The Semantic Web: Research and Applications, 6th European Semantic Web Conference, ESWC 2009, Aroyo, Traverso, Ciravegna, Cimiano, Heath, Hyvönen, Mizoguchi, Oren, Sabou and Simperl, eds, Lecture Notes in Computer Science, Vol. 5554, Springer, Berlin, Heidelberg, Germany, 2009, pp. 111–125. doi:10.1007/978-3-642-02121-3_12.

26. Cabrio, A.P. Aprosio, Cojan, Magnini, Gandon and Lavelli, QAKiS @ QALD-2, in: Proceedings of the Workshop on Interacting with Linked Data (ILD 2012), Workshop Co-Located with the 9th Extended Semantic Web Conference, Heraklion, Greece, May 28, 2012, Unger, Cimiano, López, Motta, Buitelaar and Cyganiak, eds, CEUR Workshop Proceedings, Vol. 913, CEUR-WS.org, pp. 87–95, http://ceur-ws.org/Vol-913/07_ILD2012.pdf.

27. Cabrio, Cimiano, López, A.N. Ngomo, Unger and Walter, QALD-3: Multilingual Question Answering over Linked Data, in: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, Forner, Navigli, Tufis and Ferro, eds, CEUR Workshop Proceedings, Vol. 1179, CEUR-WS.org, 2013, http://ceur-ws.org/Vol-1179/CLEF2013wn-QALD3-CabrioEt2013.pdf.

28. Cabrio, Cojan, Gandon and Hallili, Querying multilingual DBpedia with QAKiS, in: The Semantic Web: ESWC 2013 Satellite Events – ESWC 2013 Satellite Events, Revised Selected Papers, Montpellier, France, May 26–30, 2013, Cimiano, Fernández, López, Schlobach and Völker, eds, Lecture Notes in Computer Science, Vol. 7955, Springer, 2013, pp. 194–198. doi:10.1007/978-3-642-41242-4_23.

29. Canitrot, de Filippo, P.-Y. Roger and Saint-Dizier, The KOMODO system: Getting recommendations on how to realize an action via Question-Answering, in: IJCNLP 2011, Proceedings of the KRAQ11 Workshop: Knowledge and Reasoning for Answering Questions, Saint-Dizier, Blache, Kawtrakul, Kongthon, M.-F. Moens and Quarteroni, eds, Asian Federation of Natural Language Processing, 2011, pp. 1–9, http://www.aclweb.org/anthology/W/W11/W11-3101.pdf.

30. Cappellato, Ferro, Halvey and Kraaij (eds), Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15–18, 2014, CEUR Workshop Proceedings, Vol. 1180, 2014, http://ceur-ws.org/Vol-1180.

31. Cappellato, Ferro, Halvey and Kraaij (eds), Question Answering over Linked Data (QALD-4), CEUR Workshop Proceedings, Vol. 1180, CEUR-WS.org, 2014, http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-UngerEt2014.pdf.

32. Carpineto and Romano, A survey of Automatic Query Expansion in information retrieval, ACM Computing Surveys 44 (2012), 1:1–1:50. doi:10.1145/2071389.2071390.

33. D.S. Carvalho, Ç. Çallı, Freitas and Curry, EasyESA: A low-effort infrastructure for explicit semantic analysis, in: Proceedings of the ISWC 2014 Posters & Demonstrations Track a Track Within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014, Horridge, Rospocher and van Ossenbruggen, eds, CEUR Workshop Proceedings, Vol. 1272, CEUR-WS.org, 2014, pp. 177–180, http://ceur-ws.org/Vol-1272/paper_137.pdf.

34. Cheng, Tran and Qu, RELIN: Relatedness and informativeness-based centrality for entity summarization, in: The Semantic Web – ISWC 2011 – 10th International Semantic Web Conference, Proceedings, Part I, Aroyo, Welty, Alani, Taylor, Bernstein, Kagal, N.F. Noy and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7031, Springer, 2011, pp. 114–129. doi:10.1007/978-3-642-25073-6_8.

35. Chung, A.Z. Broder, Shim and Suel (eds), 23rd International World Wide Web Conference, WWW ’14, Seoul, Republic of Korea, April 7–11, 2014, ACM, 2014, doi:10.1145/2566486.

36. Cimiano, Flexible semantic composition with dudes, in: Proceedings of the Eighth International Conference on Computational Semantics, Association for Computational Linguistics, Stroudsburg, USA, 2009, pp. 272–276. doi:10.3115/1693756.1693786.

37. Cimiano and Minock, Natural language interfaces: What is the problem? – A data-driven quantitative analysis, in: Natural Language Processing and Information Systems, 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Lecture Notes in Computer Science, Vol. 6177, Springer, Berlin, Heidelberg, Germany, 2010, pp. 192–206. doi:10.1007/978-3-642-12550-8_16.

38. Cimiano, Ó. Corcho, Presutti, Hollink and Rudolph (eds), The Semantic Web: Semantics and Big Data, 10th International Conference, ESWC 2013, Proceedings, Montpellier, France, May 26–30, 2013, Lecture Notes in Computer Science, Vol. 7882, Springer, 2013, pp. 26–30. doi:10.1007/978-3-642-38288-8.

39. Cojan, Cabrio and Gandon, Filling the gaps among DBpedia multilingual chapters for Question Answering, in: Web Science 2013 (Co-Located with ECRC), WebSci’13, Paris, France, May 2–4, 2013, H.C. Davis, Halpin, Pentland, Bernstein and L.A. Adamic, eds, ACM, 2013, pp. 33–42. doi:10.1145/2464464.2464500.

40. Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, J.X. Parreira, Hendler, Schreiber, Bernstein and Blomqvist (eds), The Semantic Web – ISWC 2012 – 11th International Semantic Web Conference, Proceedings, Part I, Boston, MA, USA, November 11–15, 2012, Lecture Notes in Computer Science, Vol. 7649, Springer, 2012. doi:10.1007/978-3-642-35176-1.

41. Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, J.X. Parreira, Hendler, Schreiber, Bernstein and Blomqvist (eds), The Semantic Web – ISWC 2012 – 11th International Semantic Web Conference, Proceedings, Part II, Boston, MA, USA, November 11–15, 2012, Lecture Notes in Computer Science, Vol. 7650, Springer, 2012. doi:10.1007/978-3-642-35173-0.

42. Damljanovic, Agatonovic and Cunningham, FREyA: An interactive way of querying Linked Data using natural language, in: Proceedings of the 1st Workshop on Question Answering over Linked Data (QALD-1), Co-Located with the 8th Extended Semantic Web Conference, Unger, Cimiano, López and Motta, eds, 2011, pp. 10–23.

43. Damova, Kiryakov, Simov and Petrov, Mapping the central LOD ontologies to PROTON upper-level ontology, in: Proceedings of the 5th International Workshop on Ontology Matching (OM-2010) Collocated with the 9th International Semantic Web Conference (ISWC-2010), Shvaiko, Euzenat, Giunchiglia, Stuckenschmidt, Mao and Cruz, eds, CEUR Workshop Proceedings, Vol. 689, 2010, http://ceur-ws.org/Vol-689/.

44. Damova, Dannélls, Enache, Mateva and Ranta, Multilingual natural language interaction with Semantic Web knowledge bases and Linked Open Data, in: Towards the Multilingual Semantic Web, Buitelaar and Cimiano, eds, Springer, 2014, pp. 211–226. doi:10.1007/978-3-662-43585-4_13.

45. H.T. Dang, Kelly and J.J. Lin, Overview of the TREC 2007 Question Answering track, in: Proceedings of the Sixteenth Text REtrieval Conference, TREC 2007, Gaithersburg, Maryland, USA, November 5–9, 2007, E.M. Voorhees and L.P. Buckland, eds, Special Publication volume 500-274, National Institute of Standards and Technology (NIST), 2007.

46. Deines and Krechel, A German natural language interface for Semantic Search, in: Semantic Technology, Second Joint International Conference, JIST 2012, Proceedings, Nara, Japan, December 2–4, 2012, Takeda, Qu, Mizoguchi and Kitamura, eds, Lecture Notes in Computer Science, Vol. 7774, Springer, 2012, pp. 278–289. doi:10.1007/978-3-642-37996-3_19.

47. Delmonte, Computational Linguistic Text Processing – Lexicon, Grammar, Parsing and Anaphora Resolution, Nova Science Publishers, New York, USA, 2008.

48. Demartini, Trushkowsky, Kraska and M.J. Franklin, CrowdQ: Crowdsourced query understanding, in: CIDR 2013, Sixth Biennial Conference on Innovative Data Systems Research, Online Proceedings, Asilomar, CA, USA, January 6–9, 2013, www.cidrdb.org, 2013, http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper137.pdf.

49. Dima, Intui2: A prototype system for Question Answering over Linked Data, in: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, Forner, Navigli, Tufis and Ferro, eds, CEUR Workshop Proceedings, Vol. 1179, CEUR-WS.org, 2013, http://ceur-ws.org/Vol-1179/CLEF2013wn-QALD3-Dima2013.pdf.

50. Dima, Answering natural language questions with Intui3, in: Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15–18, 2014, Cappellato, Ferro, Halvey and Kraaij, eds, CEUR Workshop Proceedings, Vol. 1180, CEUR-WS.org, 2014, pp. 1201–1211, http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-Dima2014.pdf.

51. Ding, T.W. Finin, Joshi, Pan, R.S. Cost, Peng, Reddivari, Doshi and Sachs, Swoogle: A search and metadata engine for the Semantic Web, in: Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management, Washington, DC, USA, November 8–13, 2004, D.A. Grossman, Gravano, Zhai, Herzog and D.A. Evans, eds, ACM, 2004, pp. 652–659. doi:10.1145/1031171.1031289.

52. Elbedweihy, S.N. Wrigley and Ciravegna, Improving semantic search using query log analysis, in: Proceedings of the Workshop on Interacting with Linked Data (ILD 2012), Workshop Co-Located with the 9th Extended Semantic Web Conference, Heraklion, Greece, May 28, 2012, Unger, Cimiano, López, Motta, Buitelaar and Cyganiak, eds, CEUR Workshop Proceedings, Vol. 913, CEUR-WS.org, pp. 61–74, http://ceur-ws.org/Vol-913/05_ILD2012.pdf.

53. Elbedweihy, Mazumdar, S.N. Wrigley and Ciravegna, NL-graphs: A hybrid approach toward interactively querying semantic data, in: The Semantic Web: Trends and Challenges – 11th International Conference, ESWC 2014, Proceedings, Anissaras, Crete, Greece, May 25–29, 2014, Presutti, d’Amato, Gandon, d’Aquin, Staab and Tordai, eds, Lecture Notes in Computer Science, Vol. 8465, Springer, 2014, pp. 565–579. doi:10.1007/978-3-319-07443-6_38.

54. Fader, L.S. Zettlemoyer and Etzioni, Paraphrase-driven learning for open Question Answering, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Volume 1: Long Papers, Sofia, Bulgaria, 4–9 August, 2013, Association for Computational Linguistics, 2013, pp. 1608–1618, http://aclweb.org/anthology/P/P13/P13-1158.pdf.

55. Ferrandez, Spurk, Kouylekov, Dornescu, Ferrandez, Negri, Izquierdo, Tomas, Orasan, Neumann et al., The QALL-ME framework: A specifiable-domain multilingual Question Answering architecture, Journal of Web Semantics 9 (2011), 137–145. doi:10.1016/j.websem.2011.01.002.

56. Ferré, SQUALL2SPARQL: A translator from controlled English to full SPARQL 1.1, in: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, Forner, Navigli, Tufis and Ferro, eds, CEUR Workshop Proceedings, Vol. 1179, CEUR-WS.org, 2013, http://ceur-ws.org/Vol-1179/CLEF2013wn-QALD3-Ferre2013.pdf.

57. Ferré, SQUALL: A controlled natural language as expressive as SPARQL 1.1, in: Natural Language Processing and Information Systems – 18th International Conference on Applications of Natural Language to Information Systems, NLDB 2013, Proceedings, Salford, UK, June 19–21, 2013, Métais, Meziane, Saraee, Sugumaran and Vadera, eds, Lecture Notes in Computer Science, Vol. 7934, Springer, 2013, pp. 114–125. doi:10.1007/978-3-642-38824-8_10.

58. Forner, Navigli, Tufis and Ferro (eds), Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, CEUR Workshop Proceedings, Vol. 1179, 2013, http://ceur-ws.org/Vol-1179.

59. Freitas and Curry, Natural language queries over heterogeneous Linked Data graphs: A distributional-compositional semantics approach, in: Proceedings of the 19th International Conference on Intelligent User Interfaces, Association for Computing Machinery, New York, USA, 2014, pp. 279–288. doi:10.1145/2557500.2557534.

60. Freitas, J.G. Oliveira, Curry, O’Riain and J.C.P. da Silva, Treo: Combining entity-search, spreading activation and semantic relatedness for querying Linked Data, in: Proceedings of the 1st Workshop on Question Answering over Linked Data (QALD-1), Co-Located with the 8th Extended Semantic Web Conference, Unger, Cimiano, López and Motta, eds, 2011, pp. 24–37.

61. Freitas, J.G. Oliveira, O’Riain, Curry and J.C.P. Da Silva, Querying Linked Data using semantic relatedness: A vocabulary independent approach, in: Natural Language Processing and Information Systems, 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Munoz, Montoyo and Metais, eds, Lecture Notes in Computer Science, Vol. 6716, Springer, Berlin, Heidelberg, Germany, 2011, pp. 40–51. doi:10.1016/j.datak.2013.08.003.

62. Freitas, Curry, Oliveira and O’Riain, Querying heterogeneous datasets on the Linked Data Web: Challenges, approaches, and trends, IEEE Internet Computing 16 (2012), 24–33. doi:10.1109/MIC.2011.141.

63. R.A. Frost, J.A. Donais, Matthews, Agboola and R.J. Stewart, A demonstration of a natural language query interface to an event-based semantic web triplestore, in: The Semantic Web: ESWC 2014 Satellite Events – ESWC 2014 Satellite Events, Revised Selected Papers, Anissaras, Crete, Greece, May 25–29, 2014, Presutti, Blomqvist, Troncy, Sack, Papadakis and Tordai, eds, Lecture Notes in Computer Science, Vol. 8798, Springer, 2014, pp. 343–348. doi:10.1007/978-3-319-11955-7_46.

64. Furche, Gottlob, Grasso, Schallhart and Sellers, OXPath: A language for scalable data extraction, automation, and crawling on the Deep Web, The VLDB Journal 22 (2013), 47–72. doi:10.1007/s00778-012-0286-6.

65. Gabrilovich and Markovitch, Computing semantic relatedness using Wikipedia-based explicit semantic analysis, in: IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6–12, 2007, M.M. Veloso, ed., Vol. 7, AAAI Press, Palo Alto, USA, 2007, pp. 1606–1611, http://ijcai.org/Proceedings/07/Papers/259.pdf.

66. Gandon, Sabou, Sack, d’Amato, Cudré-Mauroux and Zimmermann (eds), The Semantic Web, Latest Advances and New Domains – 12th European Semantic Web Conference, ESWC 2015, Proceedings, Portoroz, Slovenia, May 31–June 4, 2015, Lecture Notes in Computer Science, Vol. 9088, Springer, 2015. doi:10.1007/978-3-319-18818-8.

67. Gangemi, Leonardi and Panconesi (eds), Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18–22, 2015, ACM, 2015. doi:10.1145/2736277.

68. Gao, Liu, Zhong, Chen and Liu, Semantic mapping from natural language questions to OWL queries, Computational Intelligence 27 (2011), 280–314. doi:10.1111/j.1467-8640.2011.00382.x.

69. Gerber and A.N. Ngomo, Extracting multilingual natural-language patterns for RDF predicates, in: Knowledge Engineering and Knowledge Management – 18th International Conference, EKAW 2012, Proceedings, Galway City, Ireland, October 8–12, 2012, ten Teije, Völker, Handschuh, Stuckenschmidt, d’Aquin, Nikolov, Aussenac-Gilles and Hernandez, eds, Lecture Notes in Computer Science, Vol. 7603, Springer, Berlin, Heidelberg, Germany, 2012, pp. 87–96. doi:10.1007/978-3-642-33876-2_10.

70. Giannone, Bellomaria and Basili, A HMM-based approach to Question Answering against Linked Data, in: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, Forner, Navigli, Tufis and Ferro, eds, CEUR Workshop Proceedings, Vol. 1179, CEUR-WS.org, 2013, http://ceur-ws.org/Vol-1179/CLEF2013wn-QALD3-GiannoneEt2013.pdf.

71. A.M. Gliozzo and Kalyanpur, Predicting lexical answer types in open domain QA, International Journal on Semantic Web and Information Systems 8(3) (2012), 74–88. doi:10.4018/jswis.2012070104.

72. Hakimov, Tunc, Akimaliev and Dogdu, Semantic Question Answering system over Linked Data using relational patterns, in: Joint 2013 EDBT/ICDT Conferences, EDBT/ICDT’13, Workshop Proceedings, Genoa, Italy, March 22, 2013, Guerrini, ed., Association for Computing Machinery, New York, USA, 2013, pp. 83–88. doi:10.1145/2457317.2457331.

73. Hakimov, Unger, Walter and Cimiano, Applying semantic parsing to Question Answering over Linked Data: Addressing the lexical gap, in: Natural Language Processing and Information Systems: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Proceedings, Passau, Germany, June 17–19, 2015, Biemann, Handschuh, Freitas, Meziane and Métais, eds, Springer Publishing, Cham, Switzerland, 2015, pp. 103–109. doi:10.1007/978-3-319-19581-0_8.

74. Hamon, Grabar, Mougin and Thiessard, Description of the POMELO system for the task 2 of QALD-4, in: Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15–18, 2014, Cappellato, Ferro, Halvey and Kraaij, eds, CEUR Workshop Proceedings, Vol. 1180, CEUR-WS.org, 2014, pp. 1212–1223, http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-HamonEt2014.pdf.

75. Hamp and Feldweg, GermaNet – A lexical-semantic net for German, in: Proceedings of the ACL/EACL’97 Workshop on Intelligent Scalable Text Summarization, Mani and Maybury, eds, Association for Computational Linguistics, Stroudsburg, USA, 1997, pp. 9–15.

76. Z.S. Harris, Papers in Structural and Transformational Linguistics, Springer, Berlin, Heidelberg, Germany, 2013. doi:10.1007/978-94-017-6059-1.

77. He, Liu, Chen, Zhou, Liu and Zhao, CASIA@QALD-3: A Question Answering system over Linked Data, in: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, Forner, Navigli, Tufis and Ferro, eds, CEUR Workshop Proceedings, Vol. 1179, CEUR-WS.org, 2013, http://ceur-ws.org/Vol-1179/CLEF2013wn-QALD3-HeEt2013.pdf.

78. He, Zhang, Liu and Zhao, CASIA@V2: A MLN-based Question Answering system over Linked Data, in: Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15–18, 2014, Cappellato, Ferro, Halvey and Kraaij, eds, CEUR Workshop Proceedings, Vol. 1180, CEUR-WS.org, 2014, pp. 1249–1259, http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-ShizhuEt2014.pdf.

79. D.M. Herzig, Mika, Blanco and Tran, Federated entity search using on-the-fly consolidation, in: The Semantic Web – ISWC 2013 – 12th International Semantic Web Conference, Proceedings, Part I, Sydney, NSW, Australia, October 21–25, 2013, Alani, Kagal, Fokoue, P.T. Groth, Biemann, J.X. Parreira, Aroyo, N.F. Noy, Welty and Janowicz, eds, Lecture Notes in Computer Science, Vol. 8218, Springer, 2013, pp. 167–183. doi:10.1007/978-3-642-41335-3_11.

80. Hirschman and Gaizauskas, Natural language Question Answering: The view from here, Natural Language Engineering 7 (2001), 275–300. doi:10.1017/S1351324901002807.

81. Höffner and Lehmann, Towards Question Answering on statistical Linked Data, in: Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014, Leipzig, Germany, September 4–5, 2014, Sack, Filipowska, Lehmann and Hellmann, eds, Association for Computing Machinery, New York, USA, 2014, pp. 61–64. doi:10.1145/2660517.2660521.

82. Höffner, Lehmann and Usbeck, CubeQA – Question Answering on RDF data cubes, in: The Semantic Web – ISWC 2016 – 15th International Semantic Web Conference, Proceedings, Part I, Kobe, Japan, October 17–21, 2016, P.T. Groth, Simperl, A.J.G. Gray, Sabou, Krötzsch, Lécué, Flöck and Gil, eds, Lecture Notes in Computer Science, Vol. 9981, 2016, pp. 325–340. doi:10.1007/978-3-319-46523-4_20.

83. Horrocks, P.F. Patel-Schneider, Boley, Tabet, Grosof and Dean, SWRL: A Semantic Web rule language combining OWL and RuleML, W3C Member Submission 21 May 2004, W3C, 2004, http://www.w3.org/Submission/SWRL.

84. Huang and Zou, Natural language Question Answering over RDF data, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22–27, 2013, K.A. Ross, Srivastava and Papadias, eds, Association for Computing Machinery, New York, USA, 2013, pp. 1289–1290. doi:10.1145/2463676.2463725.

85. Jain, Hitzler, A.P. Sheth, Verma and P.Z. Yeh, Ontology alignment for Linked Open Data, in: The Semantic Web – ISWC 2010 – 9th International Semantic Web Conference, ISWC 2010, Revised Selected Papers, Part I, Shanghai, China, November 7–11, 2010, P.F. Patel-Schneider, Pan, Hitzler, Mika, Zhang, J.Z. Pan, Horrocks and Glimm, eds, Lecture Notes in Computer Science, Vol. 6496, Springer, 2010, pp. 402–417. doi:10.1007/978-3-642-17746-0_26.

86. A.K. Joshi, Jain, Hitzler, P.Z. Yeh, Verma, A.P. Sheth and Damova, Alignment-based querying of Linked Open Data, in: On the Move to Meaningful Internet Systems: OTM 2012, Confederated International Conferences: CoopIS, DOA-SVI, and ODBASE 2012, Proceedings, Part II, Rome, Italy, September 10–14, 2012, Meersman, Panetto, T.S. Dillon, Rinderle-Ma, Dadam, Zhou, Pearson, Ferscha, Bergamaschi and I.F. Cruz, eds, Lecture Notes in Computer Science, Vol. 7566, Springer, 2012, pp. 807–824. doi:10.1007/978-3-642-33615-7_25.

87. Kalyanpur, J.W. Murdock, Fan and C.A. Welty, Leveraging community-built knowledge for type coercion in Question Answering, in: The Semantic Web – ISWC 2011 – 10th International Semantic Web Conference, Proceedings, Part II, Bonn, Germany, October 23–27, 2011, Aroyo, Welty, Alani, Taylor, Bernstein, Kagal, N.F. Noy and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7032, Springer, 2011, pp. 144–156. doi:10.1007/978-3-642-25093-4_10.

88. Lee, Kim, Song and Rim, Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models, in: 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Proceedings of the Conference, A Meeting of SIGDAT, a Special Interest Group of the ACL, Honolulu, Hawaii, USA, 25–27 October, 2008, ACL, 2008, pp. 410–418. doi:10.3115/1613715.1613768.

89. Lehmann and Bühmann, AutoSPARQL: Let users query your knowledge base, in: The Semantic Web: Research and Applications – 8th Extended Semantic Web Conference, ESWC 2011, Proceedings, Part I, Heraklion, Crete, Greece, May 29–June 2, 2011, Antoniou, Grobelnik, E.P.B. Simperl, Parsia, Plexousakis, P.D. Leenheer and J.Z. Pan, eds, Lecture Notes in Computer Science, Vol. 6643, Springer, 2011, pp. 63–79. doi:10.1007/978-3-642-21034-1_5.

90. Lehmann, Bizer, Kobilarov, Auer, Becker, Cyganiak and Hellmann, DBpedia – A crystallization point for the web of data, Journal of Web Semantics 7(3) (2009), 154–165. doi:10.1016/j.websem.2009.07.002.

91. Lehmann, Furche, Grasso, A.-C. Ngonga Ngomo, Schallhart, Sellers, Unger, Bühmann, Gerber, Höffner, Liu and Auer, DEQA: Deep web extraction for Question Answering, in: The Semantic Web – ISWC 2012 – 11th International Semantic Web Conference, Proceedings, Part II, Boston, MA, USA, November 11–15, 2012, Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, J.X. Parreira, Hendler, Schreiber, Bernstein and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7650, Springer, Berlin, Heidelberg, Germany, 2012, pp. 131–147. doi:10.1007/978-3-642-35173-0_9.

92. Lehmann, Isele, Jakob, Jentzsch, Kontokostas, P.N. Mendes, Hellmann, Morsey, van Kleef, Auer and Bizer, DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6(2) (2015), 167–195. doi:10.3233/SW-140134.

93. López, V.S. Uren, Sabou and Motta, Is Question Answering fit for the Semantic Web?: A survey, Semantic Web 2(2) (2011), 125–155. doi:10.3233/SW-2011-0041.

94. López, Unger, Cimiano and Motta, Evaluating Question Answering over Linked Data, Journal of Web Semantics 21 (2013), 3–13. doi:10.1016/j.websem.2013.05.006.

95. Marx, Usbeck, A.N. Ngomo, Höffner, Lehmann and Auer, Towards an open Question Answering architecture, in: Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014, Leipzig, Germany, September 4–5, 2014, Sack, Filipowska, Lehmann and Hellmann, eds, ACM, 2014, pp. 57–60. doi:10.1145/2660517.2660519.

96. Melo, I.P. Rodrigues and V.B. Nogueira, Cooperative Question Answering for the Semantic Web, in: Actas Das Jornadas de Informática da Universidade de Évora 2011, Rato and T.G. Calves, eds, Escola de Ciências e Tecnologia, Universidade de Évora, 2011, pp. 1–6.

97. Métais, Meziane, Sararee, Sugumaran and Vadera (eds), Natural Language Processing and Information Systems: 18th International Conference on Applications of Natural Language to Information Systems, NLDB 2013, Proceedings, Salford, UK, Lecture Notes in Computer Science, Vol. 7934, Springer, Berlin, Heidelberg, Germany, 2013. doi:10.1007/978-3-642-38824-8.

98. Métais, Roche and Teisseire (eds), Natural Language Processing and Information Systems: 19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014, Proceedings, Montpellier, France, June 18–20, 2014, Lecture Notes in Computer Science, Vol. 8455, Springer Publishing, New York, USA, 2014. doi:10.1007/978-3-319-07983-7.

99. Mika, Tudorache, Bernstein, Welty, C.A. Knoblock, Vrandecic, P.T. Groth, N.F. Noy, Janowicz and C.A. Goble (eds), The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Proceedings, Part I, Riva del Garda, Italy, October 19–23, 2014, Lecture Notes in Computer Science, Vol. 8796, Springer, 2014. doi:10.1007/978-3-319-11915-1.

100. Mika, Tudorache, Bernstein, Welty, C.A. Knoblock, Vrandecic, P.T. Groth, N.F. Noy, Janowicz and C.A. Goble (eds), The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Proceedings, Part II, Riva del Garda, Italy, October 19–23, 2014, Lecture Notes in Computer Science, Vol. 8797, Springer, 2014. doi:10.1007/978-3-319-11915-1.

101. Mille, F.L. Gandon, Misselis, Rabinovich and Staab (eds), Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, Lyon, France, April 16–20, 2012, ACM, 2012. doi:10.1145/2187836.

102. G.A. Miller, WordNet: A lexical database for English, Communications of the ACM 38(11) (1995), 39–41. doi:10.1145/219717.219748.

103. G.A. Miller and W.G. Charles, Contextual correlates of semantic similarity, Language and Cognitive Processes 6 (1991), 1–28. doi:10.1080/01690969108406936.

104. Mishra and S.K. Jain, A survey on Question Answering systems with classification, Journal of King Saud University-Computer and Information Sciences (2015).

105. A.M. Moussa and R.F. Abdel-Kader, QASYO: A Question Answering system for YAGO ontology, International Journal of Database Theory and Application 4 (2011).

106. Muñoz, Montoyo and Métais (eds), Natural Language Processing and Information Systems: 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Proceedings, Alicante, Spain, June 28–30, 2011, Lecture Notes in Computer Science, Vol. 6716, Springer, Berlin, Heidelberg, Germany, 2011. doi:10.1007/978-3-642-22327-3.

107. Nakashole, Weikum and F.M. Suchanek, PATTY: A taxonomy of relational patterns with semantic types, in: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, Jeju Island, Korea, July 12–14, 2012, Tsujii, Henderson and Pasca, eds, ACL, 2012, pp. 1135–1145, http://www.aclweb.org/anthology/D12-1104.

108. Ng and Kan, QANUS: An open-source Question-Answering platform, CoRR (2015), http://arxiv.org/abs/1501.00311.

109. A.N. Ngomo, Link discovery with guaranteed reduction ratio in affine spaces with Minkowski measures, in: The Semantic Web – ISWC 2012 – 11th International Semantic Web Conference, Proceedings, Part I, Boston, MA, USA, November 11–15, 2012, Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, J.X. Parreira, Hendler, Schreiber, Bernstein and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7649, Springer, 2012, pp. 378–393. doi:10.1007/978-3-642-35176-1_24.

110. A.N. Ngomo and Auer, LIMES – A time-efficient approach for large-scale link discovery on the web of data, in: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16–22, 2011, Walsh, ed., IJCAI/AAAI, 2011, pp. 2312–2317. doi:10.5591/978-1-57735-516-8/IJCAI11-385.

111. A.N. Ngomo, Bühmann, Unger, Lehmann and Gerber, SPARQL2NL – Verbalizing SPARQL queries, in: 22nd International World Wide Web Conference, WWW’13, Companion Volume, Rio de Janeiro, Brazil, May 13–17, 2013, Carr, A.H.F. Laender, B.F. Lóscio, King, Fontoura, Vrandecic, Aroyo, J.P.M. de Oliveira, Lima and Wilde, eds, International World Wide Web Conferences Steering Committee/ACM, 2013, pp. 329–332. doi:10.1145/2487788.2487936.

112. Ou and Zhu, An entailment-based Question Answering system over Semantic Web data, in: Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation – 13th International Conference on Asia-Pacific Digital Libraries, ICADL 2011, Proceedings, Beijing, China, October 24–27, 2011, Xing, Crestani and Rauber, eds, Lecture Notes in Computer Science, Vol. 7008, Springer, 2011, pp. 311–320. doi:10.1007/978-3-642-24826-9_39.

113. Paliouras and A.-C. Ngonga Ngomo (eds), BioASQ 2013 – Biomedical Semantic Indexing and Question Answering, CEUR Workshop Proceedings, Vol. 1094, 2013, http://ceur-ws.org/Vol-1094/, Online Working Notes.

114. Park, Shim and G.G. Lee, ISOFT at QALD-4: Semantic similarity-based Question Answering system over Linked Data, in: Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15–18, 2014, Cappellato, Ferro, Halvey and Kraaij, eds, CEUR Workshop Proceedings, Vol. 1180, CEUR-WS.org, 2014, pp. 1236–1248, http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-ParkEt2014.pdf.

115. Park, Kwon, Kim, Han, Shim and G.G. Lee, Question Answering system using multiple information source and open type answer merge, in: NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31–June 5, 2015, Mihalcea, J.Y. Chai and Sarkar, eds, Association for Computational Linguistics, 2015, pp. 111–115. doi:10.3115/v1/N15-3023.

116. P.F. Patel-Schneider, Pan, Hitzler, Mika, Zhang, J.Z. Pan, Horrocks and Glimm (eds), The Semantic Web – ISWC 2010 – 9th International Semantic Web Conference, ISWC 2010, Revised Selected Papers, Part I, Shanghai, China, November 7–11, 2010, Lecture Notes in Computer Science, Vol. 6496, Springer, 2010. doi:10.1007/978-3-642-17746-0.

117. P.F. Patel-Schneider, Pan, Hitzler, Mika, Zhang, J.Z. Pan, Horrocks and Glimm (eds), The Semantic Web – ISWC 2010 – 9th International Semantic Web Conference, ISWC 2010, Revised Selected Papers, Part II, Shanghai, China, November 7–11, 2010, Lecture Notes in Computer Science, Vol. 6497, Springer, 2010. doi:10.1007/978-3-642-17749-1.

118. Pradel, Peyet, Haemmerle and Hernandez, SWIP at QALD-3: Results, criticisms and lesson learned, in: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23–26, 2013, Forner, Navigli, Tufis and Ferro, eds, CEUR Workshop Proceedings, Vol. 1179, CEUR-WS.org, 2013, http://ceur-ws.org/Vol-1179/CLEF2013wn-QALD3-PradelEt2013.pdf.

119. Presutti, d’Amato, Gandon, d’Aquin, Staab and Tordai (eds), The Semantic Web: Trends and Challenges – 11th International Conference, ESWC 2014, Proceedings, Anissaras, Crete, Greece, May 25–29, 2014, Lecture Notes in Computer Science, Vol. 8465, Springer, 2014. doi:10.1007/978-3-319-07443-6.

120. Rahoman and Ichise, An automated template selection framework for keyword query over linked data, in: Semantic Technology, Second Joint International Conference, JIST 2012, Proceedings, Nara, Japan, December 2–4, 2012, Takeda, Qu, Mizoguchi and Kitamura, eds, Lecture Notes in Computer Science, Vol. 7774, Springer, 2012, pp. 175–190. doi:10.1007/978-3-642-37996-3_12.

121. Rani, M.K. Muyeba and Vyas, A hybrid approach using ontology similarity and fuzzy logic for semantic Question Answering, in: Advanced Computing, Networking and Informatics-Volume 1, Springer, Berlin, Heidelberg, Germany, 2014, pp. 601–609. doi:10.1007/978-3-319-07353-8_69.

122. Ranta, Grammatical framework, Journal of Functional Programming 14(2) (2004), 145–189. doi:10.1017/S0956796803004738.

123. K.U. Schulz and Mihov, Fast string correction with Levenshtein automata, International Journal on Document Analysis and Recognition 5(1) (2002), 67–85. doi:10.1007/s10032-002-0082-8.

124. Schwabe, V.A.F. Almeida, Glaser, R.A. Baeza-Yates and S.B. Moon (eds), 22nd International World Wide Web Conference, WWW’13, Rio de Janeiro, Brazil, May 13–17, 2013, International World Wide Web Conferences Steering Committee/ACM, 2013. doi:10.1145/2488388.

125. Shekarpour, A.N. Ngomo and Auer, Question Answering on interlinked data, in: 22nd International World Wide Web Conference, WWW’13, Rio de Janeiro, Brazil, May 13–17, 2013, Schwabe, V.A.F. Almeida, Glaser, R.A. Baeza-Yates and S.B. Moon, eds, International World Wide Web Conferences Steering Committee/ACM, 2013, pp. 1145–1156. doi:10.1145/2488388.2488488.

126. Shekarpour, A.-C. Ngonga Ngomo and Auer, Query segmentation and resource disambiguation leveraging background knowledge, in: Proceedings of the Web of Linked Entities Workshop in Conjunction with the 11th International Semantic Web Conference (ISWC 2012), Rizzo, P.N. Mendes, Charton, Hellmann and Kalyanpur, eds, CEUR Workshop Proceedings, Vol. 906, CEUR-WS.org, 2012, pp. 82–93, http://ceur-ws.org/Vol-906/paper9.pdf.

127. Shekarpour, Auer, A.N. Ngomo, Gerber, Hellmann and Stadler, Generating SPARQL queries using templates, Web Intelligence and Agent Systems 11(3) (2013), 283–295. doi:10.3233/WIA-130275.

128. Shekarpour, Höffner, Lehmann and Auer, Keyword query expansion on Linked Data using linguistic and semantic features, in: 2013 IEEE Seventh International Conference on Semantic Computing, Irvine, CA, USA, September 16–18, 2013, IEEE Computer Society, Los Alamitos, USA, 2013, pp. 191–197. doi:10.1109/ICSC.2013.41.

129. Shekarpour, Marx, A.-C. Ngonga Ngomo and S.A. Auer, SINA: Semantic interpretation of user queries for Question Answering on interlinked data, Journal of Web Semantics 30 (2015), 39–51. doi:10.1016/j.websem.2014.06.002.

130. Shekarpour, K.M. Endris, A.J. Kumar, Lukovnikov, Singh, Thakkar and Lange, Question Answering on Linked Data: Challenges and future directions, in: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Companion Volume, Montreal, Canada, April 11–15, 2016, Bourdeau, Hendler, Nkambou, Horrocks and B.Y. Zhao, eds, ACM, 2016, pp. 693–698. doi:10.1145/2872518.2890571.

131. Shen, Yan, Ji, Liu and Chen, Sparse hidden-dynamics conditional random fields for user intent understanding, in: Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28–April 1, 2011, Srinivasan, Ramamritham, Kumar, M.P. Ravindra, Bertino and Kumar, eds, ACM, 2011, pp. 7–16. doi:10.1145/1963405.1963411.

132. Shirai, Inui, Tanaka and Tokunaga, An empirical study on statistical disambiguation of Japanese dependency structures using a lexically sensitive language model, in: Proceedings of Natural Language Pacific-Rim Symposium, 1997, pp. 215–220.

133. Simperl, Cimiano, Polleres, Ó. Corcho and Presutti (eds), The Semantic Web: Research and Applications – 9th Extended Semantic Web Conference, ESWC 2012, Proceedings, Heraklion, Crete, Greece, May 27–31, 2012, Lecture Notes in Computer Science, Vol. 7295, Springer, 2012. doi:10.1007/978-3-642-30284-8.

134. Sparck Jones, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation 28(1) (1972), 11–21.

135. Srinivasan, Ramamritham, Kumar, M.P. Ravindra, Bertino and Kumar (eds), Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28–April 1, 2011, ACM, 2011. doi:10.1145/1963192.

136. Sun, Ma, Yih, Tsai, Liu and Chang, Open domain Question Answering via semantic enrichment, in: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18–22, 2015, Gangemi, Leonardi and Panconesi, eds, ACM, 2015, pp. 1045–1055. doi:10.1145/2736277.2741651.

137. Tao, H.R. Solbrig, D.K. Sharma, Wei, G.K. Savova and C.G. Chute, Time-oriented Question Answering from clinical narratives using semantic-web techniques, in: The Semantic Web – ISWC 2010 – 9th International Semantic Web Conference, ISWC 2010, Revised Selected Papers, Part II, Shanghai, China, November 7–11, 2010, P.F. Patel-Schneider, Pan, Hitzler, Mika, Zhang, J.Z. Pan, Horrocks and Glimm, eds, Lecture Notes in Computer Science, Vol. 6497, Springer, 2010, pp. 241–256. doi:10.1007/978-3-642-17749-1_16.

138. Tsatsaronis, Schroeder, Paliouras, Almirantis, Androutsopoulos, É. Gaussier, Gallinari, Artières, M.R. Alvers, Zschunke and A.N. Ngomo, BioASQ: A challenge on large-scale biomedical semantic indexing and Question Answering, in: Information Retrieval and Knowledge Discovery in Biomedical Text, Papers from the 2012 AAAI Fall Symposium, Arlington, Virginia, USA, November 2–4, 2012, AAAI Technical Report, Vol. FS-12-05, 2012, http://www.aaai.org/ocs/index.php/FSS/FSS12/paper/view/5600.

139. Unger and Cimiano, Representing and resolving ambiguities in ontology-based Question Answering, in: Proceedings of the TextInfer 2011 Workshop on Textual Entailment, Padó and Thater, eds, Association for Computational Linguistics, Stroudsburg, USA, 2011, pp. 40–49.

140. Unger and Cimiano, Pythia: Compositional meaning construction for ontology-based Question Answering on the Semantic Web, in: Natural Language Processing and Information Systems – 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Proceedings, Alicante, Spain, June 28–30, 2011, Muñoz, Montoyo and Métais, eds, Lecture Notes in Computer Science, Vol. 6716, Springer, 2011, pp. 153–160. doi:10.1007/978-3-642-22327-3_15.

141. Unger, Cimiano, López and Motta (eds), Proceedings of the 1st Workshop on Question Answering Over Linked Data (QALD-1), Co-Located with the 8th Extended Semantic Web Conference, Heraklion, Greece, May 30, 2011, 2011, http://qald.sebastianwalter.org/1/documents/qald-1-proceedings.pdf.

142. Unger, Bühmann, Lehmann, A.N. Ngomo, Gerber and Cimiano, Template-based Question Answering over RDF data, in: Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, Lyon, France, April 16–20, 2012, Mille, F.L. Gandon, Misselis, Rabinovich and Staab, eds, ACM, 2012, pp. 639–648. doi:10.1145/2187836.2187923.

143. Unger, Cimiano, López, Motta, Buitelaar and Cyganiak (eds), Proceedings of the workshop on interacting with Linked Data (ILD 2012), Workshop Co-Located with the 9th Extended Semantic Web Conference, Heraklion, Greece, May 28, 2012, CEUR Workshop Proceedings, Vol. 913, CEUR-WS.org, 2012, http://ceur-ws.org/Vol-913.

144. Usbeck, A.N. Ngomo, Bühmann and Unger, HAWK – Hybrid Question Answering using Linked Data, in: The Semantic Web, Latest Advances and New Domains – 12th European Semantic Web Conference, ESWC 2015, Proceedings, Portoroz, Slovenia, May 31–June 4, 2015, Gandon, Sabou, Sack, d’Amato, Cudré-Mauroux and Zimmermann, eds, Lecture Notes in Computer Science, Vol. 9088, Springer, 2015, pp. 353–368. doi:10.1007/978-3-319-18818-8_22.

145. Vossen, A Multilingual Database with Lexical Semantic Networks, Springer, Berlin, Heidelberg, Germany, 1998. doi:10.1007/978-94-017-1491-4.

146. Walter, Unger, Cimiano and Bär, Evaluation of a layered approach to Question Answering over Linked Data, in: The Semantic Web – ISWC 2012 – 11th International Semantic Web Conference, Proceedings, Part II, Boston, MA, USA, November 11–15, 2012, Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, J.X. Parreira, Hendler, Schreiber, Bernstein and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7650, Springer, 2012, pp. 362–374. doi:10.1007/978-3-642-35173-0_25.

147. Welty, J.W. Murdock, Kalyanpur and Fan, A comparison of hard filters and soft evidence for answer typing in Watson, in: The Semantic Web – ISWC 2012 – 11th International Semantic Web Conference, Proceedings, Part II, Boston, MA, USA, November 11–15, 2012, Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, J.X. Parreira, Hendler, Schreiber, Bernstein and Blomqvist, eds, Lecture Notes in Computer Science, Vol. 7650, Springer, 2012, pp. 243–256. doi:10.1007/978-3-642-35173-0_16.

148. Wolfram, The Mathematica Book, 5th edn, Wolfram Media, Champaign, USA, 2004.

149. Xu, Feng and Zhao, Xser@QALD-4: Answering natural language questions via Phrasal Semantic Parsing, in: Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15–18, 2014, Cappellato, Ferro, Halvey and Kraaij, eds, CEUR Workshop Proceedings, Vol. 1180, CEUR-WS.org, 2014, pp. 1260–1274, http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-XuEt2014.pdf.

150. Yahya, Berberich, Elbassuoni, Ramanath, Tresp and Weikum, Deep answers for naturally asked questions on the Web of Data, in: Proceedings of the 21st World Wide Web Conference, WWW 2012 (Companion Volume), Lyon, France, April 16–20, 2012, Mille, F.L. Gandon, Misselis, Rabinovich and Staab, eds, ACM, 2012, pp. 445–449. doi:10.1145/2187980.2188070.

151. Yahya, Berberich, Elbassuoni, Ramanath, Tresp and Weikum, Natural language questions for the Web of Data, in: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, Jeju Island, Korea, July 12–14, 2012, Tsujii, Henderson and Pasca, eds, ACL, 2012, pp. 379–390, http://www.aclweb.org/anthology/D12-1035.

152. Yahya, Berberich, Elbassuoni and Weikum, Robust Question Answering over the web of Linked Data, in: 22nd ACM International Conference on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA, October 27–November 1, 2013, He, Iyengar, Nejdl, Pei and Rastogi, eds, ACM, 2013, pp. 1107–1116. doi:10.1145/2505515.2505677.

153. Yang, Garduño, Fang, Maiberg, McCormack and Nyberg, Building optimal information systems automatically: Configuration space exploration for biomedical information systems, in: 22nd ACM International Conference on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA, October 27–November 1, 2013, He, Iyengar, Nejdl, Pei and Rastogi, eds, ACM, 2013, pp. 1421–1430. doi:10.1145/2505515.2505692.

154. E.M.G. Younis, C.B. Jones, Tanasescu and A.I. Abdelmoty, Hybrid geo-spatial query methods on the Semantic Web with a spatially-enhanced index of DBpedia, in: Geographic Information Science – 7th International Conference, GIScience 2012, Proceedings, Columbus, OH, USA, September 18–21, 2012, Xiao, Kwan, M.F. Goodchild and Shekhar, eds, Lecture Notes in Computer Science, Vol. 7478, Springer, 2012, pp. 340–353. doi:10.1007/978-3-642-33024-7_25.

155. Zhang, J.-C. Creput, Hongjian, Meurie and Ruichek, Query Answering using user feedback and context gathering for Web of Data, in: INFOCOMP 2013: The Third International Conference on Advanced Communications and Computation, Curran Press, Rockport, USA, 2013, pp. 33–38.

156. Zou, Huang, Wang, J.X. Yu, He and Zhao, Natural language Question Answering over RDF: A graph data driven approach, in: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22–27, 2014, C.E. Dyreson, Li and M.T. Özsu, eds, Association for Computing Machinery, New York, USA, 2014, pp. 313–324. doi:10.1145/2588555.2610525.



¹
Definition based on Hirschman and Gaizauskas [80].

²
The time before is already covered in Cimiano and Minock [37].

³
Semantically weak constructions.

⁵
https://lucene.apache.org.

Table 3
URL prefixes used throughout this work

dbo http://dbpedia.org/ontology/

dbr http://dbpedia.org/resource/

owl http://www.w3.org/2002/07/owl#

¹⁶
In linguistics, the term lexical gap has a different meaning, referring to a word that has no equivalent in another language.

²³
Such as “List the Semantic Web people and their affiliation.”, where the coreferent their refers to the entity people.

²⁴
See http://www.w3.org/2003/01/geo/wgs84_pos at http://lodstats.aksw.org.