Abstract
This paper describes the layers of context leveraged by language-endowed intelligent agents (LEIAs) during incremental natural language understanding (NLU). Context is defined as a combination of (a) the perceptual stimuli available to the agent at the given point in time, and (b) the knowledge elements and reasoning activated at the given stage of the agent’s interpretation of those stimuli. This approach to NLU addresses the treatment of a large number of difficult linguistic phenomena that are essential for high-quality NLU but are not being tackled by the knowledge-lean approaches that are typical of modern-day natural language processing. Although LEIAs are being developed as components of prototype application systems, this paper is not about implementations or evaluations – its contribution is conceptual, with everything described applicable to any artificial intelligent agent environment.
Introduction
Although the term context resists a crisp and fully satisfying definition – thus belonging to what Marvin Minsky has aptly called suitcase words (Minsky, 2006) – for the practical task of building language-endowed intelligent agents (LEIAs), we need a working definition. Ours is this:

Horizontal and vertical context, leveraged incrementally.
As an illustration of these notions of incrementality, consider the following examples, in which underlining separates text chunks that will be consumed in turn during semantic analysis.
(1) A black bear ___ is eating ___ a fish.
(2) My monkey ___ promised ___ he ___ wouldn’t do ___ that ___ anymore!
(3) I ___ said, ___ “The mail ___ just came.”
If you heard or read sentence (1) without being able to look ahead, you would probably have a single interpretation at each stage of input, something like, A large mammal with black fur // A large mammal with black fur is ingesting // A large mammal with black fur is ingesting an aquatic animal. You would probably not consider the possibility of non-literal word senses, implicatures, sarcasm, humor and the like. There is no need to invoke extra reasoning since your basic knowledge of language and the world led to an interpretation that worked.
By contrast, (2) requires more effort, and more context, to interpret. Since the small simians we call monkeys cannot make promises, either monkey or promise must be non-literal. Monkey is a more obvious choice since people are often referred to by the name of an animal with a contextually relevant characteristic. Our sentence could, for example, be said about a child who is swinging dangerously on a jungle gym. In addition, although a basic interpretation can be gleaned from the sentence without contextual grounding, its full interpretation requires determining the referents for my, he, and that.
Our last example, when taken out of context, has residual ambiguity not only with respect to the identity of I, but with respect to the force of I said. It can be used: (1) when the interlocutor fails to hear the original utterance, as in a noisy room or over a bad phone connection; (2) as part of a story: “I said, ‘The mail just came.’ And he suddenly leaps out of his chair and barrels out the door!”; and (3) to emphasize an indirect speech act that was not acted upon originally: “The mail just came. [No reaction] I said, the mail just came.” (The implication is that the interlocutor is supposed to go fetch it.) All of these interpretations are available based on general lexical, semantic, and pragmatic knowledge, but choosing among them requires more features from the speech context.
The point of these examples is that it would make no more sense to have LEIAs invoke deep reasoning to analyze (1) than it would to expect them to understand the pragmatic force of I said in (3) without access to contextual features. Accordingly, a key aspect of the intelligence of intelligent agents is their ability to independently determine which resources to leverage, when, and why, as well as what constitutes a sufficient analysis.
This paper provides a high-level overview of the incremental leveraging of horizontal and vertical context by LEIAs during NLU. Since it would be impossible to fundamentally describe, in this short space, the entire agent architecture along with its large NLU module, our goal here is modest: to provide what we believe is a novel take on the notion of context that has both psychological plausibility and applicability to agent systems.
Our computational cognitive modeling of LEIAs is a mature program of work that covers perception, reasoning and action – the typical pillars of cognitive architectures (Langley, Laird and Rogers, 2009). As Fig. 2 shows, no matter which kinds of perceptual stimuli a LEIA receives, it must interpret them using its knowledge resources, which involves translating them from data streams into interpreted facts represented in its ontologically-grounded knowledge bases. Types of stimuli include language input, bodily signals generated through physiological simulations, visual stimuli, and, in principle, the output of any kind of sensor that could be incorporated into a physical or virtual agent system.1
Language understanding and interoception were incorporated into LEIAs functioning as virtual patients in the Maryland Virtual Patient application (Nirenburg, McShane and Beale, 2008). Language understanding and robotic vision are currently being integrated into a robotic assistant (Nirenburg et al., 2018).

High-level LEIA architecture.
Since data are interpreted into ontologically-grounded facts, it does not matter whether a robotic LEIA knows that it must hit a given nail because its human teammate said, “Hit the nail” or because the latter said, “Hit this” while pointing at the nail – either way, the meaning representation will be the same, and the LEIA can use this new knowledge for subsequent reasoning about action.
LEIAs overall – and their NLU capabilities in particular – are modeled according to principles of human-inspired cognitive modeling (McShane and Nirenburg, 2012). The modeling tenets particularly relevant to this discussion are:
NLU follows the theory of Ontological Semantics as originally stated in Nirenburg and Raskin (2004) and augmented in subsequent writings (McShane, Beale and Babkin, 2014; McShane, Nirenburg and Beale, 2015; McShane and Babkin, 2016a, 2016b). Under this approach, language understanding consists of translating input language strings into unambiguous, context-sensitive, ontologically-grounded text meaning representations (TMRs) that are well-suited to automatic reasoning.
LEIAs analyze inputs using horizontal and vertical incrementality.2
Of course, if an application is not time-sensitive, processing subsentential fragments can be skipped.
LEIAs operating in dialog contexts are not expected to arrive at a full and perfect interpretation of every input since even people do not do that. In real-life language use, many utterances do not make sense (e.g., poorly formulated thoughts in a brainstorming session), are irrelevant to the listener (e.g., a boring story at a cocktail party), or fall outside of the listener’s scope of knowledge (e.g., a technical discussion about nuclear physics for most readers of this text). Instead of striving for a full interpretation, LEIAs focus on achieving an actionable interpretation of each utterance, that is, an interpretation that is of sufficient quality and confidence to allow the LEIA to reason about whether some action must be carried out as a result of the interpretation process. Actions can be not only physical and verbal, but also mental, such as storing the interpretation of an input in the agent’s memory, or deciding to stop interpreting an input at a certain point when it becomes clear that its meaning is outside of the agent’s scope of interest.
Some inputs have both more generic and more specific readings, which reflects what some refer to as the semantics vs. pragmatics distinction. LEIAs first generate the former and then, if they deem further specification worth the effort, they pursue the latter. This underscores that not only is NLU itself a reasoning-heavy enterprise, but agents must also employ metalevel reasoning about how to carry out the analysis process within the scope of their overall (not just language-oriented) functioning.
A LEIA’s lexicon includes more kinds of knowledge to cover more linguistic phenomena (ellipsis, indirect speech acts, non-literal language, etc.) than do most computational lexicons. This is not only because we believe that people actually have and use such knowledge in processing language, but also because anchoring computational treatments of linguistic phenomena declaratively in the lexicon represents good practice in knowledge engineering as it helps to keep strict track of how knowledge elements interact with implemented algorithms.
LEIA modeling involves theory-based feature engineering, i.e., coming up with an inventory of properties (parameters) and their value sets to reflect aspects of context. The layers of context we will discuss involve a variety of types of feature values accessed from a variety of sources or computed in a variety of ways. Note that the inventory of parameters is independently motivated by theoretical considerations. It has not been adapted specifically for any particular reasoning method. The reasoning methods that use the parameter inventory could be implemented variously using heuristic rules, logics, probabilistic approaches, analogy-based statistical methods, etc.
Since LEIAs are multifunctional intelligent agents, the features used to model them cover many aspects of cognition including, non-exhaustively, the agent’s knowledge/beliefs about the world (ontology); its knowledge of remembered object, event and state instances (long-term episodic memory); the active object, event, and state instances comprising its situation model (working memory); its knowledge about language (including the lexicon and rule sets); its inventories of goals and plans; its personality traits; its cognitive biases; its physical, mental and emotional states; and its beliefs about all of these with respect to its interlocutors, which the agent must hypothesize using mental model ascription, also called “mindreading” (McShane, 2014).
The conceptual aspects of the theory presented here are architecture-neutral. However, we ground the discussion in a particular architecture in order to concretize the descriptions and emphasize that the theory is not only implementable in principle but is actually being validated through computer implementations.
Before turning to a description of each of the vertical layers of context, which is the core of our story, let us reiterate some main points. Vertical context involves leveraging, as required, more sources of knowledge and types of reasoning to interpret whatever elements of input are already available, whereas horizontal context involves consuming more elements of input over time. We have found it conceptually useful to organize the vertical layers of context into the seven processing stages shown in Table 1. The table is not intended to be fully self-explanatory at this early stage of the narrative; we provide it as a road map.
A snapshot of the layers of vertical context
The control flow for the agent’s NLU system, which involves decision-making about how to proceed through the horizontal and vertical layers of context, is represented in Fig. 3. Note that decisions about actionability rely upon the particular plans and goals of a particular agent at a particular time – a topic that we will mention only in passing.

Decision-making during semantic analysis.
In what follows, we describe each layer of vertical context in turn, identifying the problems addressed and the rationale for addressing them at the given time. As mentioned earlier, engineering details and the state of implementation of each functionality are beyond the scope of this paper, though we briefly address them in the closing section.
Syntactic parsers provide part-of-speech tagging, morphological analysis, a constituent parse, a dependency parse, and named-entity recognition. Although dependency parses would, in principle, provide the most information for downstream semantic analysis, at the current state of the art they are error-prone, particularly in less formal speech registers like task-oriented dialogs. Moreover, no parsers can reliably treat phenomena that are inherently semantic, such as prepositional phrase attachments. For these reasons, although LEIAs do use externally developed tools for preprocessing and parsing (currently, the Stanford CoreNLP toolkit by Manning et al. (2014)3
Although the CoreNLP toolkit was not designed for incremental parsing, it does produce outputs even for subsentential fragments, and we find those outputs sufficient for our purposes.
During horizontally incremental NLU, each new word of input is subject to preprocessing and syntactic analysis. Depending on the syntactic features of the parse, the LEIA will decide whether to proceed horizontally (i.e., consume another element of input) or vertically (analyze the given input using additional resources). The question is, “Is it worth it to launch semantic analysis on this fragment?” There are many ways one could formulate this decision function based on the urgency of the application and the degree to which one wants LEIAs to emulate people, but a reasonable starting point is this: Do not launch semantic analysis (i.e., instead, immediately consume the next element of input) if the most recent word is not a noun, a verb, or the end-of-input marker. This reflects the inutility of, e.g., semantically analyzing sentence-initial The or The very.
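The starting-point decision function described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LEIA implementation: the coarse tag labels (`NOUN`, `VERB`, `END`), the function names, and the sample sentence with its tagging are all introduced here for illustration.

```python
from typing import List, Tuple

# Hypothetical coarse tags; a real parser's tag set would be mapped onto these.
TRIGGER_TAGS = {"NOUN", "VERB", "END"}


def should_launch_semantics(last_tag: str) -> bool:
    """True if the most recent element is a noun, a verb, or end-of-input."""
    return last_tag in TRIGGER_TAGS


def analysis_points(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the input fragments at which semantic analysis would be launched."""
    consumed: List[str] = []
    fragments: List[str] = []
    for word, tag in tagged:
        if tag != "END":  # the end-of-input marker contributes no word
            consumed.append(word)
        if should_launch_semantics(tag):
            fragments.append(" ".join(consumed))
        # Otherwise proceed horizontally: consume the next element of input.
    return fragments


# Illustrative input (assumed tagging) that begins with "The very".
SAMPLE = [("The", "DET"), ("very", "ADV"), ("hungry", "ADJ"),
          ("bear", "NOUN"), ("ate", "VERB"), ("", "END")]

for fragment in analysis_points(SAMPLE):
    print(fragment)
```

In this sketch nothing is analyzed at The, The very, or The very hungry. Analysis is first launched at the noun, again at the verb, and once more at the end-of-input marker. A more time-sensitive or more human-like agent could replace `should_launch_semantics` with a richer function without changing the surrounding loop.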
The syntactic parses generated by CoreNLP must be massaged by the LEIA in preparation for downstream semantic analysis. The subsections below describe four main component procedures developed for this purpose.
Syntactic mapping
The LEIA uses select results of preprocessing and syntactic analysis, along with a simple grammar, to match constituents of input with the syn-struc (“syntactic structure”) zones of entries in the LEIA’s lexicon. This process, which we call syntactic mapping, answers the question, “Syntactically speaking, what is the best combination of word senses to cover this input?” Fig. 4 illustrates the mapping process for the input He ate a sandwich. It shows the relevant parts of two lexical senses of eat; one (eat-v1) is syntactically suitable and the other (eat-v2) is not.4
We use the term “lexical sense” to refer to word senses that are recorded in the LEIA’s lexicon.

A visual representation of syntactic mapping. For the input He ate a sandwich, eat-v1 is a good match because all syntactic expectations are satisfied by elements of input, whereas eat-v2 is not a good match because the required prepositions away and at are not attested in the input.
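The matching idea in the caption can be sketched in a few lines. The code below is a deliberately simplified, boolean approximation written for illustration. The `Slot` structure, the category labels, and the flattened syn-strucs for eat-v1 and eat-v2 are assumptions, not the format of the LEIA lexicon. The `complete` flag distinguishes full utterances from midstream fragments, which only need to be a valid beginning of a syn-struc.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slot:
    category: str                  # e.g. "NP", "V" (assumed labels)
    literal: Optional[str] = None  # required lemma when the slot names a specific word


# Simplified, assumed syn-strucs; real lexicon entries carry more detail.
EAT_V1 = [Slot("NP"), Slot("V", "eat"), Slot("NP")]
EAT_V2 = [Slot("NP"), Slot("V", "eat"), Slot("PRT", "away"),
          Slot("PREP", "at"), Slot("NP")]

Constituent = Tuple[str, str]  # (category, lemma)


def slot_accepts(slot: Slot, constituent: Constituent) -> bool:
    """A literal slot checks the lemma; any other slot checks the category."""
    category, lemma = constituent
    if slot.literal is not None:
        return lemma == slot.literal
    return category == slot.category


def syntactic_map(syn_struc: List[Slot], constituents: List[Constituent],
                  complete: bool) -> bool:
    """True if the input matches the syn-struc (or a prefix of it, for fragments)."""
    if len(constituents) > len(syn_struc):
        return False
    if not all(slot_accepts(s, c) for s, c in zip(syn_struc, constituents)):
        return False
    return (not complete) or len(constituents) == len(syn_struc)


HE_ATE_A_SANDWICH = [("NP", "he"), ("V", "eat"), ("NP", "sandwich")]
RUST_IS_EATING_AWAY = [("NP", "rust"), ("V", "eat"), ("PRT", "away")]

print(syntactic_map(EAT_V1, HE_ATE_A_SANDWICH, complete=True))
print(syntactic_map(EAT_V2, HE_ATE_A_SANDWICH, complete=True))
print(syntactic_map(EAT_V2, RUST_IS_EATING_AWAY, complete=False))
```

A yes/no answer is a simplification: in the approach described in this paper, candidate maps are scored rather than simply accepted or rejected.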
Later on, during Stage 3 (Basic Semantic Analysis), the semantic analyzer will determine whether the meanings of the variables filling the subject and direct object slots (“ˆ” in the sem-struc zone of the lexicon entry indicates ‘the meaning of’) of eat-v1 are appropriate fillers of the
Syntactic mapping can work – i.e., not result in an incongruity – even for sentence fragments as long as they are valid beginnings of what might result in a canonical structure. This is, obviously, an important aspect of modeling incremental language understanding. For example, the inputs The rust is eating and The rust is eating away are both unfinished, but they will map perfectly to the expectations of eat-v2 in Fig. 4. Syntactic mapping does not work, however, in cases of non-canonical syntax (I, I, ugh
Before considering each of these eventualities, let us linger for a moment on the idea of an input working and not working with respect to syntactic mapping. If an input fails to result in a valid syntactic map, then the LEIA abandons horizontal incrementality (along with its associated opportunities, such as acting immediately upon a mid-stream utterance) and waits until the end of the sentence to try to make sense of the entire input. This is not only a computationally expedient solution, since the recovery programs need as much information as possible; it also makes sense in terms of cognitive modeling, as it is unlikely that the high cognitive load of trying to reconstruct meaning out of non-normative, subsentential input will be worthwhile. So, the two recovery methods we now describe will have access to the full utterance.
As mentioned earlier, there are some syntactic decisions that cannot be made without semantic (often discourse-determined) heuristics. These include:
Prepositional phrase (PP) attachments:
Nominal compounds containing 3 or more nouns:
Ambiguities between prepositions and particles:
The LEIA detects these situations in CoreNLP output and reambiguates the parse, making all candidate analyses available for downstream semantic processing.
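To make "reambiguating the parse" concrete, the following sketch enumerates candidate analyses for two of the situations listed above. It is an illustration under assumptions, not the LEIA's procedure: the function names, the dictionary layout, and the example words are hypothetical. For a nominal compound it generates every binary bracketing; for a prepositional phrase following a verb and its object it generates both attachment options.

```python
from typing import List, Union

Tree = Union[str, tuple]  # a noun, or a (left, right) pair of subtrees


def compound_bracketings(nouns: List[str]) -> List[Tree]:
    """Enumerate all binary bracketings of a sequence of nouns."""
    if len(nouns) == 1:
        return [nouns[0]]
    trees: List[Tree] = []
    for split in range(1, len(nouns)):
        for left in compound_bracketings(nouns[:split]):
            for right in compound_bracketings(nouns[split:]):
                trees.append((left, right))
    return trees


def pp_attachments(verb: str, obj: str, pp: str) -> List[dict]:
    """Return both candidate attachments for a PP that follows 'verb obj'."""
    return [
        {"verb": verb, "object": obj, "verb_modifiers": [pp], "object_modifiers": []},
        {"verb": verb, "object": obj, "verb_modifiers": [], "object_modifiers": [pp]},
    ]


# Hypothetical three-noun compound: two bracketings are possible.
for tree in compound_bracketings(["kitchen", "floor", "cleaner"]):
    print(tree)

# Hypothetical PP attachment ambiguity.
for candidate in pp_attachments("see", "man", "with the telescope"):
    print(candidate)
```

The number of bracketings grows quickly with compound length (2 for three nouns, 5 for four), which is one reason to leave the choice among candidates to downstream semantic processing rather than to the parser.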
Recovering from non-canonical syntax
Among the most typical sources of non-canonical syntax are repetitions, disfluencies, and false starts: e.g., “I, um, I wish
Positing lexical senses for unknown words
The learning of new word senses on the fly is supported by an inventory of generic, syntactically specific but semantically underspecified, word senses. Consider the word sense (in a simplified, presentation format) that supports the learning of new transitive verbs:
The
Contents of the sem-struc zone of a lexicon entry typically reference ontological concepts, which are typeset, by convention, in small caps.
The module just described responds to the fact that the output of a parser does not provide all that is needed from the syntactic component of an NLU system.
Syntactic mapping typically results in multiple syntactic maps since many words are polysemous and their multiple meanings can have the same syntactic properties (e.g., the ‘chart’ and ‘furniture’ senses of table are both simple nouns, and the literal and figurative meanings of attack are both transitive verbs). Each syntactic map is scored according to its syntactic suitability. Two of the many scoring heuristics are as follows: If a complete utterance (not a midstream fragment) fails to use a necessary constituent of a lexical sense (cf. the mapping to eat-v2 in Fig. 4), it receives a large penalty; and if an input includes the required lexemes of a phrasal or idiomatic word sense, which are listed as literals in the syn-struc (as when the bucket appears as the direct object of kick), then it receives a bonus. It is a design choice whether to pass all candidate syntactic mappings on to semantic analysis or to exclude those that do not meet a scoring threshold.
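The two scoring heuristics and the optional threshold can be sketched as follows. Everything numeric here is an assumption made for illustration: the text specifies only that there is a large penalty, a bonus, and possibly a threshold, not their values. The `SyntacticMap` record and the sense labels in the example are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Illustrative values only; the source does not give actual weights.
MISSING_CONSTITUENT_PENALTY = 0.5
IDIOM_LITERAL_BONUS = 0.25
THRESHOLD = 0.75


@dataclass
class SyntacticMap:
    sense: str
    missing_required: int         # required syn-struc constituents absent from the input
    idiom_literals_matched: bool  # literals of a phrasal/idiomatic sense found in the input


def score(m: SyntacticMap, complete_utterance: bool) -> float:
    """Apply the two heuristics to a baseline score of 1.0."""
    value = 1.0
    # Fragments are not penalized: the missing constituents may still arrive.
    if complete_utterance and m.missing_required > 0:
        value -= MISSING_CONSTITUENT_PENALTY
    if m.idiom_literals_matched:
        value += IDIOM_LITERAL_BONUS
    return value


def filter_maps(maps: List[SyntacticMap], complete_utterance: bool,
                threshold: float = THRESHOLD) -> List[SyntacticMap]:
    """Keep only the maps that meet the threshold (one of the two design choices)."""
    return [m for m in maps if score(m, complete_utterance) >= threshold]


candidates = [
    SyntacticMap("eat-v1", missing_required=0, idiom_literals_matched=False),
    SyntacticMap("eat-v2", missing_required=2, idiom_literals_matched=False),
]
for kept in filter_maps(candidates, complete_utterance=True):
    print(kept.sense)
```

Passing `threshold=float("-inf")` corresponds to the other design choice mentioned above, in which all candidate maps are handed on to semantic analysis.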
The main decision point after syntactic mapping involves non-sentence-final inputs, for which the LEIA must determine whether to proceed to semantic analysis (move vertically through ever deeper layers of context) or immediately consume the next word of input (expand the horizontal context). If, for example, the current input fragment ends with a highly polysemous and necessarily transitive verb (e.g., He makes), then semantically analyzing it will result in a large number of candidate analyses. Asking for the next word of input in such cases is the most efficient strategy to limit the search.6
A reviewer noted that people make context-informed inferences and predictions, which is certainly true. We model this by giving the LEIA the option to analyze any fragment as deeply as it chooses to. If it takes the analysis of the fragment He makes all the way to Stage 6, it will use everything available from the preceding language and situational context to disambiguate. However, it is important that the LEIA’s decision-making about anticipatory analysis be realistic and useful: after all, given a normal rate of speech, it is unlikely that interlocutors spend a lot of time and effort guessing what will come next given highly ambiguous midstream fragments.
Basic semantic analysis involves lexical disambiguation and the establishment of the semantic dependency structure. We call it “basic” because it does not yet invoke coreference resolution, static knowledge sources beyond the lexicon and ontology, or situational reasoning. Below is a simplified, pretty-printed basic text meaning representation (TMR) for the most straightforward of examples,
TMRs also include inverses (e.g.,
Generating
Coreference is blocked due to the presence of the article ‘a’: the lexical senses for the words ‘a’ and ‘an’ include this call to the procedural semantic routine that blocks the search for a coreferent.
This is a call to a procedural semantic routine that, if run, will attempt to determine the actual time of speech.
Since the LEIA’s default strategy is to launch semantic analysis only when a nominal head, a verbal head, or an end-of-sentence indicator is reached, the first fragment it will analyze is a gray squirrel. There is only one adjectival sense of gray and one nominal sense of squirrel, and they are semantically compatible, so the phrase is analyzed as the frame
This excerpt says that the typical
This is a simplification. We realize that lions can devour a human and that cannibalism can, regrettably, occur. The decision to disregard these considerations in this case illustrates the constant need to balance the completeness of ontological descriptions with the goal of facilitating semantic analysis.
Operationally speaking, the TMR for our sentence is generated by: a) copying the sem-struc of eat-v1 (cf. Fig. 4) into the nascent text meaning representation; b) using a numbered instance (
In terms of the requirements for runtime reasoning, this example is as simple as it gets since it involves only matching recorded constraints, and in the example above all constraints match in a unique and satisfactory way. However, “simple constraint matching” does not come for free: its precondition is the availability of high-quality lexical and ontological knowledge bases that are sufficient to allow the LEIA to judge the semantic congruity of candidate interpretations. These knowledge bases, along with the reasoning the LEIA needs to appropriately use them, are the knowledge context for basic semantic analysis. Candidate TMRs are scored based on how well the syntactic and semantic expectations of lexical senses are satisfied by the candidate interpretation.
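A toy version of "matching recorded constraints" is sketched below. The miniature ontology, the concept and case-role names, and the scoring formula (fraction of satisfied constraints) are all assumptions introduced for illustration; they are not the LEIA's knowledge bases or its actual scoring function.

```python
from typing import Dict, Optional

# Hypothetical toy ontology: concept -> parent (None marks the root).
IS_A: Dict[str, Optional[str]] = {
    "ALL": None,
    "OBJECT": "ALL",
    "ANIMAL": "OBJECT",
    "SQUIRREL": "ANIMAL",
    "FOOD": "OBJECT",
    "NUT": "FOOD",
    "ARTIFACT": "OBJECT",
    "PIPE": "ARTIFACT",
}


def subsumes(ancestor: str, concept: Optional[str]) -> bool:
    """True if `concept` equals `ancestor` or descends from it in IS_A."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def congruity(constraints: Dict[str, str], fillers: Dict[str, str]) -> float:
    """Fraction of case-role constraints satisfied by the candidate fillers."""
    if not constraints:
        return 1.0
    satisfied = sum(
        1 for role, required in constraints.items()
        if role in fillers and subsumes(required, fillers[role])
    )
    return satisfied / len(constraints)


# Assumed constraints for an ingestion-type event (illustrative only).
EAT_CONSTRAINTS = {"AGENT": "ANIMAL", "THEME": "FOOD"}

print(congruity(EAT_CONSTRAINTS, {"AGENT": "SQUIRREL", "THEME": "NUT"}))
print(congruity(EAT_CONSTRAINTS, {"AGENT": "SQUIRREL", "THEME": "PIPE"}))
```

The sketch also shows why constraint matching "does not come for free": the result is only as good as the `IS_A` hierarchy and the recorded constraints, both of which must be acquired and maintained.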
Since lexical disambiguation is a core functionality in NLU by LEIAs, let us consider one more example of how semantic constraints available in the lexicon and ontology enable the LEIA to disambiguate word senses. Table 2 shows the first two verbal senses for address. Syntactically, both expect a subject and a direct object in the active diathesis – filled by the variables $var1 and $var2, respectively. However, in address-v1, the meaning of the direct object (ˆ$var2) is constrained to a
Two verbal senses for the word address. The symbol ˆ indicates “the meaning of”
The semantic constraints on the case-roles in these examples are shown in parentheses because they are not actually written in the lexicon; they are accessed from the ontology at runtime (see McShane, Nirenburg and Beale (2016) for details).
Now that the basic process of semantic analysis should be clear, let us consider in some more detail how horizontal incrementality (adding elements of input over time) fits into the picture, as well as the scope of linguistic phenomena that can be treated as part of basic semantic analysis thanks to the knowledge context provided by our feature-rich lexicon.
As mentioned earlier, although LEIAs carry out preprocessing and syntactic analysis (using CoreNLP) for each new word of input, they do not even consider undertaking semantic analysis unless the most recent string is a noun, a verb, or the end-of-input indicator. Consider the incremental semantic analysis of the input Audrey killed the motor, presented with minimal detail so as not to occlude the main point. The first word of input is Audrey. The onomasticon (lexicon of named entities) contains only one sense of this string, so the nascent TMR is:
The next word is killed, so the combination Audrey killed is analyzed. The lexicon has five senses of kill but only three of them permit a
The next word of input is the, which does not trigger a new round of semantic analysis. The next and final stage of analysis considers the entire sentence Audrey killed the motor. Each of our three still-viable senses of kill includes specific semantic constraints on the direct object: for sense 1 it must be an
In
This filler for the
Basic semantic analysis covers any form-to-meaning mapping that can be recorded using our basic method of lexical specification (i.e., linking elements of syn-strucs and sem-strucs in lexicon entries) and, optionally, procedural semantic routines that rely exclusively on lexical and ontological knowledge (procedural semantic routines that require other types of knowledge are run during Extended Semantic Analysis). For expository purposes, we divide the linguistic phenomena addressed during semantic analysis into four classes, discussed in turn.
The entity has one or more static meanings
Most words, phrases, and constructions in the lexicon have one or more static meaning representations – i.e., their meanings are recorded in the sem-struc zone using one or more ontological concepts. The types of meanings that are recorded in this way cover objects and events; their properties; quantifiers; values of modality and aspect; and even full propositions (e.g., for sentential constructions and idioms).13
Within this discussion of context, one noteworthy detail is that we record canonical indirect speech-act patterns as lexical senses: e.g., I need to know ⟨I’d love to know, It would be great if I knew, You need to tell me⟩ X are all ways of asking the interlocutor to report X (ontologically speaking, this is aThe direct meaning in this example includes an instance of obligative modality scoping over the proposition; the indirect meaning generates an instance of
In the same way, the LEIA generates multiple meanings for idiomatic expressions: e.g., He kicked the bucket has both a direct meaning (he forcefully contacted the bucket with his foot) and an idiomatic one (he died). The idiomatic meaning is preferred by default unless contextual clues – such as a coreferential sponsor for bucket – are found.
Many otherwise difficult-to-process linguistic inputs are computed during basic semantic analysis thanks to our phrase- and construction-rich lexicon. Although lexical specification of this type is labor-intensive, once the associated senses are written, all matching inputs can be processed using the basic semantic analyzer – i.e., we do not need specially written code to cover each of the hundreds of difficult linguistic problems that are relevant for a particular subset of lexemes. The phenomena we will use here as examples of “early analysis thanks to dedicated lexical senses” are scalar modifiers, gapping, and VP ellipsis configurations.
Formalism aside, the sense of and that covers our example above is [
Verb phrase ellipsis (described further in Section 5.2.3) is a theme with many variations. One relatively easy subclass of VP ellipsis involves what we call
The interpretation of the entity must be honed over several stages of processing
The star example of honing an interpretation over several stages of processing is
Another phenomenon whose interpretation typically must be honed over multiple stages is
Our final example of multi-stage analysis involves sentential
These stand in contrast to mid-sentence fragments, which are processed as a matter of course during horizontal-incremental processing.
The fact that not all instances of a linguistic phenomenon are created equal is well known to developers of knowledge-based systems. (This fact is, however, often occluded in mainstream NLP by the practice of creating task definitions that explicitly exclude all difficult instances, meaning that developers of end systems never encounter them.) It is useful, therefore, to split broad linguistic classifications of phenomena into finer functional classifications that are handled differently – and, in our environment, at different stages. The examples of such variegated linguistic phenomena that we will consider here are nominal compounds, metaphors, and metonymies.
Many
Values of scalar attributes can be actual quantities, with measuring units specified, or points on the abstract
Like metaphors, many word senses that can historically be analyzed as
If a psycholinguistic experiment were to validate this claim, it would need to show that interpreting inputs containing metonymies was slower than interpreting inputs containing exclusively direct formulations.
We have already described the theoretical justification for considering so many phenomena to be part of basic semantic analysis, but there is also an important benefit in terms of system development. Large knowledge-based systems become unwieldy if they fail to maximally exploit generalized functions or fail to keep knowledge elements tightly organized. We cannot afford to have countless phenomenon-specific functions scattered throughout the analyzer code to deal with each difficult linguistic phenomenon. The way we have organized it, the procedures to deal with these phenomena are anchored to the lexical senses that give rise to them, and all traces of calls to procedural semantic routines are recorded explicitly in the metadata of TMRs. This supports testing, debugging, and iterative refinement of knowledge bases and algorithms.
In some cases, basic semantic analysis is sufficient as an end stage of NLU, as for inputs that have no coreference needs and for which the LEIA arrives at a single, high-confidence interpretation (e.g., our A gray squirrel is eating a nut). In other cases, basic semantic analysis is sufficient to convince the agent that it does not need to analyze the input any more deeply – e.g., all TMR candidates might lack any concepts within the LEIA’s domain of interest for a particular application. However, in most cases, the basic TMR (or the multiple candidates) serves as input to the more sophisticated reasoning needed to arrive at a full and confident, contextually-grounded interpretation.
The next two stages of analysis – Reference: Initial Analysis, and Extended Semantic Analysis – further specify and/or disambiguate the basic TMR. They are similar in that they both use additional static knowledge bases (ontology, lexicon, and associated rule sets) and reasoners, stopping short of requiring situational reasoning that is specific to a particular agent operating in a particular domain. As such, these two stages of analysis can be useful not only to agents across domains and applications, but also for non-agent-oriented applications.
Stage 4: Reference: Initial analysis
At this stage, all referring expressions – whether overt or elided – have already been detected and provided with a basic semantic analysis. What remains is to ground them in the discourse context. In some cases, this requires textual coreference; in others, it requires coreference with something in the real-world environment that is not mentioned in the linguistic context; and in still others, it requires recognizing that the referring expression is new to the discourse. No matter which of these eventualities obtains, the LEIA must ultimately (in Stage 6 of processing) anchor each referring expression to an entity in its memory, which is what we hold as the true definition of “reference resolution”. We divide our very brief description of the initial stage of reference resolution into two components: coreference resolution and reconsidering lexical disambiguation decisions using coreference heuristics.
Coreference resolution
For coreference resolution, LEIAs use a combination of off-the-shelf and internally developed engines that rely both on statistical and knowledge-based methods.
Currently, we are using the coreference engine in CoreNLP (Lee et al., 2013) as well as programs developed in-house (described by McShane (2015), and by McShane & Babkin (2016a, 2016b)). However, as mentioned earlier, the overall theory and methodology we describe here is not dependent upon particular development decisions like these. Whatever engines can provide the highest-confidence and/or broadest coverage coreference decisions can and should be used.
VP ellipsis will have been detected during basic semantic analysis due to the use of a modal or aspectual lacking a verb/
What is the verbal/
Are the sponsor and elided event in a type-coreference or instance-coreference relationship? Type-coreference (there are 2 different instances of the
Do the internal arguments have strict or sloppy coreference (i.e., the same or different real-world referents)? Sloppy coreference (there are two different instances of projects).
Are the meanings of modifiers in the sponsor clause copied or not copied into the resolution? Before class is copied but yesterday is not.
Are modal and/or aspectual meanings in the sponsor clause copied or not copied into the resolution? The modal meaning indicating epiteuctic modality (i.e., managed to) is copied.
Identifying the sponsor head can be particularly difficult. LEIAs have a battery of methods for doing it but, if this search does not result in a single, high-confidence answer, then the LEIA postpones the ellipsis resolution (i.e., leaves the underspecified
Examples (4) and (5) are from the Gigaword corpus (Graff and Cieri, 2003).
Better-off parents could
The former Massachusetts governor
I can’t
LEIAs can confidently detect the ellipsis sponsor in (4) using primarily syntactic methods that rely on (a) the syntactic parallelism between the two clauses, (b) the lack of any other events in the local context (e.g., there are no relative or subordinate clauses attached to the first conjunct) and (c) the semantic pairing of the modal verbs, could/could not. Although we present these conditions informally here, they are all detailed – along with a formal evaluation of a system implementation – in McShane and Babkin (2016a, 2016b). Since this method is based on syntax, the LEIA needs no special domain knowledge to use it and be confident in its result.
Example (5), by contrast, offers four verbs that could be the sponsor and there is no simple way to choose; one has to understand what is going on, which is not only difficult, it is outside of the domains for which our LEIAs currently have specialized ontological knowledge.
By contrast, sentence (6) is within the purview of our LEIAs since we are preparing them to collaborate with people on physical tasks like building objects. As part of the supporting knowledge, they know what they can and cannot do. Since our LEIAs cannot get stuck or get hurt, but they can tighten things, they can confidently guess that the resolution for “you will have to” is “tighten the screw”.
In considering these three examples as a group, we have skipped forward in our narrative, arriving at situational reasoning (Stage 6) ahead of time. But this emphasizes that LEIAs do not treat linguistic phenomena, such as VP ellipsis, as a monolith. Instead, every linguistic phenomenon is further divided into functional classes of instances that can be treated using particular types of knowledge and reasoning, and resulting in varying degrees of confidence. So, the question, “Can your agent (or non-agent system) do X?” does not make sense (and may cause a misunderstanding) if X is a high-level phenomenon like “resolve pronominal coreference” or “resolve VP ellipsis”. The question is always “Which instances can the agent treat in which contexts and with what confidence?”
Lexical disambiguation and (co)reference resolution inform each other, meaning that a pipeline architecture is not an optimal solution. Instead, we model this interaction as follows. First, during basic semantic analysis, the LEIA generates all candidate TMRs (i.e., considers all senses of all words) and derives a score of their semantic suitability. Next, it carries out coreference resolution using largely syntactically-based methods (both statistically trained and rule-based) and scores those. Finally, it considers the lexical disambiguation and coreference resolution decisions together, using a third scoring function. To put a finer point on it, we do not include semantic heuristics in the initial process of coreference resolution; instead, “surfacy coreference” decisions are made initially and then subsequently combined with semantic preferences.
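The three-step interaction just described can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sense labels are placeholders rather than ontological concepts, the numeric scores are invented, and multiplying the three factors is only one possible form of the third scoring function. The point it shows is that a surface coreference preference can be overturned once it is scored jointly with semantic preferences.

```python
from itertools import product

# Step 1 (basic semantic analysis): hypothetical suitability scores
# for two placeholder readings of an ambiguous phrase.
sense_scores = {"operation/surgery": 0.5, "operation/other": 0.5}

# Step 2 (surface coreference): hypothetical, syntactically based
# preferences for the antecedent of a pronoun.
antecedent_scores = {"John's father": 0.6, "the surgeon": 0.4}

# Semantic preference for an (antecedent, sense) pairing; used only in step 3.
pair_preference = {("the surgeon", "operation/surgery"): 1.0}
DEFAULT_PREFERENCE = 0.3  # assumed score for pairings with no recorded preference


def combined_score(antecedent, sense):
    """Step 3: score one joint decision as the product of the three factors."""
    preference = pair_preference.get((antecedent, sense), DEFAULT_PREFERENCE)
    return antecedent_scores[antecedent] * sense_scores[sense] * preference


def best_joint_reading():
    """Return the highest-scoring (antecedent, sense) combination."""
    candidates = product(antecedent_scores, sense_scores)
    return max(candidates, key=lambda pair: combined_score(*pair))


surface_choice = max(antecedent_scores, key=antecedent_scores.get)
joint_choice = best_joint_reading()
```

With these invented numbers, `surface_choice` is the syntactically preferred antecedent, whereas `joint_choice` pairs the other antecedent with the reading that it semantically supports.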
Consider, for example, the sentence John’s father talked at length with the surgeon after which he started the operation. Although the syntactically-oriented coreference procedures will prefer to corefer he with John’s father, the semantic preference for he referring to the surgeon will be judged more definitive based on the preference scoring mechanism. Specifically, the choice space for LEIAs in analyzing this sentence includes not only two coreference possibilities for he but also two readings of start the operation, referring to beginning
To sum up, the Reference: Initial Analysis stage goes far beyond the textual coreference task that has been extensively pursued in statistical NLP and will be well-known to many readers. Reference resolution invokes many types of heuristic evidence derived from knowledge bases, rule sets, and the results of upstream processing, and it is tightly connected with semantic analysis. All of the analysis methods brought to bear at this stage, however, are still independent of situational reasoning. If these do not offer a full, high-confidence resolution, the LEIA can revisit the outstanding reference questions at Stage 6.
Stage 5: Extended semantic analysis
This stage, like the previous one, addresses lacunae identified during basic semantic analysis using the ontology, lexicon, and related rule sets – not yet situational reasoning. We divide relevant phenomena into four categories: residual ambiguities, incongruities, underspecifications, and fragments. Two things to keep in mind when reading about the corresponding analysis methods are, first, that they address truly difficult problems, solutions for which will need to be enhanced over time; and, second, that achieving a precise analysis in every case is not a requirement for the agent’s effective functioning overall. Consider a real-life example: You cross paths with a colleague walking across campus, have a quick chat, and he wraps it up by saying, “Sorry, I’ve got to run to a dean thing”. Is the thing in question about a dean, organized by a dean, required by a dean, or for deans only? Do you care? Chances are, you do not. The salient point to understand is that the speaker had a reason to cut the conversation short. This example shows that underspecification is a useful feature of language, not a bug. Exactly which decisions a LEIA makes about saliency depend upon the needs of a given application or situation.
Treating residual ambiguities
Many instances of polysemy cannot be resolved using the local dependency structure, which was the main source of information during basic semantic analysis. Consider the input, The police arrived at the port before dawn. They arrested the pirates with no shots fired. What comes to mind as the meaning of pirates? Most likely, seafaring bandits, not intellectual property thieves. However, replace at the port by in a secret computer lab and the other interpretation will win. Of course, both kinds of pirates can be arrested at either place – the selectional constraints of
We have operationalized five methods of using the context beyond the local selectional constraints. They involve recognizing certain types of correlations between
So,
Using the basic slot-filler formalism of the ontology, one could record who eats what by creating an ontological concept for each animal-food pair: e.g.,
Consider how such shortcut object correlations can help in disambiguation. Given the input The cow was eating grass, the knowledge element
Finding these fillers, the LEIA then concludes that
To sum up this section, we have just seen five ontology search strategies that an agent can use when dependency-based disambiguation (Stage 2 of context) leaves residual lexical ambiguity. All of these attempt to determine if
Incongruities arise when there is no semantically valid correlation of dependent constituents and their heads. The examples of incongruities we discuss here are non-lexicalized conventional metonymies and indirect modifications.
As described earlier,
Typical metonymic relationships are recorded in the metonymy repository, which is formulated in terms of ontological concepts. It includes such correspondences as producer for product (We bought an Audi), social group for its representative(s) (The ASPCA reported
Although one could create additional lexical senses for every adjective subject to such shifts, this is both unnecessary and counter to our aims of human-oriented cognitive modeling: after all, it seems like we do have to exert extra reasoning to reinsert the elided individuals (something that might be testable through psycholinguistic experimentation). Instead, we can record the knowledge that this modification shift can happen using a general recovery rule like the following, in which the asterisk indicates that the
This says that there is an instance of a
Although this TMR reliably resolves the initially detected incongruity, it leaves a certain aspect of meaning – who is chasing whom – underspecified. This may or may not be resolvable given a particular context. Compare: Lions regularly engage in bloodthirsty chases with The lion and zebra were engaged in a bloodthirsty chase. Strictly language-oriented reasoning can correctly analyze the first but not the second. The reason the first works nicely is because of a lexical sense for the phrasal ‘X engages in Y’, which maps the meaning of X to the
Treating underspecifications
Among the types of underspecifications that can be more deeply analyzed at this stage are NNs and fragments. For NNs that were initially analyzed using a generic
If N1 is
If N1 is
If N1 is
If N2 is
Note that these patterns not only suggest which relation to choose, they also help to disambiguate the nouns in question. For example, both height and ceiling also have metaphorical uses (He grew up at the height of the Great Depression; This methodology has reached a ceiling of results) which are not applicable to the compound ceiling height. There are three reasons why this reasoning is postponed until extended semantic analysis. First, the inputs to the rules are concepts, not words, and those concepts must be computed during basic semantic analysis. Second, the rules are written in terms of concepts, meaning that they are not part of the lexicon/ontology pair that supports basic semantic analysis. Third, as illustrated by the dean thing example above, deeper analysis of the NN might not even be needed by the LEIA. If such analysis is needed but is not covered by the inventory of recorded ontological patterns just described, the LEIA will be able to use a shortest-path ontological search strategy (similar to the one described earlier for lexical disambiguation), though it is known to result in lower-confidence analyses.
Integrating fragments into the discourse context
Since LEIAs understand inputs incrementally, they are regularly processing fragments before the end of the input is reached. Those mid-sentence fragments are not what we are talking about here. Here, we are talking about fragments that remain non-propositional at the end-of-sentence indicator.
A LEIA’s basic approach to analyzing such fragments involves two stages of analysis:
During Basic Semantic Analysis (Stage 3), the LEIA:
generates semantic interpretations that are possible to derive on the basis of the fragment itself
reviews the output TMR and, if it is non-propositional, flags this in the metadata of the TMR
During Extended Semantic Analysis (this stage, Stage 5), the LEIA:
detects what is missing in the non-propositional TMR
attempts to recover the missing information using linguistic and basic ontological knowledge, and
once such information is recovered, verifies that the original semantic interpretation is valid; otherwise, amends it.
This process is best illustrated using an example. Consider the dialog in (7):
“My knee was operated on.”
“When?”
“In 2014.”
The meaning representation (abbreviated to avoid complexity) for the first utterance,
The next utterance is the fragment “When?” For each question word, the lexicon contains a sense that expects the question word to be used as a fragmentary utterance. This reflects expectation-driven knowledge engineering – i.e., preparing the system for what it actually will encounter, not only what grammar books say is the most canonical. For question words like When? the semantic representation (i.e., sem-struc zone of the lexical sense) explicitly states that this is an elliptical structure by positing an underspecified
The two structures in Table 3 show the initial, sentence-level meaning representation for When? followed by the representation after the coreference has been resolved.
The interpretation of “When?” before and after coreference has been resolved
Our final fragment in this example is In 2014. This is more challenging because the preposition in is highly polysemous. One heuristic used by LEIAs when resolving polysemy is to select the interpretation that matches the narrowest selectional constraints. In this case, it selects in-prep10 (the 10th prepositional sense of in) because that sense asserts that the object of the preposition must refer to a
The interpretation of “In 2014” before and after coreference has been resolved
To sum up, this example showed how ellipsis (When?) and non-propositional utterances (In 2014) are analyzed over two stages: first, their basic semantics is computed, and later their meaning is incorporated into the discourse context.
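The two-stage treatment of fragments in dialog (7) can be sketched as follows. This is a simplified illustration, not the LEIA implementation: TMRs are modeled as plain dictionaries, the frame identifiers (`event-1`, `question-1`, and so on) and the `anchor_event` metadata key are hypothetical, and the final verification step, in which the original interpretation is checked and amended if necessary, is not modeled.

```python
def specified_events(tmr):
    """Return the event frames in a TMR that are not underspecified placeholders."""
    return [
        frame for frame in tmr["frames"]
        if frame["type"] == "event" and not frame.get("underspecified", False)
    ]


def basic_analysis(frames):
    """First pass: build the TMR and flag it in the metadata if non-propositional."""
    tmr = {"frames": frames, "meta": {}}
    tmr["meta"]["non_propositional"] = not specified_events(tmr)
    return tmr


def extended_analysis(tmr, discourse):
    """Second pass: anchor a flagged fragment to the most recent specified event."""
    if not tmr["meta"]["non_propositional"]:
        return tmr
    for prior in reversed(discourse):
        events = specified_events(prior)
        if events:
            tmr["meta"]["anchor_event"] = events[-1]["id"]
            break
    return tmr


# "My knee was operated on."
knee = basic_analysis([
    {"id": "event-1", "type": "event"},
    {"id": "object-1", "type": "object"},
])
# "When?" carries a question frame plus an underspecified event placeholder.
when = extended_analysis(
    basic_analysis([
        {"id": "question-1", "type": "question"},
        {"id": "event-2", "type": "event", "underspecified": True},
    ]),
    [knee],
)
# "In 2014." contributes only a time expression.
in_2014 = extended_analysis(
    basic_analysis([{"id": "time-1", "type": "time", "value": 2014}]),
    [knee, when],
)
```

In this sketch both fragments end up anchored to the event introduced by the first utterance, because the placeholder event in the "When?" TMR is skipped when searching the preceding discourse.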
Let us recap the last three stages. Basic Semantic Analysis (Stage 3) carries out lexical disambiguation and creates the local semantic dependency structure. It covers a large inventory of advanced linguistic phenomena thanks to fine-grained, anticipatory lexical acquisition, but it is expected to regularly result in residual ambiguities (multiple high-scoring TMRs), incongruities (no high-scoring TMRs), and/or underspecifications (as in cases of ellipsis). The following two contextual analysis stages – Reference: Initial Analysis (Stage 4) and Extended Semantic Analysis (Stage 5) – attempt to further specify and contextually ground specific aspects of the TMR using knowledge and reasoning that are broadly applicable across agents and applications. In essence, processing through Stage 5 represents what an agent can bring to bear in terms of general linguistic and world knowledge. In this sense, the end of Stage 5 might be considered a conceptual full stop in the process of NLU.
This is important for the following reason: although agents are able to engage in (horizontally) incremental NLU, and although they are able to make decisions about what to do after each (vertical) stage of analyzing each fragment, this degree of splitting the NLU process is not necessary for all applications. An alternative deployment strategy for non-urgent interactions is for agents to wait until the end of each utterance to process it, and then to process it through Stage 5 – which results in a full analysis for many inputs – before pausing to decide what to do next.
Stage 6: Situational reasoning
Up until this point, all of the LEIA’s language analysis methods have been generic, not relying on specialized domain knowledge or on the agent’s understanding of what role it is playing in a particular real-world activity at the time of speech. However, considering that LEIAs are modeled after people, one might ask, “Don’t people – and, therefore, shouldn’t LEIAs – always know what context they are in; and, accordingly, shouldn’t this knowledge always be a starting point for language analysis rather than a late-stage supplement?” The answer, we believe, is “no”. Context is a far more fluid and quickly changing notion than this conjecture implies. For example, while making dinner, you can be talking with whoever is around not only about food preparation but about office gossip, the need to finally get a new dishwasher, and a recent phone message that you forgot to mention. In fact, people shift topics of conversation so fast and frequently that questions like, “Wait, what are we talking about?” are not unusual. The ubiquity of topic switching provides a theoretically-motivated reason to begin the process of NLU with more generic methods – since the given utterance might be introducing a new topic – and then invoke context-specific procedures if needed.
In addition to theoretical motivations, there are also practical motivations for starting with generic methods of NLU and progressing to domain-specific ones only if needed.
Ultimately, agents need to be able to operate at human level in all contexts, and it is important for developers to understand which human NLU capabilities are applicable across domains.
Domain-specific reasoning (apart from preferring certain word senses because they are attested in a script) tends to be more complicated, computationally expensive, and error-prone than the generic NLU described so far.
It can be difficult to automatically detect the intended context of an utterance since objects and events from many domains can be mentioned in a single breath. Moreover, it is not the case that every referring expression shifts the topic of conversation to the related script. For example, mentioning that the neighbor kid just hopped over the fence to grab his baseball does not mean that the overall conversation has switched to the topic of baseball, and this comment should not prepare humans or agents to carry out full-on reasoning about that sport.
Although developers can, of course, assert that an agent will operate in a given domain throughout an application, associated successful experimental results would have to be judged in the light of this game-changing simplification.
Domain-specific methods are most naturally developed in bottom-up fashion, with the specific needs of specific agents in specific applications setting the agenda. This not only helps to prioritize R&D, it also provides material to test and evaluate the solutions. Of course, bottom-up does not imply stuck on the bottom – the goal is to generalize to the extent possible, thereby preparing agents to operate across domains.
One type of situational reasoning involves ontological scripts, which can be applied in two ways. In static script-based reasoning, the agent consults a recorded script as a source of knowledge while not participating (reasoning and making decisions about actions) in any related activity. For example, the use of the definite description the pilot in the sentence Our plane took off late because the pilot was caught in traffic is licensed because a pilot is a necessary actor in the script of air travel. Understanding the licensing of definite descriptions in so-called “bridging” constructions like these is a component of reference resolution that is carried out at Stage 4.
Here, we will concentrate on activated script-based reasoning, which requires that the agent be participating in an activity that plays out a known script. Since, at the current stage of system development, our agents only participate in one kind of activity at a time, they do not have to dynamically detect which scripts are activated. This is a simplification of real life. However, it is only a partial simplification for two reasons. First, since all language processing to this point has been domain neutral, the agent can talk and learn about many other things while it is carrying out its domain-specific tasks, so we are not constraining the agent to functioning only within a single domain. What we are saying is that the methods described here can help to concretize the interpretation of utterances only if they pertain to an activated script. The second reason why having a limited inventory of activated scripts is only a partial simplification is that the agent still needs to determine which point/aspect of which activated script (there could be several) is relevant and how it is relevant.
We are just beginning to explore activated script-based reasoning in service of NLU, so our descriptions of component phenomena are preliminary. The problems that can be addressed in this way involve unexpected syntax, residual lexical and referential ambiguity, residual incongruity, and residual underspecification.
Treating residual unexpected syntax by fishing
As a reminder, we define unexpected syntax as syntactic structures that do not result in a full, well-formed parse that is fully subsumed by the LEIA’s syn-mapping procedure (Section 4.1). Unexpectedness can reflect either the actual nature of the structure – i.e., any person would also consider it odd – or the less-than-human-level sophistication of the available syntactic analysis procedures. LEIAs have already attempted (cf. Section 4.3) to normalize unexpected syntax by stripping away repetitions, disfluencies, and false starts – a process that works if an input is mostly complete and canonical to begin with. However, when the input is more fractured, then the agent must shift gears from normalizing to fishing – i.e., engaging in a directed search for interpretable propositions.
Fishing involves extracting constituents that semantically fit together while ignoring others. It is useful, for example, when there is such an accumulation of superfluous categories – e.g., repetitions and false starts – that the simple stripping processes launched earlier fail to result in a well-formed structure. Our fishing algorithm first extracts NPs, then identifies
Treating residual ambiguities
Residual lexical and referential ambiguity can be treated in a straightforward manner by preferring analyses that align with expectations recorded in the activated script. A chair building script will include many instances of
As concerns pragmatic ambiguities, a common case involves the interpretation of speech acts. Recall that utterances that offer an indirect speech-act reading always offer a direct speech-act meaning as well. I need a hammer can mean Give me one or simply I need one – maybe I know that you are not in a position to give me one, or that we do not have one to begin with. During basic semantic analysis, the LEIA detected the availability of both interpretations for canonical indirect speech acts. It gave a scoring preference to the indirect reading, which was recorded as a phrasal sense in the lexicon, but the direct meaning remained available. Now it can use script-based reasoning to make the final decision. If the indirect interpretation is something that the agent can actually respond to – it is a question the LEIA can answer, or it is a request for action that the LEIA can fulfill – then that interpretation is selected. This would be the case, for example, if a chair-building LEIA were told, I need a hammer (a request: Give me a hammer). By contrast, if the LEIA cannot fulfill the request (I would love a sandwich right now) or does not know the answer to a question (I wonder if the Steelers won last night), then it chooses the direct interpretation.
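The decision rule for indirect speech acts described above can be sketched in a few lines. The dictionary representation of the two readings, the `abilities` set, and the function names are all assumptions made for illustration; the sketch covers only requests for action, not questions the agent may or may not be able to answer.

```python
def choose_speech_act(indirect, direct, can_respond):
    """Select the indirect reading only if the agent can actually respond to it."""
    return indirect if can_respond(indirect) else direct


# Hypothetical abilities of a chair-building agent: (action, theme) pairs.
abilities = {("give", "hammer"), ("give", "screwdriver")}


def can_respond(reading):
    """Check whether the requested action on the requested theme is within reach."""
    return (reading["action"], reading["theme"]) in abilities


# "I need a hammer."
hammer = choose_speech_act(
    indirect={"act": "request-action", "action": "give", "theme": "hammer"},
    direct={"act": "statement", "content": "speaker needs a hammer"},
    can_respond=can_respond,
)

# "I would love a sandwich right now."
sandwich = choose_speech_act(
    indirect={"act": "request-action", "action": "give", "theme": "sandwich"},
    direct={"act": "statement", "content": "speaker wants a sandwich"},
    can_respond=can_respond,
)
```

Here `hammer` resolves to the request reading and `sandwich` to the direct statement, mirroring the two cases discussed in the text.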
Treating residual underspecifications
What does good mean? If a vague answer is enough – and it often is – then it indicates a positive evaluation of something. In fact, the independent statements, Good, Great, Excellent and others have lexical senses whose meaning is “the speaker is (highly) satisfied with the state of affairs”. This is important information for a task-oriented LEIA because it asserts that no reparative action need be taken. By contrast, What a mess and This looks awful indicate that the speaker is unhappy and that the LEIA should consider doing something.
However, there are cases in which a vague expression actually carries a more specific meaning: If I ask someone who knows me well to recommend a good restaurant, then I expect my location and preferences to be taken into account; if someone recommends a good student for a graduate program, that student had better be sharp, well-prepared, and diligent; and the features contributing to a good resume will be very different for people giving advice about resume structure preferences than for bosses seeking new hires.
For applications like recommendation systems, web-search engines, and product reviews, notions of good and bad span populations (they do not focus on individuals) and tend to generalize across features (e.g., a restaurant might get an overall rating of 3/5 even if the food is exceptional). By contrast, in agent applications the features of individuals – including their preferences, character traits, mental and physical states – can be key. For example, in the domain of clinical medicine, in which we have worked extensively (Nirenburg, McShane & Beale, 2008; McShane, 2014), the best treatment for a patient will depend on a range of factors that the LEIA knows about thanks to a combination of its model of clinical knowledge and the features it has learned about the individual during system operation (through dialog, simulated events, etc.). Clearly, this is all highly domain-dependent.
Treating residual fragments
At this point in processing, some fragments, particularly bare noun phrases, have not yet been incorporated into the contextual interpretation: e.g., Scalpel! Two lattes, no sugar. One generalization is that when a noun phrase functions as an independent utterance that does not fulfill the pragmatic need of a previous utterance (as by providing the answer to a question), it tends to mean “Give me [that object]”. This is just the tip of the iceberg in incorporating such fragments into the contextual interpretation – a topic we plan to explore bottom-up, using specific dialogs in specific agent applications.
Going deeper for unknown words
Up to this point, unknown words have been treated as follows. During syntactic analysis, they were provided with one or more candidate lexical senses that semantically treated them as underspecified
Let’s say the agent is operating in a furniture-building domain and receives the utterance Pass me the Phillips; and say that it does not have a lexical sense for Phillips, or even the full-form Phillips head screwdriver. The agent can narrow down the interpretation to objects that it is able to give to someone – under the assumption that its interlocutor is operating under sincerity conditions. If it understands which action the human is attempting to carry out – something that can be provided either by language (I’m trying to screw in this screw) or by sensors related to vision – it can narrow the interpretation still further.
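The narrowing process in this example can be sketched as two successive filters. The inventory, its `givable` and `used_for` fields, and the action label `screw-in` are hypothetical stand-ins for the agent's knowledge of its environment; the sketch assumes that the human's current action, if known, has already been identified from language or vision.

```python
def narrow_unknown_object(inventory, current_action=None):
    """Narrow the candidate referents of an unknown noun in a 'pass me X' request."""
    # Filter 1: keep only objects the agent is able to give to someone.
    givable = [name for name, info in inventory.items() if info["givable"]]
    if current_action is None:
        return givable
    # Filter 2: prefer objects that serve the action the human is carrying out.
    relevant = [name for name in givable if current_action in inventory[name]["used_for"]]
    # If nothing matches the action, fall back to the broader candidate set.
    return relevant or givable


# Hypothetical furniture-building inventory.
inventory = {
    "screwdriver": {"givable": True, "used_for": {"screw-in"}},
    "hammer": {"givable": True, "used_for": {"hammer-in"}},
    "workbench": {"givable": False, "used_for": set()},
}

without_action = narrow_unknown_object(inventory)
with_action = narrow_unknown_object(inventory, current_action="screw-in")
```

Without information about the human's action the agent is left with every object it can hand over; knowing the action reduces the candidates further, and any remaining uncertainty can be resolved by asking the interlocutor.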
Using one’s interlocutor
The layer of situational reasoning that we are now discussing presupposes that the LEIA is operating in a particular application along with a human collaborator. That human provides an interactive context, since the LEIA can ask for clarifications, as well as learn new words, concepts, and facts through dialog. We have explored two kinds of learning through interaction with human interlocutors: learning new words and ontological concepts by being told, a capability used by virtual patients in the Maryland Virtual Patient application (Nirenburg, McShane & Beale, 2008), and learning ontological scripts through language interaction while carrying out a task (Nirenburg et al., 2018).
Anchoring referring expressions in agent memory
Our style of NLU first and foremost supports the functioning of comprehensive artificial intelligent agents. In view of that, LEIAs store TMRs in their memory and then use the knowledge stored in memory – not text strings – as input for general reasoning and decision-making about action. Memory management is a big topic that includes determining whether referring expressions already have anchors in memory or require new ones; deciding whether to store all TMRs in memory or only those that the agent considers important; inferring generalizations from recurrent instances; modeling forgetting; and much more. Many of the issues of memory management involve reference resolution: e.g., whereas the Irish mechanic and the German mechanic cannot refer to the same person, the blue bicycle and the red bicycle can be the same if the text says that the blue bicycle was painted red.
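The contrast between the mechanic and bicycle examples can be sketched as a compatibility check between a new referring expression and a candidate anchor in memory. The representation is hypothetical: memory entries are dictionaries, `IMMUTABLE` is an assumed set of properties that cannot change for an individual, and recorded property changes are stored as (property, old value, new value) triples. Actual memory management involves far more than this check.

```python
IMMUTABLE = {"nationality"}  # assumption: these properties never change


def can_corefer(mention, anchor):
    """Decide whether a new mention may be anchored to an existing memory entry."""
    if mention["type"] != anchor["type"]:
        return False
    for prop, value in mention["props"].items():
        known = anchor["props"].get(prop)
        if known is None or known == value:
            continue  # no conflict for this property
        if prop in IMMUTABLE:
            return False  # conflicting values that cannot be reconciled
        if (prop, known, value) not in anchor["changes"]:
            return False  # conflict with no recorded change explaining it
    return True


irish_mechanic = {"type": "mechanic", "props": {"nationality": "Irish"}, "changes": set()}
german_mechanic = {"type": "mechanic", "props": {"nationality": "German"}}

blue_bicycle = {"type": "bicycle", "props": {"color": "blue"}, "changes": set()}
repainted_bicycle = {
    "type": "bicycle",
    "props": {"color": "blue"},
    "changes": {("color", "blue", "red")},  # the text said it was painted red
}
red_bicycle = {"type": "bicycle", "props": {"color": "red"}}
```

Under these assumptions the red bicycle can be anchored to `repainted_bicycle` but not to `blue_bicycle`, and the German mechanic can never be anchored to the Irish one.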
Large corpora as context
The final layer of context is the world of digital data, or whichever portions of it are available to a LEIA at runtime or for offline lifetime learning by reading (English & Nirenburg, 2007). Learning by reading is a big topic that we mention only in passing, for the sake of completeness. It is an absolutely necessary functionality of knowledge-based agent systems if they are to be feasible and scalable, but it requires careful attention to the bootstrapping process so that the automatically acquired knowledge does not corrupt the quality of the manually acquired core knowledge bases.
Broader issues
In this paper we have defined context very broadly, dividing it into horizontal context, which refers to the available elements of perceptual input, and vertical context, which refers to the knowledge bases and reasoning functions that are brought to bear, in sequence, during NLU. The LEIA chooses how to proceed through horizontal and vertical context using decision functions based on the nature of the input, its current plans and goals, etc.
Undertaking to solve the broad range of difficult problems presented by natural language is not typical of modern-day natural language processing endeavors. Much more typical is work on a single task (e.g., knowledge extraction), a single phenomenon (e.g., coreference resolution according to the MUC-7 task specification (Hirschman & Chinchor, 1997)), or a particular statistical approach (e.g., supervised machine learning), no matter what data it is applied to. Although the latter types of work have been useful in support of near-term applications, we do not believe that they offer a natural progression toward human-level NLU for reasons we detail elsewhere (McShane, 2017; Nirenburg & McShane, 2016).
Our approach differs from mainstream NLP in many ways: it pursues deep, contextual semantic analysis; it integrates NLU within overall agent cognition and recognizes that only a holistic approach has any chance of ultimate success (i.e., splitting off NLU as an autonomous task is not a simplification leading to better results; on the contrary, it actually makes the task completely impossible); it embraces the scruffy nature of actual language use; and it does not pursue the unnecessary goal of arriving at full and complete analyses of every input. Our approach also distinguishes domain-independent knowledge and reasoning from domain-specific knowledge and reasoning, thereby both clarifying and maximizing what can be reused across agents and applications.
Of course, there are other cognitive systems that incorporate deep NLU, though typically with a much narrower coverage of linguistic phenomena, vocabulary, or ontology. Some robotic systems incorporate NLU as an ancillary functionality in their program of research. For example, the robotic system reported by Lindes and Laird (2016) implements a parser based on embodied construction grammar (Feldman, Dodge & Bryant, 2009), and the one reported by Baral, Lumpkin & Scheutz (2017) concentrates on translating a limited set of natural language utterances into a “Lambda calculus representation of words [that] could be inferred in an inverse manner from examples of sentences and their formal representation” (ibid: p. 11). In both systems, the role of the language component is to support a) direct human-robotic interaction, predominantly simple commands; and b) robotic learning of the meanings of words as the means of grounding linguistic expressions in the robot’s quite limited world model. If the ultimate goal is to develop robotic language understanding that approaches human-level sophistication, then the above systems will have to tackle a plethora of NLU issues that their current objectives allow them to postpone.
Other agent-development paradigms, such as the ACT-R cognitive architecture, have fostered work on various problems of NLU, such as pronoun resolution and word sense disambiguation (Ball, 2013; Dutta and Basu, 2012). However, it is difficult to glean from reports about individual phenomena whether and how they have converged into a single approach or system. This is, in part, a consequence of short-form publication genres, which impose a difficult task on developers of large systems (balancing explicit details with references to outside literature), and an equally high cognitive load on readers (trying to understand the cited details in the absence of all the necessary background).
A program of work that conceptually resonates with ours, but focuses relatively more on planning than on language, is TRIPS/TRAINS (Ferguson and Allen, 1998; Allen et al., 2007). It has nevertheless given rise to computational linguistic analyses of various language phenomena, such as broad referring expressions (Byron, 2004) and grounding in dialog (Traum, 1994). However, it is not clear from the subsequent literature to what extent these analyses were incorporated into the systems that followed.
Although this paper is about theory, not implementation, a few words about the latter are in order. The methodology of LEIA development involves proof-of-concept implementations under conditions that are sufficiently realistic to a) validate the component microtheories that treat individual phenomena and b) give us good reason to believe that the approach should scale up in sync with the growth of the knowledge resources. For example, although the lexicon at present contains only around 30,000 senses of words and phrases (not nearly enough to support NLU in all domains), it is sufficiently polysemous to require LEIAs to treat lexical ambiguity in a fundamental way. LEIAs currently deal with over 40 senses of the verb make (many of which are phrasals), over 20 senses of have, 18 senses of the preposition in, and so on. Similarly, our environment already fulfills all the prerequisites for taking on difficult problems like the interaction of lexical and referential ambiguity, unexpected input, learning new words on the fly, and detecting and reconstructing elided material. If our methods for treating such phenomena are shown to work for inputs covered by the current lexicon and ontology (which we have painstakingly prevented from containing the kinds of oversimplifications that would result in misleadingly impressive evaluations), then they should work equally well over larger versions of those resources. Since we have not conducted any formal evaluations of our most recent integrated system, we will make no claims about it. However, interested readers will find formal evaluations of select microtheories in past publications (McShane, Nirenburg and Beale, 2015, 2016; McShane, Beale and Babkin, 2014).
Acknowledgements
This research was supported in part by Grants N00014-16-1-2118 and N00014-17-1-2218 from the U.S. Office of Naval Research. Any opinions or findings expressed in this material are those of the authors and do not necessarily reflect the views of the Office of Naval Research.
