Qualitative descriptors applied to ambient intelligent systems

Abstract

In this paper, qualitative descriptors of images ( ${QIDL}^{+}$ ), 3D scenes ( $QSn 3 D$ ) and movements ( $QMD$ ) are proposed to be applied to ambient intelligent systems for: (i) improving human-computer interaction and (ii) enhancing their reasoning capabilities. These qualitative descriptors align with human perception and are used to produce narratives. Moreover, logic descriptions are obtained in first order logic and Prolog syntax, which enables straightforward reasoning capabilities. Finally, some use case tests carried out at Cartesium, the intelligent building at the Spatial Cognition centre at the University of Bremen, are provided to show the flexibility and applicability of these qualitative descriptors. The results provided are discussed in terms of usability, the logic framework used and the integration of descriptors.

Keywords

Qualitative descriptors; logics; narrative; scene understanding; Ambient Intelligence; human-machine interaction

1. Introduction

Imagine the following scenarios. You are an 80-year-old modern ageing person who lives alone in your smart home. One morning during breakfast you cannot find your vitamins and you ask your smart home: Where are my pills? Your smart home scans the house and finds them on your desk. It answers: Your pills are on your desk, next to your laptop, on the right (Scenario I). Moreover, you also have a service robot at home which helps you with daily routine activities. It is the one that gets your pills from your desk and gives them to you. It asks you: How can I further help you? And you provide the following instruction: Please, tidy the living room. To clarify, your robot asks back: Should the new stool go in front of the armchair or down the table? (Scenario II). And you answer: Just move it a bit towards your left and that’s it (Scenario III).

In Scenario I, the location of a target object is described using qualitative descriptors. In Scenario II, the location of a piece of furniture is described using the orientation of another piece of furniture. And in Scenario III, a movement is also described using qualitative descriptors. These scenarios illustrate the need for spatial intelligence and the usability of qualitative descriptors in human-machine communication contexts.

Ambient Assisted Living (AAL) applications need Ambient Intelligence (AmI) for: (i) scene understanding (i.e. to ‘know’ what is happening in a building); (ii) reasoning, to identify the consequences of what is happening and provide assistance if needed; and (iii) learning, to identify routine activities and ‘predict’ events by analogy with the past, and also identify uncommon or ‘new’ activities. Moreover, systems that must carry out a task in environments where people live or work need cognitive capabilities for enhancing human-machine communication. As Vernon [58] pointed out: ‘Cognition implies an ability to understand how the world around us might possibly be (…) and being able to interpret a visual scene without having complete data’. Therefore, a cognitive system should be able to describe and identify scenes without having complete information about them (i.e., it should be able to describe objects that it has not seen before and identify them by the context).

A key issue in the study of Ambient Intelligence is reasoning about context to deduce new knowledge. The main challenges of this effort derive from the imperfect context information, and the dynamic and heterogeneous nature of the ambient environments [6]. Henricksen and Indulska [31] characterise four types of imperfect context information: unknown, ambiguous, imprecise, and erroneous. Sensor or connectivity failures result in situations where not all context data is available at any time. When the data about a context property comes from multiple sources, the context information may become ambiguous. Imprecision is common in sensor-derived information, while erroneous context information arises as a result of human or hardware errors. The role of reasoning in these cases is to detect possible errors, make predictions about missing values, and decide about the quality and the validity of the sensed data. The raw context data needs, then, to be transformed into meaningful information so that it can later be used in the application layer.

Qualitative Reasoning (QR) [28,33,59] is a field in AI. Instead of using precise numerical representations, qualitative models focus on representing relevant aspects in the domain by discretising quantity spaces and, as a consequence, they deal with imprecise numerical values. Therefore, they can reason with inexact, ambiguous or incomplete data. Qualitative Spatial and Temporal Representations and Reasoning (QSTR) [12,29,37] models and reasons about time (e.g. coincidence, order, concurrency, overlap, granularity) and also about properties of space (e.g. topology, location, direction, proximity, geometry, intersection) and their evolution in time between continuous neighbouring situations. Maintaining consistency and constraints in space and time is the basis of qualitative reasoning when solving spatial problems. QSTR also reflects cognitive aspects of reasoning and talking about space, that is, it reflects human spatial cognition [48]. As a result, well-defined qualitative models and reasoning techniques have appeared in the literature and have been applied to many domains: robotics [20,35], computer vision [11,18,34], ambient intelligence [4,24], architecture and design [3], geographic information systems [1,27], education [7,14,16], etc.

In this paper, qualitative spatial representations have been used to build three qualitative descriptors: (i) a Qualitative Image Logic Descriptor ( ${QIDL}^{+}$ ), (ii) a Qualitative descriptor of 3D Scenes ( $QSn 3 D$ ) and (iii) a Qualitative Movement Descriptor ( $QMD$ ). These descriptors link QSTR with AmI in three applied scenarios.

The Qualitative Image Descriptor (QID) [22] has been able to extract qualitative knowledge from real images: location of objects in the image, their topological situation, their shape and colour. Logics have also been provided for the qualitative image descriptor (QIDL) [24]. This paper extends the QIDL approach by including qualitative sizes of the objects as a new feature. None of the previous works in the literature integrate all these shape, colour, size, topology and location qualitative descriptors when producing first order logic predicates in Prolog for reasoning about images/scenes.

Cognitive studies in the literature investigate how people describe object arrangements in real space [55,56]. Some of the results obtained were applied to improve human-robot interaction [43,44], using a robot incorporating a range laser sensor to extract information from the environment. Taking into account these previous works, the Qualitative descriptor of 3D Scenes ( $QSn 3 D$ ) applies the results of cognitive studies to the description of a 3D scene captured by an XBox Kinect device, which provides very rich spatial information about the space (i.e. information about depth, for distinguishing foreground from background, textures of the objects for identifying them, etc.).

The Qualitative Descriptor of Movement ( $QMD$ ) proposed in this paper is used to describe the location and direction of moving objects in videos. Models of qualitative motion [8,46] and qualitative models of trajectories [41,57] have been presented in the literature before. The models of motion were theoretically proved to provide useful descriptions, but these descriptions were not formulated in first order logic and they have not been applied to the description of real videos. Moreover, a detailed qualitative description of the physical properties of the motion or the trajectory of the objects in a video is not the objective of this paper, since our main aim is not detecting collisions. In the current paper, from the data about movements that can be extracted from videos, the $QMD$ approach obtains knowledge/meaning that can be interpreted cognitively, that is, by people.

The rest of the paper is organised as follows. Section 2 introduces the qualitative descriptors for ambient intelligent systems proposed in this paper: the qualitative descriptor of images ( ${QIDL}^{+}$ ), 3D scenes ( $QSn 3 D$ ) and movements ( $QMD$ ). They were designed to be applied at Cartesium, the intelligent building at the Spatial Cognition centre, at the University of Bremen. Section 3 presents the components of the ${QIDL}^{+}$ : qualitative shape descriptor, qualitative colour descriptor, qualitative size descriptor and qualitative descriptors of topology and location. It also describes how some domain knowledge helps the AmI system to infer more features about the image and how a narrative description is provided when a target object is searched for by the AmI system and found in an image. Section 4 describes the qualitative descriptor of 3D scenes ( $QSn 3 D$ ), how it identifies objects in point clouds and how their location and orientation are described in a narrative way, changing the reference frame of the AmI system from deictic to intrinsic depending on the situation. Section 5 presents the qualitative descriptor of movement ( $QMD$ ), which provides the location and direction of moving objects in videos. In Section 6, a discussion regarding the tests carried out is provided. Finally, Section 7 presents conclusions and future work.

2. Qualitative descriptors for ambient intelligent systems: QIDL⁺, QSn3D, QMD

Fig. 1.

Qualitative Descriptors for Ambient Intelligent Systems.

In the previous section, several scenarios were introduced. The qualitative descriptor which can provide the ground for your smart home to answer ‘where are your pills’ (Scenario I) is the Qualitative Image Logic Descriptor, ${QIDL}^{+}$ , described in Section 3. ${QIDL}^{+}$ is applied to describe, logically and narratively, real AmI scenes at the Cartesium building at Universität Bremen.

The qualitative 3D scene descriptor ( $QSn 3 D$ ) identifies objects and their location and orientation in the point clouds. It then provides the spatial information needed for the robot to use concepts such as ‘the front of the armchair’ (Scenario II).

The qualitative movement descriptor ( $QMD$ ) detects any moving object in a video and it obtains its location and direction. It can provide the ground for an intelligent system to understand instructions such as: ‘move it towards your left’ (Scenario III).

As Fig. 1 illustrates, the main aims of all these qualitative descriptors ( ${QIDL}^{+}$ , $QSn 3 D$ , $QMD$ ) are: (i) to improve the human-computer interaction applying cognitive linguistics; and (ii) to enhance the reasoning capabilities of ambient intelligence systems.

3. The extended Qualitative Image Logic Descriptor (QIDL⁺)

The $QID$ approach [22] extracts the relevant regions detected within a digital image and describes them qualitatively by their shape, colour, topology and location. The $QIDL$ [24] approach implemented logics for the description. Here, the ${QIDL}^{+}$ extends these logics and features to also include qualitative sizes, which prove effective in scenes where the point of view of the observer and the camera is the same.

Fig. 2.

The extended Qualitative Image Logic Descriptor ( ${QIDL}^{+}$ ).

The ${QIDL}^{+}$ is aimed at describing the location of target objects with respect to known or ‘unknown’ objects in a scene. Known objects in the scene are identified by object detectors, while other object categories are inferred using their qualitative features. As Fig. 2 shows, the logic description provided is useful for reasoning about spatial locations, and the qualitative features obtained can be included in a narrative description for enhancing human-machine interaction.

The rest of this section is organised as follows. The qualitative descriptors of shape (QSD), colour (QCD), topology, location and size (QSize) are described next. The following sections show logic definitions for inferring new information about the context and how they are useful in a real office desktop scenario.

3.1. Qualitative Shape Description (QSD)

Each of the relevant points of a shape ( $P_{0}, P_{1}, \dots, P_{N}$ ) is described by a set of four features ⟨EC, A or TC, L, C⟩:

Edge Connection (EC) occurring at P, described as: {line_line, line_curve, curve_line, curve_curve, curvature_point};

Angle (A) at the relevant point P (which is not a curvature_point) described by the qualitative tags: {very_acute, acute, right, obtuse, very_obtuse};

Type of Curvature (TC) at the relevant point P (which is a curvature_point) described qualitatively by the tags: {very_acute, acute, semicircular, plane, very_plane};

Compared Length (L) of the two edges connected by P, described qualitatively by: {much_shorter (msh), half_length (hl), a_bit_shorter (absh), similar_length (sl), a_bit_longer (abl), double_length (dl), much_longer (ml)};

Convexity (C) at the relevant point P, described as: {convex, concave}.

The corresponding reference systems ( ${EC}_{RS}$ , $A_{RS}$ , ${TC}_{RS}$ , $L_{RS}$ and $C_{RS}$ ) are defined and calibrated according to previous experimentation [17].

Thus, by using these descriptors, the complete shape of an object can be categorised as: {triangle, quadrilateral, square, pentagon, …, polygon}. Figure 3 presents an example of a QSD.

Fig. 3.

Example of shape described by QSD.
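As an illustration only, the four-feature description of a relevant point can be held in a small record type. The sketch below is a hypothetical Python data structure, not part of the QSD implementation: the class name `QSDPoint` and its field names are assumptions, while the tag sets are the ones listed above. It merely checks that each tag belongs to its reference system and that an angle tag is used for non-curvature points and a curvature tag for curvature points.

```python
from dataclasses import dataclass

# Tag sets taken from the feature definitions in this section.
EDGE_CONNECTIONS = {"line_line", "line_curve", "curve_line", "curve_curve", "curvature_point"}
ANGLES = {"very_acute", "acute", "right", "obtuse", "very_obtuse"}
CURVATURES = {"very_acute", "acute", "semicircular", "plane", "very_plane"}
LENGTHS = {"msh", "hl", "absh", "sl", "abl", "dl", "ml"}
CONVEXITIES = {"convex", "concave"}


@dataclass(frozen=True)
class QSDPoint:
    """Hypothetical container for the features <EC, A or TC, L, C> of one relevant point."""
    ec: str          # Edge Connection
    a_or_tc: str     # Angle, or Type of Curvature when ec == "curvature_point"
    length: str      # Compared Length (abbreviated tag)
    convexity: str   # Convexity

    def __post_init__(self):
        if self.ec not in EDGE_CONNECTIONS:
            raise ValueError(f"unknown edge connection: {self.ec}")
        # Curvature points take a curvature tag; all other points take an angle tag.
        allowed = CURVATURES if self.ec == "curvature_point" else ANGLES
        if self.a_or_tc not in allowed:
            raise ValueError(f"tag {self.a_or_tc} not valid for {self.ec}")
        if self.length not in LENGTHS:
            raise ValueError(f"unknown compared length: {self.length}")
        if self.convexity not in CONVEXITIES:
            raise ValueError(f"unknown convexity: {self.convexity}")


# A square would be described by four identical relevant points.
square = [QSDPoint("line_line", "right", "sl", "convex") for _ in range(4)]
```

Keeping the validation in the constructor means that any shape built from such points only contains tags that exist in the corresponding reference system.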

3.2. Qualitative Colour Description (QCD)

The Red, Green and Blue (RGB) colour channels of object pixels are translated into Hue, Saturation and Lightness (HSL) coordinates, and a reference system for colour naming is defined as: $QCRS = \{uH, uS, uL, {QC}_{NAME 1..5}, {QC}_{INT 1..5}\}$ where $uH$ is the unit of Hue; $uS$ is the unit of Saturation; $uL$ is the unit of Lightness; ${QC}_{INT 1..5}$ refers to the intervals of HSL coordinates associated with each colour as shown in Fig. 4; and ${QC}_{NAME 1..5}$ refers to the colour names as follows:

$\begin{array}{lcl} {QC}_{{NAME}_{1}} & = & \{black, dark\_grey, grey, light\_grey, white\} \\ {QC}_{{NAME}_{2}} & = & \{red, orange, yellow, green, turquoise, blue, purple, pink\} \\ {QC}_{{NAME}_{3}} & = & \{pale\_ + {QC}_{{NAME}_{2}}\} \\ {QC}_{{NAME}_{4}} & = & \{light\_ + {QC}_{{NAME}_{2}}\} \\ {QC}_{{NAME}_{5}} & = & \{dark\_ + {QC}_{{NAME}_{2}}\} \end{array}$

The ${QC}_{INT 1..5}$ were calibrated according to the vision system used in previous experiments [24].

Fig. 4.

Reference System for the Qualitative Colour Descriptor (QCD). The vertical axis contains the colours in the grey scale ( $G_{1} \dots G_{K}$ ) whereas the rainbow or prototype colours are located in the external central circle ( $R_{1} \dots R_{KR}$ ). Light colours are situated above, close to white, and dark colours are placed below, close to black.
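The naming scheme of the QCD can be sketched in Python as follows. This is a minimal illustration, not the calibrated model: the calibrated intervals ${QC}_{INT 1..5}$ are not reproduced in this paper, so every threshold and hue boundary in the code (`grey_sat`, `pale_sat`, `dark_l`, `light_l` and the `HUE_BOUNDS` table) is a hypothetical placeholder, and the order in which the dark, light and pale prefixes are tested is also an assumption. Only the colour names and the RGB to HSL translation come from the text.

```python
import colorsys

GREY_NAMES = ["black", "dark_grey", "grey", "light_grey", "white"]  # QC_NAME1

# Hypothetical upper hue bounds in degrees for the QC_NAME2 colours (red wraps around).
HUE_BOUNDS = [(20, "red"), (45, "orange"), (70, "yellow"), (160, "green"),
              (200, "turquoise"), (260, "blue"), (300, "purple"),
              (340, "pink"), (360, "red")]


def qualitative_colour(r, g, b, grey_sat=0.15, pale_sat=0.4, dark_l=0.35, light_l=0.65):
    """Name an 8-bit RGB colour; all thresholds are illustrative assumptions."""
    # colorsys returns the components in the order hue, lightness, saturation.
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    if s <= grey_sat:
        # Grey scale: split the lightness axis into five equal bands.
        return GREY_NAMES[min(int(l * 5), 4)]
    hue_deg = h * 360
    base = next(name for bound, name in HUE_BOUNDS if hue_deg <= bound)
    if l < dark_l:
        return "dark_" + base    # QC_NAME5
    if l > light_l:
        return "light_" + base   # QC_NAME4
    if s <= pale_sat:
        return "pale_" + base    # QC_NAME3
    return base                  # QC_NAME2
```

With these placeholder values, a saturated mid-lightness pixel receives a plain rainbow name, while low-saturation pixels fall on the grey axis; a real deployment would replace the placeholders with the intervals calibrated for the vision system in use.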

3.3. Topological description

The topological situation in space (invariant under translation, rotation and scaling) of an object A with respect to (wrt) another object B (A wrt B) is described as: $T_{NAME} = \{disjoint, touching, completely\_inside, container\}$.

The $T_{NAME}$ determines if an object is completely_inside another object or if it is its container. It also defines the neighbours of an object as all the other objects with the same container, which can be (i) disjoint from the object, if they do not have any edge or vertex in common; or (ii) touching the object, if they have at least one vertex or edge in common or if the Euclidean distance between them is smaller than a certain threshold set by experimentation [22]. Figure 5 presents a graphical representation of these relations.
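The four relations can be illustrated with a deliberately simplified Python sketch. It assumes that each object is approximated by an axis-aligned bounding box `(xmin, ymin, xmax, ymax)`, whereas the approach described here works on the closed boundaries of the segmented regions; the function name `topology` and the parameter `touch_dist` (standing in for the experimentally set distance threshold) are hypothetical.

```python
def topology(a, b, touch_dist=0.0):
    """Topological relation of box a wrt box b; boxes are (xmin, ymin, xmax, ymax)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # a lies strictly within b.
    if bx0 < ax0 and by0 < ay0 and ax1 < bx1 and ay1 < by1:
        return "completely_inside"
    # b lies strictly within a, so a is the container of b.
    if ax0 < bx0 and ay0 < by0 and bx1 < ax1 and by1 < ay1:
        return "container"
    # Gap between the boxes along each axis (0 when they overlap on that axis).
    dx = max(bx0 - ax1, ax0 - bx1, 0)
    dy = max(by0 - ay1, ay0 - by1, 0)
    gap = (dx * dx + dy * dy) ** 0.5
    # Boxes that share an edge or vertex, or lie closer than the threshold, are touching.
    # Under this simplification, partially overlapping boxes are also reported as touching.
    return "touching" if gap <= touch_dist else "disjoint"
```

For example, `topology((0, 0, 1, 1), (1, 0, 2, 1))` reports two boxes sharing an edge as touching, and raising `touch_dist` makes nearby but separate boxes touching as well.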

3.4. Location description

To obtain the location of an object A wrt its container, or the location of an object A wrt an object B that is a neighbour of A, the following location relations are identified, which divide the space into nine regions as shown in Fig. 6: ${Lo}_{NAME} = \{up, down, left, right, up\_left, up\_right, down\_left, down\_right, centre\}$.

Fig. 5.

Topological situations distinguished by the ${QIDL}^{+}$ .

The location of an object is determined by the union of all the locations obtained for each of the relevant points of the shape of the object ( ${P_{0}, P_{1}, \dots, P_{N}}$ ). The location of any object wrt the image is computed and also the location of any object wrt its touching neighbours.

Fig. 6.

Locations described by the ${QIDL}^{+}$ .
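The union of point locations can be sketched in Python for the case of an object inside its container. The sketch makes several assumptions that are not stated in the text: the container is represented by a bounding box, the nine regions are obtained by splitting that box into equal thirds, and image coordinates grow to the right and downwards. The function names are hypothetical.

```python
def point_location(x, y, ref):
    """Label one point wrt a reference box (xmin, ymin, xmax, ymax) split into 3 x 3 regions."""
    x0, y0, x1, y1 = ref
    # Column and row index in the grid, clamped to 0..2 for points on or beyond the border.
    col = max(0, min(int(3 * (x - x0) / (x1 - x0)), 2))
    row = max(0, min(int(3 * (y - y0) / (y1 - y0)), 2))
    horizontal = ["left", "", "right"][col]
    vertical = ["up", "", "down"][row]  # y grows downwards in image coordinates
    if not horizontal and not vertical:
        return "centre"
    return "_".join(part for part in (vertical, horizontal) if part)


def object_location(points, ref):
    """Location of an object: the union of the locations of its relevant points."""
    return {point_location(x, y, ref) for x, y in points}
```

An object whose relevant points fall in two regions, such as `object_location([(250, 250), (290, 160)], (0, 0, 300, 300))`, is therefore described by both labels.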

3.5. Qualitative Size descriptor (QSize)

In this paper, a qualitative size descriptor is proposed for ${QIDL}^{+}$ . It is built according to a reference system which relates the size of an object to the size of the image (Fig. 7), defined as: ${QSize}_{RS} = \{relationSize, {QSize}_{Label}, {QSize}_{Int}\}$ where $relationSize$ is the unit of reference, defined as $relationSize = \frac{objectSize}{imageSize}$ ; ${QSize}_{Label}$ refers to the size labels; and ${QSize}_{Int}$ refers to the intervals associated with each label, which follow a geometric series.

The chosen ${QSize}_{Label}$ and ${QSize}_{Int}$ are:

$\begin{array}{lcl} {QSize}_{Label} & = & \{huge, large, very\_big, big, medium, quite\_small, small, very\_small, tiny\} \\ {QSize}_{Int} & = & \{[1, \frac{1}{2}), [\frac{1}{2}, \frac{1}{2^{2}}), [\frac{1}{2^{2}}, \frac{1}{2^{3}}), [\frac{1}{2^{3}}, \frac{1}{2^{4}}), [\frac{1}{2^{4}}, \frac{1}{2^{5}}), [\frac{1}{2^{5}}, \frac{1}{2^{6}}), [\frac{1}{2^{6}}, \frac{1}{2^{7}}), [\frac{1}{2^{7}}, \frac{1}{2^{8}}), [\frac{1}{2^{8}}, 0]\} \end{array}$

Fig. 7.

Reference system for the size descriptor used in ${QIDL}^{+}$ .
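The mapping from $relationSize$ to a size label can be written down directly from the intervals above. The following Python sketch is only an illustration; the function name `qsize` and the choice of pixel counts as the size measure are assumptions. Each interval is written with its larger bound first, so the larger bound is included and the smaller one excluded, except for the last interval, which includes both $\frac{1}{2^{8}}$ and 0.

```python
QSIZE_LABELS = ["huge", "large", "very_big", "big", "medium",
                "quite_small", "small", "very_small", "tiny"]


def qsize(object_size, image_size):
    """Return (relationSize, label) using the geometric-series intervals."""
    ratio = object_size / image_size
    if not 0 <= ratio <= 1:
        raise ValueError("the object cannot be larger than the image")
    # The k-th interval (k = 0..7) is [1/2^k, 1/2^(k+1)): the smaller bound is excluded.
    for k, label in enumerate(QSIZE_LABELS[:-1]):
        if ratio > 1 / 2 ** (k + 1):
            return ratio, label
    # Everything from 1/2^8 down to 0 is tiny.
    return ratio, QSIZE_LABELS[-1]
```

For instance, an object covering exactly half of the image is labelled large rather than huge, because $\frac{1}{2}$ belongs to the second interval.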

3.6. Logics generated for QIDL⁺

The ${QIDL}^{+}$ approach describes a complete image by a set of facts:

where n is the number of objects. In order to describe a digital image using the ${QIDL}^{+}$ logic descriptors, a first-order knowledge base (KB) can be built as a set of formulas in first order logic [30] constructed using four types of symbols (constants, variables, functions, and predicates). Variable symbols range over the objects in the domain (e.g. $Location$ ranges over left, right, up). Constant symbols can represent objects in the domain of interest (e.g. pills). Function symbols (e.g. touching) represent mappings from tuples of objects to objects. Predicates represent relations among objects in the domain (e.g. hasQCD, hasQSize).

Formulas are recursively built from atomic formulas using logical connectives and quantifiers. A formula is satisfiable if and only if there exists at least one world in which it is true. First-order KBs are usually built using Horn clauses, which contain at most one positive literal. The Prolog programming language is based on Horn clause logic [38].

The ${QIDL}^{+}$ can be generated in first-order logic following the Prolog notation shown in Table 1.

Table 1
Logic facts extracted for the objects in the images based on the qualitative descriptors

$\alpha_{1}$ QIDLogics $\sqsubseteq \forall$ Object $\in$ Image $\exists$ QD

$\alpha_{2}$ QD $\sqsubseteq \forall$ P $\in$ Object $\exists$ hasQSDpoint(Object,P,xy(X,Y),qsd(EC_Label,ATC_Label,C_Label,L_Label)).

$\alpha_{3}$ QD $\sqsubseteq \forall$ hasQSDcategory(Object,Name,Regularity,Convexity).

$\alpha_{4}$ QD $\sqsubseteq \forall$ hasQCD(Object,colourPoint(xy(X,Y),rgb(R,G,B),hsl(H,S,L),QC_NAME1..5)).

$\alpha_{5}$ QD $\sqsubseteq \forall$ LoRS_Label(Object,Image).

$\alpha_{6}$ QD $\sqsubseteq \forall$ completely_inside(Object,ObjectInside).

$\alpha_{7}$ QD $\sqsubseteq \forall$ touching(Object,ObjectTouching,[LocationList]).

$\alpha_{8}$ QD $\sqsubseteq \forall$ hasQSize(Object,RelationSize,QSize_Label).

Using the previous Prolog predicates, the ${QIDL}^{+}$ approach supports logic queries that retrieve information from the KB. For example, the following query can be posed to the system:

where the variables $Size$ , $Colour$ , $Object$ and $Location$ can take any value. That is, this query can be used with a particular size (e.g. small), in which case all the small objects in the scene are retrieved together with their colour, location and name. It can also be asked using combinations of variables, for example:

which retrieves objects which are small and pale yellow and provides their locations in the scene.
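To make the kind of retrieval described above concrete without reproducing the Prolog queries themselves, the following Python sketch imitates a conjunctive query over a small set of facts. It is an analogy under stated assumptions, not the ${QIDL}^{+}$ implementation: the facts, the object identifiers and the helper names `query` and `small_pale_yellow` are hypothetical, and `hasQCD` is simplified to carry only the colour name.

```python
# Hypothetical facts in the spirit of Table 1, written as (predicate, arguments...).
FACTS = [
    ("hasQSize", "object_1", 0.3, "large"),
    ("hasQCD", "object_1", "white"),
    ("up", "object_1", "image"),
    ("hasQSize", "object_2", 0.01, "small"),
    ("hasQCD", "object_2", "pale_yellow"),
    ("down_right", "object_2", "image"),
    ("hasQSize", "object_3", 0.01, "small"),
    ("hasQCD", "object_3", "blue"),
    ("left", "object_3", "image"),
]

LOCATIONS = ("up", "down", "left", "right", "up_left",
             "up_right", "down_left", "down_right", "centre")


def query(facts, predicate, *pattern):
    """Yield the arguments of matching facts; None plays the role of an unbound variable."""
    for name, *args in facts:
        if name == predicate and len(args) == len(pattern) and all(
                p is None or p == a for p, a in zip(pattern, args)):
            yield tuple(args)


def small_pale_yellow(facts):
    """Objects that are both small and pale yellow, with their locations in the image."""
    small = {args[0] for args in query(facts, "hasQSize", None, None, "small")}
    yellow = {args[0] for args in query(facts, "hasQCD", None, "pale_yellow")}
    return {obj: [loc for loc in LOCATIONS if list(query(facts, loc, obj, "image"))]
            for obj in sorted(small & yellow)}
```

In Prolog the same effect is obtained by unification and backtracking over a conjunction of goals; the explicit set intersection here only mimics that behaviour for this one query.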

3.7. Domain knowledge in QIDL⁺

The context is reflected by the domain knowledge introduced in the ${QIDL}^{+}$ , which consists of:

Logic definitions to categorise objects based on their qualitative descriptors. For example, a wall can be defined as:

$\begin{array}{ll} (1) & \begin{array}{l} \forall X \; category(X, wall) \to \\ \quad [has\_QCD(X, \_, \_, \_, white) \lor has\_QCD(X, \_, \_, \_, light\_grey)] \; \land \\ \quad [up(X, image) \lor up\_right(X, image) \lor up\_left(X, image)] \end{array} \end{array}$

And a postit can be defined using ${QIDL}^{+}$ as:

$\begin{array}{ll} (2) & \begin{array}{l} \forall X \; category(X, postit) \to \\ \quad [has\_QCD(X, \_, \_, \_, yellow) \lor has\_QCD(X, \_, \_, \_, light\_yellow) \lor has\_QCD(X, \_, \_, \_, pale\_yellow)] \; \land \\ \quad hasQSize(X, \_, small) \end{array} \end{array}$

Images of target objects already ‘known’ by the system which are used to detect the reference objects in the scene. The properties of QSD, QCD, QSize, Topology and Location are inherited by these target objects after a matching of their features to a region identified by the ${QIDL}^{+}$ approach.
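The two logic definitions (1) and (2) can be read as simple tests on the qualitative features of a region. The Python sketch below is a hedged illustration of that reading: it treats each definition as a sufficient condition for assigning the category, which is how the definitions are used for inference here, and it assumes a hypothetical dictionary representation of a region with the keys `colour`, `size` and `locations`.

```python
WALL_COLOURS = {"white", "light_grey"}
WALL_LOCATIONS = {"up", "up_right", "up_left"}
POSTIT_COLOURS = {"yellow", "light_yellow", "pale_yellow"}


def is_wall(region):
    """Definition (1): white or light grey, and located in the upper part of the image."""
    return region["colour"] in WALL_COLOURS and bool(region["locations"] & WALL_LOCATIONS)


def is_postit(region):
    """Definition (2): some shade of yellow, and small."""
    return region["colour"] in POSTIT_COLOURS and region["size"] == "small"


def categorise(region):
    """Return the inferred category, or None when no definition applies."""
    if is_wall(region):
        return "wall"
    if is_postit(region):
        return "postit"
    return None


# Hypothetical regions, described only by their qualitative features.
region_a = {"colour": "light_grey", "size": "huge", "locations": {"up", "up_left"}}
region_b = {"colour": "pale_yellow", "size": "small", "locations": {"down_right"}}
region_c = {"colour": "blue", "size": "small", "locations": {"left"}}
```

Regions that satisfy neither definition, such as `region_c`, stay uncategorised and can still be queried through their qualitative features.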

The category inferred can also be included in the query logic in order to retrieve information from the KB. Thus, queries such as the following can be formulated:

where the variables $Size$ , $Colour$ , $Object$ and $Name$ can take any value. That is, this query can be formulated using an object name (e.g. wall), in which case all the objects categorised as wall are retrieved together with their colour, size and object identifier:

In the same way, the following query can retrieve locations of any object:

For example, objects categorised as postits and located up can be retrieved by the following query:

Moreover, any location of any object can be retrieved by its name or by its object identifier (e.g. where is the postit?) by the following query:

3.8. Testing ${QIDL}^{+}$

The Cartesium building at Universität Bremen incorporates intelligent door tags (computers) installed in the walls next to every office (see Fig. 8). This scenario is suitable for obtaining pictures of a daily living environment and for studying different situations in ambient intelligence.

Fig. 8.

An office floor in the Cartesium building: arrows indicate intelligent door tags.

In this context, let us consider a picture taken at the Cartesium building, which may be obtained by the cameras incorporated in the door tags or by a robot equipped with a camera as a visual sensor. As Fig. 9 shows, the ${QIDL}^{+}$ approach extracts the qualitative descriptors from the input image by applying a colour segmentation method [25] and then obtaining the closed boundary of the relevant regions detected [22]. This process is automatic and it does not depend on the picture taken or the domain knowledge of the system. For each of the regions detected, qualitative descriptors of shape (QSD), colour (QCD), topology, location and size (QSize) are obtained, as previously described. From these descriptors, first order logic facts in Prolog syntax are obtained and combined with definitions of objects in the domain, which allows previously unknown regions in the image to be categorised (e.g. ‘wall’ or ‘postit’).

Target objects are provided to the system according to the task to do and they are detected by the Speeded-Up Robust Features (SURF) invariant descriptor [2] and the Fast Library for Approximate Nearest Neighbours (FLANN) detector [45]. In this scenario, the target objects are a laptop, a notebook, a mouse and some pills.

Fig. 9.

Outlook of the ${QIDL}^{+}$ approach presented.

As Fig. 9 shows, the target object pills is detected in the image by feature detectors and matched to the segmented object-57 which inherits all the qualitative characteristics of shape, colour, size, topology and location. According to the scene, the Prolog predicates obtained are the following:

Moreover, using the Prolog logic predicates in the KB and the testing platform SWI-Prolog (http://www.swi-prolog.org/) [60], logic queries were solved, such as asking for the location of an object (categorised or not) or asking for all the objects located in a specific location. For example:

a) the following query finds out the location of the mouse as down_right:

b) the following query finds out all the objects categorised as postits and also indicates their size, colour, and location in the scene:

c) the following query indicates the size and colour of an object not categorised, such as the screen or object-16:

A context-free grammar, $G ({QIDL}^{+} N)$ [13], is defined to generate descriptions of scenes in natural language using the qualitative descriptors extracted by ${QIDL}^{+} N$ . As Fig. 9 shows, an excerpt of the corresponding narratives produced by $G ({QIDL}^{+} N)$ for our desktop scenario is the following: The pills (object-57) is next to a postit (object-52). They are located down, down-right with respect to (wrt) the laptop (object-19) and up-left wrt the mouse.

Thus, these narratives can be used by the Cartesium smart building to answer questions asked by users in natural language, such as the question ‘where are my pills?’ described in Scenario I. Moreover, the logics generated also provide the AmI system with a spatial understanding of the real situation for further reasoning.

4. Qualitative 3D Scene Descriptor (QSn3D)

A first step for describing a 3D scene involves identifying objects and then describing their locations (Fig. 10). In this paper, the $QSn 3 D$ approach is proposed for detecting objects in 3D point clouds (see Section 4.1) and for describing their locations from an intrinsic or relative point of view (see Section 4.2).

Fig. 10.

A Qualitative 3D Scene Descriptor ( $QSn 3 D$ ).

4.1. 3D object recognition

The $QSn 3 D$ approach obtains the 3D point clouds in the scene and it proceeds in the following way:

the floor in the scene is extracted by applying a RANSAC-based segmentation (RANdom Sample And Consensus) [26];

a Euclidean Cluster Extraction is carried out in order to distinguish different objects. For each extracted cluster, two geometrical 3D-features are calculated:

the Viewpoint Feature Histogram (VFH) [49] which is scale invariant but viewpoint variant. The main idea of this feature is to obtain three distinct angles between two points, using the normal vectors and the viewpoint direction;

the Global Radius-based Surface Descriptor (GRSD) [39]. The basic idea of GRSD is to approximate 3D-objects by searching for best-fitting circles at each point.

For each type of object, a set of point clouds is obtained, recorded and labelled with the name of the object. These point clouds contain different orientations and scales of the objects. With these labelled feature vectors, an SVM model is trained using LIBSVM [10] and later used to classify the extracted clusters. The result of this step is the identification and recognition of some target 3D objects categorised by a name.

4.2. Spatial references in natural language

In spatial expressions, projective terms refer to the idea that a spatial relationship is projected from an origin (position anchoring the view direction) to a relatum (a known object nearby) in order to specify the location of the intended object, called also the locatum [56]. This is done using lexical items such as front, back, left, right.

Fig. 11.

Two spatial configurations which can be described using the same natural language sentence using a relative reference system located at the office chair.

The employment of projective terms presupposes underlying conceptual reference systems, which are systematically categorised by [36] as relative versus intrinsic. In relative reference, a viewer specifies the location of an object relative to a relatum, as in The chair is in front of the table. Here, the relatum does not necessarily possess intrinsic sides, and the reference system consists of three different positions. In intrinsic reference systems, the role of the relatum coincides with the role of origin, which therefore needs to possess intrinsic sides, which then serve as basis for reference. In The table is in front of me, the speaker serves both as relatum and as origin, and her/his view direction determines the direction of front. For example, in the scene in Fig. 11, the following narrative might be used independently of the point of view of the observer: The rubbish bin is in front of the office chair. This is a particular case since the office chair is an oriented object which has a front and a back which can be used as our relative reference system for locating objects.

The $QSn 3 D$ approach generates two types of natural language narratives using the qualitative location descriptors regarding:

a deictic reference system located at the RGB-Depth camera from which the objects in the scene are described; and

an intrinsic or relative reference system between objects in the scene that have clear orientations, for example, chairs, sofas, armchairs, etc.

To generate this kind of natural language description, the coordinates from the 3D data obtained from the point clouds are used. As Fig. 12 shows, the scene is divided into different regions to distinguish between the depth information or distance to the observer (i.e. foreground and background) and the locations of the objects in the horizontal plane. These descriptors are formalised as follows.

Fig. 12.

Model for dividing the space observed from an RGB-Depth camera.

The closeness of the observer to an object can be described using a Distance Reference System or $DRS = \{m, D_{NAME}, D_{INT}\}$ where m indicates the unit of measurement of the distance (metres); $D_{NAME}$ refers to the set of labels for the distances; and $D_{INT}$ refers to the interval values related to each label. The $D_{NAME}$ and $D_{INT}$ selected for $QSn 3 D$ are the following: $\begin{array}{lcl} D_{NAME} & = & \{foreground, background\} \\ D_{INT} & = & \{(0, d_{n}], (d_{n}, \infty)\} \end{array}$ where $d_{n}$ is the distance threshold used in a scene, which can be parameterised depending on the environment.

The location of an object can be obtained using a Horizontal Location Reference System or $HLoRS = \{°, {HLo}_{NAME}, {HLo}_{INT}\}$ where degrees (°) indicate the unit of measurement of the angular location of the object with respect to a horizontal plane centred in the object; ${HLo}_{NAME}$ refers to the set of labels for the locations; and ${HLo}_{INT}$ refers to the values in degrees (°) related to each location label. The ${HLo}_{NAME}$ and ${HLo}_{INT}$ used in $QSn 3 D$ are the following: $\begin{array}{lcl} {HLo}_{NAME} & = & \{left, centre, right\} \\ {HLo}_{INT} & = & \{(0, a_{1}], (a_{1}, a_{2}], (a_{2}, 180]\} \end{array}$ where $a_{1}$ is the angular threshold used for distinguishing between left and centre, whereas $a_{2}$ is the angular threshold used for distinguishing between centre and right. Both can be parameterised depending on the environment.

The HLoRS and the DRS can be combined as follows: $\begin{array}{lcl} {HLoD}_{NAME} & = & {HLo}_{NAME} \times D_{NAME} \\ & = & \{left, centre, right\} \times \{foreground, background\} \end{array}$

Computationally, the space division depicted in Fig. 12 is used in $QSn 3 D$ where the z-axis represents the depth-information and the x-axis is the horizontal information delivered by the RGB-Depth camera. An object is computationally located in the background/foreground, if its z-value is higher/lower than a scene-specific threshold, $d_{n}$ , which is represented by the dashed line in Fig. 12. An object is computationally placed on the right, if its location is to the right of the dotted line on the right in Fig. 12, that is, if the object is located at an angular position included in the interval $(100, 180]$ . The rest of the horizontal location descriptors are explained similarly. Notice that the dotted lines are defined by an angular location at the origin, that is, where the RGB-Depth camera is placed.
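The computation just described can be sketched in Python as follows. The sketch rests on assumptions that are not fixed by the text: the x-axis is taken to grow towards the right of the camera, the angular position is measured at the camera from its left-hand side, the default `d_n` of 2 m and the default `a1` of 80° are hypothetical placeholders, and only `a2 = 100` follows from the interval $(100, 180]$ mentioned above. The function name `hlod` is also hypothetical.

```python
import math


def hlod(x, z, d_n=2.0, a1=80.0, a2=100.0):
    """Deictic (horizontal location, depth) labels of a point seen from the camera.

    x: horizontal coordinate in metres (assumed positive to the right of the camera)
    z: depth coordinate in metres (distance in front of the camera)
    """
    if z <= 0:
        raise ValueError("the object must be in front of the camera")
    # Depth label: foreground up to and including the threshold d_n.
    depth = "foreground" if z <= d_n else "background"
    # Angular position at the origin: below 90 degrees on the left, above 90 on the right.
    angle = math.degrees(math.atan2(z, -x))
    if angle <= a1:
        horizontal = "left"
    elif angle <= a2:
        horizontal = "centre"
    else:
        horizontal = "right"
    return horizontal, depth
```

An object straight ahead at 1 m is thus described as `("centre", "foreground")`, and moving it sideways or further away changes one label at a time.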

Both reference systems used in $QSn 3 D$ , deictic and intrinsic to oriented objects, are computed by ${HLoD}_{RS}$ . The deictic locations are obtained by locating the ${HLoD}_{RS}$ where the observer is placed (i.e. a robot with an RGB-Depth camera). The intrinsic locations are obtained by locating the ${HLoD}_{RS}$ at the 3D centre of oriented objects and matching its front to the object front.

Note that for spatial relations between two objects, the reference frame is located on the oriented object so that its front corresponds to the front of the reference system. For both configurations in Fig. 11, although the objects are in different positions with respect to the deictic reference frame, the intrinsic reference system will produce the same description because the chair has an oriented front side, which is taken as reference.

4.3. Testing QSn3D

The $QSn 3 D$ approach has been tested in an indoor environment using four pieces of furniture of different colours and sizes: an armchair, an office chair, a rubbish bin and a stool. As a result, a proof-of-concept is presented here. The sensor used for extracting the 3D point cloud from the given scene was a Microsoft XBox 360 Kinect. This RGB-depth sensor is based on a structured infrared-light system, which has a ranging limit of roughly 0.7 to 6 m distance, and is applicable in most indoor environments.

Experimental results [32] showed that the random error of depth measurement increases with increasing distance to the sensor, and ranges from a few millimetres up to about 4 cm at the maximum range of the sensor. Others [50] have modified this sensor to work at a much higher accuracy, retrieving depth fields of objects with sub-millimetre accuracy. However, in the work presented, the original depth accuracy provided by our Kinect sensor was enough, since the information extracted here is qualitative: it does not require exact values and it can deal with ranges of depth measurements.

Fig. 13.

Scenario with furniture (oriented and non-oriented objects) where QSn3D descriptor is applied.

The system presented is written in C++ and built upon the Robot Operating System (ROS) framework (http://www.ros.org). In order to receive the 3D data from the Kinect device, we have used the OpenNI driver (http://www.openni.org) included in ROS. The Point Cloud Library (PCL) framework (http://www.pointclouds.org), also included in ROS, is used to extract features from the obtained point clouds. For training the SVM model with the labelled 3D feature vectors extracted from the clusters, the LIBSVM library [10] is applied.

Figure 13(a) shows our scenario and Fig. 13(b) shows the result of the object recognition process. Note that the orientation of each object is also obtained and represented.

An excerpt of the $QSn 3 D$ logic description obtained for the oriented objects in the scene is the following:

The object0c is identified as the armchair, which is a kind of chair. The location of object0c with respect to the observer (from where the picture is taken) is left and background. Moreover the object0c is oriented to the right. There are 2 close objects: object1c which is an office chair located on the left wrt the armchair, and object3c, a stool located to the right wrt the armchair.

Regarding the non-oriented objects, the $QSn 3 D$ logic description obtained is the following:

The object2c is identified as the rubbish bin, which has no orientation. It is located in the foreground on the right. And it is close to objects object1c (an office chair) and object3c (a stool).

The qualitative descriptors obtained by $QSn 3 D$ are used to generate descriptions of 3D real scenes in natural language. The structure of the content to produce is defined by a context-free grammar $G (QSn 3 D)$ [19] which has two options:

Starting the narrative with the biggest object as the most salient one. In our current example, the biggest object is the armchair, so $G (QSn 3 D)$ generates:

In the background there are two chairs. One of them is an armchair (oriented to the right). The armchair has an office chair on the left, a stool and a rubbish bin on right.

Starting the narrative with the object closest to the observer as the most salient one. In our current example, the closest object to the observer is the rubbish bin, so $G (QSn 3 D)$ generates:

In the foreground, there is a rubbish bin on the right. There is a stool on the left. In the background there are two chairs. One of them is an office chair (oriented to the left). The office chair has an armchair in the front.

These logics and narratives provide AmI systems and service robots with the grounds to produce and understand natural language instructions such as ‘the new stool goes in front of the armchair’ where oriented objects (i.e. the armchair) are involved, as described in Scenario II in the Introduction.

5. Qualitative Movement Descriptor (QMD)

A video can be defined as a set of digital images or frames (I) such as $V = \{I_{1}, I_{2}, \dots, I_{k}, \dots, I_{m}\}$ . Let us define a Qualitative Video Descriptor as a set of the relevant images or frames ( $I^{'}$ ) where a qualitative spatial change has been detected in the situation of the moving objects, that is, $QVD = \{I_{1}^{'}, I_{2}^{'}, \dots, I_{j}^{'}, \dots, I_{n}^{'} \mid n < m\}$ . Let us define a qualitative movement descriptor ( $QMD$ ) for explaining the spatial changing situation of a moving object in a video as: $QMD (I_{j}^{'}, I_{j + 1}^{'}) = \{Object, Location, Direction, Time\}$ . Therefore, $QVD = \{QMD (I_{1}^{'}, I_{2}^{'}), QMD (I_{2}^{'}, I_{3}^{'}), \dots, QMD (I_{j}^{'}, I_{j + 1}^{'}), \dots, QMD (I_{n - 1}^{'}, I_{n}^{'})\}$ , that is, a video ( $V_{m}$ ) is composed of m frames and it is described qualitatively ( ${QVD}_{n}$ ) by explaining the spatial situation of the objects at a time ( $Time$ ) in the n relevant frames in the video ( $I_{j}^{'}$ ) by enumerating their Location and moving Direction, as Fig. 14 shows. These spatial features are described in the following Sections 5.1 and 5.2, respectively.

Fig. 14.

A Qualitative Movement Descriptor ( $QMD$ ).
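The selection of relevant frames can be illustrated with a short Python sketch. It assumes that a qualitative (Location, Direction) pair is already available for every frame and keeps one descriptor per frame in which that pair changes; the names `QMD` and `qualitative_video_descriptor`, and the use of the frame index as $Time$, are assumptions made for this example rather than the implementation used in the system.

```python
from typing import NamedTuple


class QMD(NamedTuple):
    obj: str
    location: str
    direction: str
    time: int


def qualitative_video_descriptor(obj, states):
    """Keep only the frames where the qualitative state of the object changes.

    states: list of (location, direction) pairs, one per frame; the list index
    is used as the time stamp (an assumption of this sketch).
    """
    qvd = []
    previous = None
    for time, (location, direction) in enumerate(states):
        if (location, direction) != previous:
            qvd.append(QMD(obj, location, direction, time))
            previous = (location, direction)
    return qvd
```

With five frames whose labels change only at frames 0, 2 and 3, the sketch returns three descriptors, which matches the condition $n < m$ in the definition of $QVD$.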

5.1. Location of moving object

For the representation of location information, the reference system shown in Fig. 15 is used for obtaining the location of an object with respect to (wrt) the current frame. This Location of Movement Reference System ${Lom}_{RS}$ divides the space into the following qualitative regions: ${Lom}_{NAME} = \{\text{left, left-up, down-left, centre, middle-up, middle-down, right, right-up, down-right}\}$ .

Fig. 15.

Location movement Reference System ( ${Lom}_{RS}$ ).

In order to obtain the location of an object wrt the current image frame, the coordinates of the boundary of the object are extracted and its qualitative location is obtained according to ${Lom}_{RS}$ . As Fig. 15 shows, the ${Lom}_{RS}$ is defined by the thresholds $x_{1}, x_{2}, y_{1}, y_{2}$ . As a baseline, the following values have been assigned to these thresholds: $x_{1} = frameLength / 3$ ; $x_{2} = 2 \cdot frameLength / 3$ ; $y_{1} = frameWidth / 3$ ; $y_{2} = 2 \cdot frameWidth / 3$ . However, they can be adapted to a specific scenario or situation.
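A minimal Python sketch of this baseline division into thirds is given next. Several points are assumptions of the example and are not stated in the text: the object boundary is reduced to the centre of its bounding box, the frame length is taken as the horizontal dimension and the frame width as the vertical one, the image origin is the top-left corner, and the arrangement of the nine labels in a 3 × 3 grid is inferred from their names.

```python
def lom_location(box, frame_length, frame_width):
    """Map a bounding box to one of the nine Lom regions.

    box: (x_min, y_min, x_max, y_max) in pixels, origin at the top-left corner
    frame_length: horizontal size of the frame (assumed)
    frame_width: vertical size of the frame (assumed)
    """
    # Baseline thresholds: the frame is divided into thirds on both axes.
    x1, x2 = frame_length / 3, 2 * frame_length / 3
    y1, y2 = frame_width / 3, 2 * frame_width / 3

    # The boundary is summarised by the centre of the bounding box.
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2

    col = 0 if cx < x1 else (1 if cx < x2 else 2)
    row = 0 if cy < y1 else (1 if cy < y2 else 2)

    names = [
        ["left-up", "middle-up", "right-up"],
        ["left", "centre", "right"],
        ["down-left", "middle-down", "down-right"],
    ]
    return names[row][col]
```

For a 720 × 420 frame the thresholds are 240 and 480 pixels horizontally and 140 and 280 pixels vertically, so a box centred at (360, 210) falls in the centre region.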

5.2. Direction of movement

In order to obtain the direction of the movement of an object wrt the previous and current frame, the coordinates of the boundary of the object are obtained and the increasing or decreasing slopes of coordinate locations are calculated according to the Direction of Movement Reference System ( ${Dm}_{RS}$ ) in Fig. 16. This ${Dm}_{RS}$ defines the following qualitative concepts: ${Dm}_{RS} = \{\text{towards-right, towards-right-up, towards-right-down, towards-left, towards-left-up, towards-left-down, towards-up, towards-down, stopped}\}$ .
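The mapping from a displacement between two frames to one of these nine labels can be sketched as follows. This is a simplification for illustration: it compares the object centre in the previous and current frame instead of computing slopes on the full boundary, it assumes image coordinates with y growing downwards, and the tolerance `eps` (in pixels) below which the object is considered stopped is a hypothetical parameter.

```python
def dm_direction(prev_centre, curr_centre, eps=2.0):
    """Return the qualitative direction of movement between two frames.

    prev_centre, curr_centre: (x, y) pixel coordinates of the object centre
    eps: displacement (pixels) below which an axis is treated as unchanged
    """
    dx = curr_centre[0] - prev_centre[0]
    dy = curr_centre[1] - prev_centre[1]

    horizontal = "right" if dx > eps else ("left" if dx < -eps else "")
    # y grows downwards in image coordinates, so a positive dy means 'down'.
    vertical = "down" if dy > eps else ("up" if dy < -eps else "")

    if not horizontal and not vertical:
        return "stopped"
    return "towards-" + "-".join(part for part in (horizontal, vertical) if part)
```

Under these assumptions, a displacement of 10 pixels to the left and 10 pixels downwards yields the label towards-left-down.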

Fig. 16.

Direction movement Reference System ( ${Dm}_{RS}$ ).

5.3. Logics for a Qualitative Movement Descriptor (QMD) for one moving object

The movement of an object in a video can be described qualitatively by the QMD defined in the previous section as: $QMD (I_{j}^{'}, I_{j + 1}^{'}) = \{Object, Location, Direction, Time\}$

$Location \in {Lom}_{NAME} = \{\text{left, left-up, down-left, centre, middle-up, middle-down, right, right-up, down-right}\}$ .

$Direction \in {Dm}_{RS} = \{\text{towards-right, towards-left, towards-up, towards-down, towards-right-up, towards-right-down, towards-left-up, towards-left-down, stopped}\}$ .

$Time \in \mathbb{N}$ .

In order to describe a video using spatial logics, a first-order knowledge base (KB) can be built as a set of formulas in first order logic [30] constructed using Prolog predicates as explained in previous sections. The $QMD$ can be described using a first-order Prolog predicate as:

Other first-order Prolog predicates can be defined to ask about semantics of movement, such as:

(i) where is an object at a time,

(ii) towards which direction is an object moving at a time,

and (iii) if an object is placed in a broader location which involves more than one region defined by the ${Lom}_{RS}$ :

Moreover, some domain knowledge can be added to the AmI system using Prolog predicates which are built using the features extracted by $QMD$ and the previous predicates defined. For example, the following predicate can identify when an object is falling or has fallen:
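As an illustration only, queries (i)–(iii) and a rule about falling can be mimicked in Python over a small list of QMD facts. Everything in this sketch is an assumption made for the example and not the Prolog predicates of the system: the fact values, the function names, the grouping of regions in `BROAD_REGIONS`, and the criteria used (an object is taken to be falling when its direction ends in "down", and to have fallen when it is stopped in a lower region right after falling).

```python
# (object, location, direction, time): illustrative values only
QMD_FACTS = [
    ("ball", "right", "towards-left", 1),
    ("ball", "centre", "towards-left-down", 2),
    ("ball", "middle-down", "towards-down", 3),
    ("ball", "down-left", "stopped", 4),
]

# Broader locations covering more than one Lom region (hypothetical grouping).
BROAD_REGIONS = {
    "up": {"left-up", "middle-up", "right-up"},
    "down": {"down-left", "middle-down", "down-right"},
    "left": {"left-up", "left", "down-left"},
    "right": {"right-up", "right", "down-right"},
}


def where_is(facts, obj, time):
    """(i) Location of an object at a time, or None if unknown."""
    return next((loc for o, loc, _, t in facts if o == obj and t == time), None)


def moving_towards(facts, obj, time):
    """(ii) Direction of an object at a time, or None if unknown."""
    return next((d for o, _, d, t in facts if o == obj and t == time), None)


def in_broad_region(facts, obj, time, region):
    """(iii) True if the object lies in any Lom region of the broader one."""
    return where_is(facts, obj, time) in BROAD_REGIONS[region]


def is_falling(facts, obj, time):
    direction = moving_towards(facts, obj, time)
    return direction is not None and direction.endswith("down")


def has_fallen(facts, obj, time):
    return (
        moving_towards(facts, obj, time) == "stopped"
        and in_broad_region(facts, obj, time, "down")
        and is_falling(facts, obj, time - 1)
    )
```

Because unknown facts simply evaluate to false here, the sketch behaves like a closed-world knowledge base, which is also how Prolog treats facts it cannot prove.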

5.4. Testing QMD

In order to track any moving object in any video, the Background Subtraction approach by [62] (see the tutorial on background detection by Stankiewicz: http://mateuszstankiewicz.eu/?p=189) is applied as implemented in the Open Computer Vision Library (OpenCV: http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html). This method is very convenient because it does not consider unwanted movements such as the waving of a flag or a curtain. The motion has to involve a change of place in space.

A proof-of-concept has been implemented taking as input a video showing a moving ball (data available at https://sites.google.com/site/zfalomir/projects/cognitive-ami). This video has $720 \times 420$ pixels per frame and each frame is around 120 Kbytes. Some examples of the frames obtained from the video are shown in Fig. 17.

Fig. 17.

Frames obtained from a video showing a racquetball play. The red circle indicates the tracked ball.

An excerpt of the QMD predicates produced for the frames shown in Fig. 17 is the following:

The predicates obtained indicate that: from 4–5, the ball is moving from the wall towards the left (from middle-down to left-down), and from 6–7, the ball is falling back from the wall towards the left (from the right to the left-down).

These logics provide AmI systems and service robots with the grounds to understand instructions such as ‘move it a bit towards your left’ as described in Scenario III in the Introduction.

6. Discussion

This section discusses how AmI systems can have a more cognitive interaction with people (Section 6.1), and the usability of qualitative descriptors, in particular of the ${QIDL}^{+}$ (Section 6.2). Section 6.3 explains how AmI systems can use qualitative descriptors to obtain first order logics or description logics under open and closed world assumptions. It also introduces how fuzzy logic descriptors can be produced from qualitative reference systems. Section 6.4 describes two theoretical methods to integrate the ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ presented in this paper. The advantages and disadvantages of both methods are presented together with the advantages and disadvantages of such integration. Finally, Section 6.5 explains the importance of customising qualitative descriptors and combining them with activity descriptors to know the environmental context.

6.1. From digital AmI systems to cognitive human-interactive AmI systems

In computer vision, images of scenes are digitalised, that is, divided into pixels or points corresponding to 3 colour coordinates (i.e. RGB). In order to recognise objects inside digital images, these pixels can be: (i) split by a boundary (e.g. transforming the image into grey scale and analysing intensity transitions between the pixels [9]); (ii) brought together using a similarity measure (e.g. based on colour closeness or other features) [25]; (iii) matched to predefined pixels corresponding to objects known a priori (e.g. feature detectors such as SIFT, SURF, etc.; see the work by Mikolajczyk et al. [42] for an overview). In 3D object recognition the problem is similar: a scene is represented as a set of points floating in the air, called point clouds. In order to recognise objects there, these points must be put together again by learning different views of the object using machine learning methods [19].

Because digital images represent visual data numerically, most image processing has been successfully carried out by applying mathematical techniques to obtain and describe image content. All these approaches succeeded in their tasks, but they need to produce and store in memory huge numerical descriptions that cannot be interpreted or given a meaning if a successful match is not found. A disadvantage of these methods is their requirement of a repository of all possible images of objects existing in a scenario for identification, because they lack the ability to describe any feature of an object that they have not seen before. Those methods try to recompose the continuity of the space lost in the digitalisation, since this continuity is important in order to detect/recognise objects to give them a name/meaning to communicate with humans.

An AmI system that can interact with a human using concepts/names of objects and their features (shape, colour, location, etc.) may learn which objects belong to its house and their usual location, similarly to how we teach children the names of things and their usual location, so that they are able to store them correctly. To achieve this goal, qualitative descriptions of scenes must be obtained and correctly used in a dialogue with a human. Numerical descriptions are not useful in this task, since most humans might not understand location coordinates inside their kitchen or colour coordinates of a banana, for example. The work presented in this paper is a step towards achieving such a goal.

6.2. The usability of the ${QIDL}^{+}$

The Qualitative Image Description ( $QID$ ) approach [22] was successful in extracting qualitative knowledge from real digital images. Then Qualitative Image Similarity methods ( $QIDSim$ ) were proposed [15], which were successful in identifying simple objects (i.e. pieces for mosaic building [23]) or simple scenes (i.e. corners in a room for robot localisation [21]). Qualitative approaches for object description in digital images may be ambiguous when detecting complex objects in the real world, since these approaches use abstractions of features which sometimes may produce too general categorisations, but they can be complemented with spatial features (i.e. topology, distance, direction, location, etc.) for disambiguation through human interaction.

In this paper, the feature size was added to get an extended Qualitative Image Description Logic ( ${QIDL}^{+}$ ) approach, which enables the system to infer more object categories by including size in the logic definitions. However, the size descriptor must be used carefully since the apparent size of an object depends on the distance from which it is observed. So, for referring to size in a conversation with a human, the ${QIDL}^{+}$ system must have the same point of view as the human or a similar one. For example, the situations where QIDL was applied before [24] were taken from different points of view, therefore the feature size did not contribute to disambiguating object identification. However, in situations like those shown in this paper, where the points of view of the scenarios are similar, the size feature has proved effective in ${QIDL}^{+}$ by helping to infer object categories (e.g. post-it and wall).

Context information is introduced when providing images of a-priori-known objects or target objects, which can be detected using the SURF and FLANN feature object detectors. If object recognition is successful, then it contributes to the ${QIDL}^{+}$ approach, which can name some of the objects in the scene. If the object recognition fails (e.g. the object does not have enough texture features to match or the illumination conditions are not suitable), then the ${QIDL}^{+}$ approach can still categorise some objects using the qualitative descriptors or provide a ‘broad’ description of the object based on its shape, colour, size, location and topology or combinations of them (e.g. the small yellow object on the right).

6.3. From qualitative descriptors to logics

Qualitative spatial descriptors can be expressed in any logic form. This has been shown by previous works in which the QID was expressed using description logics [18] and used to infer knowledge from the camera of a mobile robot which moves through the corridors of a building. The QID has also been expressed using first order logics [24] to infer further knowledge in a closed indoor scenario in the Cartesium building at the University of Bremen. Therefore, this generality of QID is extensible to other qualitative spatial descriptors. The kind of logics used depends on whether the environment is considered as a closed or open world, as discussed next.

Description logics are based on an open world assumption (OWA), that is, they assume that not all the objects belonging to a category are known. For example, in our smart home we can have different kinds of chairs, but these are not all the existing chairs in the world. The disadvantage of the OWA approach appears when counting operations are needed [18], such as: A chair has at least 3 legs. In order to reason, operations to close the world are needed, which results in longer reasoning times. The OWA approach implies that an AmI system needs to know what is typically happening in all houses in the world and then learn what is particular for the house it is controlling. For intelligent systems, the OWA approach is suitable when dealing with affordances of objects, since a system may find out a new creative use of an object depending on its features [47], that is, the uses of things have endless possibilities.

In contrast, first order logics, as used here through Prolog, are based on a closed world assumption (CWA). Therefore, in order to reason, they assume that what is known is true and what is not known is false, which makes reasoning procedures quicker. For example, an AmI system may only need to know the rules applied in the house that it is controlling, which are the ones that matter in its task. The total set of rules that can be applied in different houses all over the world (OWA) will not be considered important for monitoring a particular house (CWA). Thus, it is preferable for any AmI system to be able to customise its configuration to fulfil the particular needs of its users, rather than having a general or universal configuration. This configuration must be done via human-machine interaction where dialogues using qualitative concepts will be used.
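The practical difference between both assumptions can be illustrated with a deliberately simplified Python sketch, in which a knowledge base is just a set of facts without any inference. The fact triples and function names are hypothetical; the point is only that a closed-world query answers false for anything that is not stored, whereas an open-world query answers unknown unless the negation is explicitly stated.

```python
KNOWN_FACTS = {
    ("armchair", "is_a", "chair"),
    ("stool", "located", "foreground"),
}

# Statements explicitly known to be false (needed only in the open world).
KNOWN_NEGATIONS = {
    ("stool", "is_a", "chair"),
}


def ask_closed_world(facts, query):
    """CWA: whatever is not known to be true is taken as false."""
    return query in facts


def ask_open_world(facts, negations, query):
    """OWA: a statement is false only if its negation is known."""
    if query in facts:
        return True
    if query in negations:
        return False
    return None  # unknown


query = ("rubbish bin", "is_a", "chair")
print(ask_closed_world(KNOWN_FACTS, query))                 # False
print(ask_open_world(KNOWN_FACTS, KNOWN_NEGATIONS, query))  # None (unknown)
```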

Some approaches which use Semantic Web-based representations to describe context and reasoning have been proposed in the literature [6]. They retrieve information from the context knowledge base, check if the available context data is consistent or derive implicit ontological knowledge, but they have some drawbacks in reasoning: they cannot deal with missing or ambiguous information (which is a common case in ambient environments) and they are not able to provide support for decision making. Some of these reasoning issues are due to the fact that ontology-based models are based on open world assumption (OWA) for reasoning and there is a need to close the world for solving inferences (i.e. regarding counting individuals) [18].

Qualitative descriptors can also be expressed in fuzzy logic (CWA) or fuzzy description logic (OWA) to represent concepts which lack well defined boundaries. The reference systems of the qualitative descriptors can be defined in terms of fuzzy sets, instead of interval sets, and thus a degree of uncertainty for the concepts can be calculated. This has been shown by the vague colour descriptor used in [40], which was inspired by the QCD used by the ${QIDL}^{+}$ outlined in this paper. This vague colour descriptor, combined with other vague spatial descriptors, has been shown to provide more discriminative object referring expressions, taking into account the context of communication, than crisp descriptors [40]. Other fuzzy qualitative spatial descriptors can be found in the literature [52–54]. Fuzzy versions of ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ are intended as future work.
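To make the idea of replacing interval sets by fuzzy sets concrete, the crisp distance threshold $d_{n}$ of a distance reference system could be softened as in the following Python sketch. This is a hypothetical example, not one of the fuzzy descriptors cited above: the linear transition, its half-width `width`, the default `d_n = 2.0` m and the function name are all assumptions.

```python
def fuzzy_distance(z, d_n=2.0, width=0.5):
    """Return membership degrees in 'foreground' and 'background' for depth z.

    Instead of the crisp intervals (0, d_n] and (d_n, inf), membership changes
    linearly from foreground to background between d_n - width and d_n + width.
    """
    low, high = d_n - width, d_n + width
    if z <= low:
        background = 0.0
    elif z >= high:
        background = 1.0
    else:
        background = (z - low) / (high - low)
    return {"foreground": 1.0 - background, "background": background}
```

An object exactly at the threshold then belongs to both concepts with degree 0.5, while objects far from the threshold keep the crisp labels.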

6.4. Integrating QIDL⁺, QSn3D and QMD

By integrating all the proposed qualitative descriptors, for each object in a scene, the following features may be retrieved:

its shape (QSD), its colour (QCD), its size (Qsize) and its static location and topology;

its distance with respect to the point of view of the observer (i.e. background or foreground) and its orientation (if it is an oriented object);

its direction and its location in a snapshot, if it is a moving object. If it is not a moving object, then it can be categorised as static object.

The integration of these qualitative descriptors can be done in two ways:

Method-A involves an integration at a logic level, that is, adding all the logic facts obtained from the qualitative descriptors to the AmI agent knowledge base, then checking the consistency of the descriptors and, after that, reasoning with them to infer new knowledge;

Method-B involves an integration at a sensory level, that is, using the RGB image taken by the Kinect and getting the ${QIDL}^{+}$ directly from it, while extracting the depth corresponding to that scene from the point cloud obtained by the RGB-Depth camera (getting $QSn 3 D$ ). From the video, the moving objects can be detected (together with their $QMD$ ) and thus their texture features can be obtained from a snapshot. Using these features, the moving objects can be detected in the RGB image and their $QMD$ location can be integrated with their ${QIDL}^{+}$ location.

Method-A has the advantage of extracting the object features more easily and quickly, since the integration part is left for the knowledge base (KB) in the consistency checking process. The disadvantage is that some inconsistencies in the KB might need to come back to the sensory level for discarding errors.

Method-B has the advantage of solving the conflicting facts at a sensory level. For that, the same reference systems must be used for the location of objects in videos and images, for example. This will create less conflicting facts in the AmI agent’s KB. However, if a conflicting sensory situation is produced, then no fact might be written in the KB, and this is a disadvantage.

The integration of all these descriptors may enhance the AmI agent’s ability to differentiate moving objects from static objects and also to detect 3D topological situations, such as partially overlapping objects. However, note that having too much information about the environment may sometimes decrease the agent’s effectiveness, since the task/activity to carry out can be delayed while extracting unneeded descriptors. Therefore, the integration challenge is interesting, but it must also be defined when the descriptors must be integrated and when it is sufficient to use them separately.

Note also that both Method-A and Method-B are theoretical and they are intended as future work.
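Since both methods are theoretical, the consistency checking step of Method-A can only be sketched under assumptions. The following Python example assumes that each descriptor contributes facts of the form (source, object, attribute, value) and that a conflict is simply two sources assigning different values to the same attribute of the same object; the fact layout and the function name `find_conflicts` are hypothetical.

```python
from collections import defaultdict


def find_conflicts(facts):
    """Group facts by (object, attribute) and report disagreements.

    facts: iterable of (source, obj, attribute, value) tuples.
    Returns a dict mapping (obj, attribute) to the set of (source, value)
    pairs whenever more than one distinct value has been asserted.
    """
    by_key = defaultdict(set)
    for source, obj, attribute, value in facts:
        by_key[(obj, attribute)].add((source, value))
    return {
        key: pairs
        for key, pairs in by_key.items()
        if len({value for _, value in pairs}) > 1
    }
```

In this sketch, a conflict such as the same ball being located on the right by one descriptor and in the centre by another would be the kind of inconsistency that requires going back to the sensory level.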

6.5. Combining QIDL⁺, QSn3D and QMD with qualitative activity descriptors

In order to provide the right information to the users at the right time and in the right place, an ambient intelligent system must ‘understand’ its environment, users’ needs/preferences and the tasks and activities that are being undertaken.

In this paper, the ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ can help AmI systems to ‘understand’ their environment and to talk about it in a cognitive manner. Moreover, the qualitative descriptors used can also be customised to users’ preferences, as shown for qualitative colour descriptors [51].

Finally, the qualitative descriptors presented in this paper can also be combined logically with qualitative activity descriptors [5] in order to know the context of operation (i.e. task/activity which is carried out by the user) and then to select a proper descriptor or the integration of all of them. This is a challenge intended as future work.

7. Conclusions and future work

The main aim of the qualitative descriptors presented ( ${QIDL}^{+}$ , $QSn 3 D$ , $QMD$ ) is to: (i) improve human-computer interaction and (ii) enhance the reasoning capabilities of intelligent systems.

It is important to notice that none of the previous works in the literature integrate all the shape, colour, size, topology and location qualitative descriptors as ${QIDL}^{+}$ does when producing first order logic predicates in Prolog for reasoning about images/scenes. Regarding $QSn 3 D$ , none of the works in the literature manage raw data extracted from real 3D scenes using RGB-Depth cameras and apply the qualitative spatial descriptors presented by $QSn 3 D$ . Regarding $QMD$ , none of the previous works in the literature provide first order logics describing object movement from real videos using the qualitative descriptors proposed by $QMD$ . Previous works in the literature were more focused on describing the physical properties of the motion or the trajectory of the objects in a video, but this is not the objective of this paper since our main aim is not detecting collisions, but to describe movement using cognitive terms that are understandable by users.

The experiments presented showed the usefulness of the 3 descriptors presented: ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ . The lessons learnt after applying ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ to the AmI scenarios mentioned in this paper are the following. First, computer vision methods were applied, specifically: (i) colour segmentation in digital images to detect objects ( ${QIDL}^{+}$ ); (ii) RANSAC-based segmentation, VFH and GRSD combined with an SVM to detect objects in point clouds ( $QSn 3 D$ ); and (iii) background subtraction to detect movement in videos ( $QMD$ ). The information obtained by these computer vision methods was usually approximate or imperfect. However, the qualitative descriptors dealt with imprecise, incomplete and imperfect knowledge on a symbolic level, since they are defined on approximate ranges of values. Moreover, the information obtained by computer vision methods was numerically too detailed and not easily interpretable by users. However, qualitative descriptors provided symbol grounding [61] that allows cognitive concepts to be aligned with human perception, so that the obtained descriptors have the same meaning for both systems and users. Since the qualitative concepts are aligned with human perception, they can be easily translated to generate narratives for enhancing machine-user communication. We have also found that logic definitions can be built using the qualitative descriptors proposed in this paper, giving AmI systems a ground for building further statements depending on the context. Finally, we have observed differences in the way users interpret concepts, which is why we have used qualitative descriptors that can be customised to adapt them to the users. This shows the flexibility of ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ , which opens the way to many future applications.

As future work, we intend to: (i) integrate the ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ to obtain joint descriptions in AmI environments; (ii) define fuzzy or vague versions of the ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ and compare their performance with those presented in this paper; (iii) combine logically the ${QIDL}^{+}$ , $QSn 3 D$ and $QMD$ with qualitative activity descriptors to understand not only what is happening in the environment, but also what activities the users are carrying out and in which tasks they might need help; (iv) carry out a validation test in which people can grade the descriptions provided, and enhance the cognitive adequacy of the logics/narratives by taking into account the results of that validation test.

Acknowledgements

This work was conducted within the scope of 2 projects: (i) COGNITIVE-AMI (GA 328763; https://sites.google.com/site/cognitiveami/), funded by the European Commission through FP7 Marie Curie IEF actions, and (ii) the project Cognitive Qualitative Descriptions and Applications (CogQDA; https://sites.google.com/site/cogqda/), funded by the Central Research Development Fund (CRDF) at Universität Bremen through the 04-Independent Projects for Postdocs action.

References

1. Al-Salman, Qualitative spatial query processing: Towards cognitive geographic information systems, PhD thesis, University of Bremen, 2014, Supervised by Prof. Christian Freksa (University of Bremen) and Prof. Christian Jensen (Aalborg University).

2. Bay, Ess, Tuytelaars and Van Gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst. 110(3) (2008), 346–359. doi:10.1016/j.cviu.2007.09.014.

3. Bhatt and Freksa, Spatial computing for design, an artificial intelligence perspective, in: Studying Visual and Spatial Reasoning for Design Creativity, J.S. Gero, ed., Springer, 2015, pp. 109–127.

4. Bhatt, Guesgen and J.C. Augusto (eds), in: Proc. of the Workshop on Space, Time and Ambient Intelligence (STAMI 2011), Pasadena, California, USA, 2011, International Joint Conference on Artificial Intelligence (IJCAI 2011).

5. Bhatt, Guesgen and Cook (eds), in: Proc. of the Workshop: Space, Time, and Ambient Intelligence: Spatio-Temporal Aspects of Human-Activity Interpretation (STAMI 2013), Washington USA, 2013, 27th AAAI Conference.

6. Bikakis, Patkos, Antoniou and Plexousakis, A survey of semantics-based approaches for context reasoning in ambient intelligence, in: Constructing Ambient Intelligence, Mühlhäuser et al., eds, Communications in Computer and Information Science, Vol. 11, Springer, 2008, pp. 14–23. doi:10.1007/978-3-540-85379-4_3.

7. Bredeweg and K.D. Forbus, Qualitative modeling in education, AI Magazine 24(4) (2004), 35–46.

8. Burger and Bhanu, Qualitative motion understanding, in: Proc. of the 10th IJCAI, Milan, Italy, 1987, pp. 819–821.

9. J.F. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 8 (1986), 679–697. doi:10.1109/TPAMI.1986.4767851.

10. C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011), 27.

11. Cohn, Hogg, Bennett, Devin, Galata, Magee, Needham and Santos, Cognitive vision: Integrating symbolic qualitative representations with computer vision, in: Cognitive Vision Systems, Christensen and H.-H. Nagel, eds, Lecture Notes in Computer Science, Vol. 3948, Springer, Berlin/Heidelberg, 2006, pp. 221–246. doi:10.1007/11414353_14.

12. A.G. Cohn and Renz, Qualitative Spatial Reasoning, Handbook of Knowledge Representation, Elsevier, Wiley-ISTE, London, 2007.

13. Falomir, A qualitative image descriptor QIDL+N to obtain logics and narratives applied to ambient intelligent systems, in: State of the Art on AI Applied to Ambient Intelligence, J.C. Augusto, Aztiria and Orlandini, eds, Frontiers in Artificial Intelligence and Applications, IOSPress, 2016, in press.

14. Falomir, Towards a qualitative descriptor for paper folding reasoning, in: Proc. of the 29th International Workshop on Qualitative Reasoning, 2016, Co-located with IJCAI’2016 in New York, USA.

15. Falomir, Qualitative distances and qualitative description of images for indoor scene description and recognition in robotics, PhD thesis, Universitat Jaume I (Spain) and Universität Bremen (Germany), November 2011, http://www.tdx.cat/handle/10803/52897.

16. Falomir, A qualitative model for reasoning about 3D objects using depth and different perspectives, in: LQMR 2015 Workshop, Lechowski et al., eds, Annals of Computer Science and Information Systems, Vol. 7, PTI, 2015, pp. 3–11. doi:10.15439/2015F370.

17. Falomir, Gonzalez-Abril, Museros and Ortega, Measures of similarity between objects from a qualitative shape description, Spatial Cognition and Computation 13 (2013), 181–218. doi:10.1080/13875868.2012.700463.

18. Falomir, Jiménez-Ruiz, M.T. Escrig and Museros, Describing images using qualitative models and description logics, Spat. Cogn. Comput. 11(1) (2011), 45–74.

19. Falomir and Kluth, Obtaining qualitative spatial logic descriptors from 3D indoor scenes and generating explanations in natural language, in: Cognitive Processing, 2017, page in press.

20. Falomir, Museros, Castelló and Gonzalez-Abril, Qualitative distances and qualitative image descriptions for representing indoor scenes in robotics, Pattern Recognition Letters 38 (2013), 731–743. doi:10.1016/j.patrec.2012.08.012.

21. Falomir, Museros and Gonzalez-Abril, Towards a similarity between qualitative image descriptions for comparing real scenes, in: Qualitative Representations for Robots, Proc. AAAI Spring Symposium, 2014, pp. 42–49, Technical Report SS-14-06.

22. Falomir, Museros, Gonzalez-Abril, M.T. Escrig and J.A. Ortega, A model for qualitative description of images based on visual and spatial features, Comput. Vis. Image Underst. 116 (2012), 698–714. doi:10.1016/j.cviu.2012.01.007.

23.

Falomir,

Museros,

Gonzalez-Abril and

Velasco, Measures of similarity between qualitative descriptions of shape, colour and size applied to mosaic assembling, J. Vis. Commun. Image R24 (2013), 388–396. doi:10.1016/j.jvcir.2013.01.013.

24.

Falomir and

A.-M.

Olteţeanu, Logics based on qualitative descriptors for scene understanding, Neurocomputing161 (2015), 3–16. doi:10.1016/j.neucom.2015.01.074.

25.

P.F.

Felzenszwalb and

D.P.

Huttenlocher, Efficient graph-based image segmentation, Int. J. Comput. Vis.59(2) (2004), 167–181. doi:10.1023/B:VISI.0000022288.19776.77.

26.

M.A.

Fischler and

R.C.

Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM24(6) (1981), 381–395. doi:10.1145/358669.358692.

27.

Fogliaroni, Qualitative Spatial Configuration Queries. Towards Next Generation Access Methods for GIS, Dissertations in Geographic Information Science, Vol. 9, IOS Press, 2013, ISBN 978-1614992486.

28.

K.D.

Forbus, Qualitative modeling, in: Handbook of Knowledge Representation,

van Harmelen,

Lifschity and

Porter, eds, Elsevier, 2008, pp. 361–393. doi:10.1016/S1574-6526(07)03009-X.

29.

Freksa, Qualitative Spatial Reasoning, Springer, Netherlands, Dordrecht, 1991, pp. 361–372.

30.

M.R.

Genesereth and

N.J.

Nilsson, Logical Foundations of Artificial Intelligence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1987.

31.

Henricksen and

Indulska, Modelling and using imperfect context information, in: Proc. of 2nd IEEE Annual Conf. on Pervasive Computing and Communications, PERCOMW’04, IEEE Computer Society, Washington, DC, USA, 2004, p. 33.

32.

Khoshelham and

S.O.

Elberink, Accuracy and resolution of Kinect depth data for indoor mapping applications, Sensors12(2) (2012), 1437–1454. doi:10.3390/s120201437.

33.

Kuipers, Qualitative Reasoning – Modeling and Simulation with Incomplete Knowledge, MIT Press, 1994.

34.

Kunze,

Burbridge,

Alberti,

Tippur,

Folkesson,

Jensfelt and

Hawes, Combining top-down spatial reasoning and bottom-up object class recognition for scene understanding, in: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’14), Chicago, Illinois, US, 2014.

35.

Kunze,

Burbridge and

Hawes, Bootstrapping probabilistic models of qualitative spatial relations for active visual object search, in: Qualitative Representations for Robots, Proc. AAAI Spring Symposium, 2014, pp. 80–81, Technical Report SS-14-06, ISBN 978-1-57735-646-2.

36.

Levinson, Space in Language and Cognition: Explorations in Cognitive Diversity, Cambridge University Press, 2003.

37.

Ligozat, Qualitative Spatial and Temporal Reasoning, MIT Press, Wiley-ISTE, London, 2011.

38.

J.W.

Lloyd, Foundations of Logic Programming. Symbolic Computation: Artificial Intelligence, 2nd, extended edn, Springer-Verlag, 1987.

39.

Z.-C.

Marton,

Pangercic,

R.B.

Rusu,

Holzbach and

Beetz, Hierarchical object geometric categorization and appearance classification for mobile manipulation, in: Humanoid Robots, 10th IEEE-RAS International Conference on, IEEE, 2010, pp. 365–370.

40.

Mast,

Falomir and

Wolter, Probabilistic reference and grounding with PRAGR for dialogues with robots, Journal of Experimental & Theoretical Artificial Intelligence28(5) (2016), 889–911. doi:10.1080/0952813X.2016.1154611.

41.

Mavridis,

Bellotto,

Iliopoulos and

Van de Weghe, QTC3D: extending the qualitative trajectory calculus to three dimensions, Computing Research Repository (CoRR), 2014, abs/1402.3779.

42.

Mikolajczyk,

Tuytelaars,

Schmid,

Zisserman,

Matas,

Schaffalitzky,

Kadir and

Gool, A comparison of affine region detectors, Int. J. Comput. Vis.65(2) (2005), 43–72. doi:10.1007/s11263-005-3848-x.

43.

Moratz and

Tenbrink, Spatial reference in linguistic human-robot interaction: Iterative, empirically supported development of a model of projective relations, Spatial Cognition and Computation6(1) (2006), 63–106. doi:10.1207/s15427633scc0601_3.

44.

Moratz and

Tenbrink, Affordance-based human-robot interaction, in: Proc. of the 2006 International Conference on Towards Affordance-Based Robot Control, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 63–76.

45.

Muja and

D.G.

Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, in: VISAPP Int. Conf. on Computer Vision Theory and Applications, 2009, pp. 331–340.

46.

Musto,

Stein,

Eisenkolb,

Röfer,

Brauer and

Schill, From motion observation to qualitative motion representation, in: Spatial Cognition,

Freksa,

Brauer,

Habel and

K.F.

Wender, eds, LN in Computer Science, Vol. 1849, Springer, 2000, pp. 115–126.

47.

A.-M.

Olteteanu and

Falomir, Object replacement and object composition in a creative cognitive system. A computational counterpart of the alternative use test, Cognitive Systems Research39 (2016), 15–32. doi:10.1016/j.cogsys.2015.12.011.

48.

Ragni,

Barkowsky,

Nebel and

Freksa, Cognitive space and spatial cognition: The SFB/TR 8 spatial cognition, KI30(1) (2016), 83–88.

49.

R.B.

Rusu,

Bradski,

Thibaux and

Hsu, Fast 3D recognition and pose using the viewpoint feature histogram, in: Proc. of the 23rd IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Taiwan, 2010.

50.

Ruther,

Lenz and

H.B.

Nect, On using a gaming RGBD camera in micro-metrology applications, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, 2011, pp. 52–59.

51.

Sanz,

Museros,

Falomir and

Gonzalez-Abril, Customizing a qualitative colour description for adaptability and usability, Pattern Recognition Letters61 (2015), 2–10. doi:10.1016/j.patrec.2015.06.014.

52.

Schockaert,

De Cock,

Cornelis and

E.E.

Kerre, Fuzzy region connection calculus: Representing vague topological information, Int. J. Approx. Reasoning48(1) (2008), 314–331. doi:10.1016/j.ijar.2007.10.001.

53.

Schockaert,

De Cock and

E.E.

Kerre, Fuzzifying Allen’s temporal interval relations, IEEE Trans. Fuzzy Systems16(2) (2008), 517–533. doi:10.1109/TFUZZ.2007.895960.

54.

Schockaert,

De Cock and

E.E.

Kerre, Modelling nearness and cardinal directions between fuzzy regions, in: FUZZ-IEEE 2008, IEEE International Conference on Fuzzy Systems, Proc, 2008, pp. 1548–1555.

55.

Tenbrink,

K.R.

Coventry and

Andonova, Spatial strategies in the description of complex configurations, Discourse Processes48(4) (2011), 237–266. doi:10.1080/0163853X.2010.549452.

56.

Tenbrink,

Maiseyenka and

Moratz, Spatial reference in simulated human-robot interaction involving intrinsically oriented objects, in: Symposium Spatial Reasoning and Communication at AISB’07 Artificial and Ambient Intelligence, Vol. 7, 2007.

57.

Van de Weghe,

Cohn and

De Maeyer, A qualitative representation of trajectory pairs, in: Proc. 16th European Conf. on Artificial Intelligence, ECAI, Valencia, Spain, 2004, pp. 1103–1104.

58.

Vernon, Image and vision computing special issue on cognitive vision, Image and Vision Computing26 (2008), 1–4. doi:10.1016/j.imavis.2007.09.003.

59.

D.S.

Weld and

de Kleer (eds), Readings in Qualitative Reasoning About Physical Systems, Morgan Kaufmann, 1990.

60.

Wielemaker,

Schrijvers,

Triska and

Lager, SWI-prolog, Theory and Practice of Logic Programming (TPLP)12(1–2) (2012), 67–96. doi:10.1017/S1471068411000494.

61.

M.-A.

Williams, Representation = grounded information, in: PRICAI 2008: Trends in Artificial Intelligence: 10th Pacific Rim Int. Conf. on Artificial Intelligence, Proc,

T.-B.

Ho and

Z.-H.

Zhou, eds, Springer, Berlin, Heidelberg, 2008, pp. 473–484. doi:10.1007/978-3-540-89197-0_44.

62.

Zivkovic, Improved adaptive Gaussian mixture model for background subtraction, in: Pattern Recognition, 2004, ICPR 2004, Proc. of the 17th International Conference on, Vol. 2, IEEE, 2004, pp. 28–31. doi:10.1109/ICPR.2004.1333992.

$α_{1}$	QIDLogics $⊑ \forall$ Object ∈ Image ∃ QD
$α_{2}$	QD $⊑ \forall$ P ∈ Object ∃ hasQSDpoint(Object,P,xy(X,Y),qsd(EC_Label,ATC_Label,C_Label,L_Label)).
$α_{3}$	QD $⊑ \forall$ hasQSDcategory(Object,Name,Regularity,Convexity).
$α_{4}$	QD $⊑ \forall$ hasQCD(Object,colourPoint(xy(X,Y),rgb(R,G,B),hsl(H,S,L),QC_NAME1..5)).
$α_{5}$	QD $⊑ \forall$ LoRS_Label(Object,Image).
$α_{6}$	QD $⊑ \forall$ completely_inside(Object,ObjectInside).
$α_{7}$	QD $⊑ \forall$ touching(Object,ObjectTouching,[LocationList]).
$α_{8}$	QD $⊑ \forall$ hasQSize(Object,RelationSize,QSize_Label).
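
The axioms above fix the argument structure of each fact in a ${QIDL}^{+}$ logic description. As a minimal sketch, assuming such a description has already been generated, the facts can be stored as plain records and filtered by predicate name and object. The helper `facts_about`, the object names (`obj1`, `obj2`, `obj3`) and the label values (`left`, `wrt_image`, `small`) are hypothetical placeholders chosen for illustration, not output of the descriptor.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One ground fact: a predicate name plus its arguments, in axiom order."""
    predicate: str
    args: tuple


# Hypothetical facts following the argument order of axioms α6, α7 and α8.
FACTS = [
    Fact("completely_inside", ("obj1", "obj2")),
    Fact("touching", ("obj1", "obj3", ("left",))),
    Fact("hasQSize", ("obj1", "wrt_image", "small")),
]


def facts_about(obj, predicate, facts):
    """Return the argument tuples of every `predicate` fact whose first argument is `obj`."""
    return [f.args for f in facts if f.predicate == predicate and f.args[0] == obj]


if __name__ == "__main__":
    # Look up the touching facts recorded for obj1.
    print(facts_about("obj1", "touching", FACTS))
```

The same lookup would be a single query in Prolog; the Python form is used here only to show the shape of the data.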

5 Tutorial on Background detection by Stankiewicz: http://mateuszstankiewicz.eu/?p=189.
