A novel factored POMDP model for affective dialogue management

Abstract

Partially observable Markov decision process (POMDP) models have repeatedly been shown to be well suited to robust spoken dialogue management. Recently, several factored representations of the POMDP model have been designed for specific dialogue tasks. This paper proposes a novel factored POMDP model for a new application, affective dialogue management. Unlike existing models, both the user’s state space and the system’s observation space are divided into two distinct components: goal and emotion. Moreover, the system’s action space is, for the first time, factored into two parts, goal response and emotion response, and the reward function is accordingly defined as a weighted sum of the two corresponding rewards. An intelligent music player is used as an example to explain how the new model can be applied to build an affective dialogue system. Four experiments are designed to reveal the influence of key parameters on system performance. The simulation results demonstrate the rationality and feasibility of the proposed model.

Keywords

Dialogue management, POMDP model, affective computing, spoken dialogue system

1 Introduction

Spoken dialogue systems, which attempt to provide a convenient inquiry service for users through dialogue, were designed and developed for human-computer interaction in the 1990s [1, 2]. In the last decade, how to make machines capable of emotional interaction with humans has become a new and challenging subject in the human-computer interaction field [3]. The concept of affective computing was first proposed by Picard [4]; it aims to enable the computer to perceive, recognize and understand human emotions, and to respond to them in an intelligent, sensitive and friendly way. Against this background, affective dialogue systems, which perceive the user’s emotion and then respond appropriately, have also come under consideration [5, 6] in order to meet the requirements of application areas such as nursing home robots and intelligent tutoring systems [7].

To date, most research on affective computing has focused on detecting and recognizing the user’s emotional state from various modalities [8], such as speech, facial expressions, posture and text [9–12]. Correspondingly, much of the research on affective dialogue systems has concentrated on improving the accuracy of speech emotion recognition in spoken dialogue systems [13, 14] and on generating emotional expressions for robots [15]. However, these works, although necessary for emotional interaction, are not sufficient to build a useful speech-based affective dialogue system, as shown in Fig. 1. The dialogue management module plays a key role in the control policy, which contributes directly to system performance [16]. A new kind of affective dialogue management mechanism is therefore indispensable to connect the emotion recognition and emotion expression modules.

Partially observable Markov decision process (POMDP) models have recently drawn significant interest in the theoretical study of dialogue management because of their robust performance [17–20]. It has been demonstrated many times that the POMDP model performs well in generating effective dialogue strategies through reinforcement learning algorithms [21], and that it outperforms the Markov decision process (MDP) model and some handcrafted managers [22, 23]. Roy [23] first applied the POMDP model to the dialogue management problem in a nursing home robot application, and demonstrated that a POMDP-based dialogue manager makes fewer mistakes than an MDP-based one under the same noisy conditions. Zhang [24] extended the general POMDP model by using a factored representation of the dialogue state space, including the user’s intentions and hidden system states. Williams [22, 25] further extended Zhang’s model by separating the state into three components: the user’s goal, the user’s action, and the dialogue history. Based on this factored model, a slot-filling POMDP model was further studied [26], in which these three components were each subdivided into W slots. Moreover, a novel optimization technique called composite summary point-based value iteration was presented to allow the model to scale to a realistic size. Another improved model, the hidden information state (HIS) model proposed by Young [27], assumes that the user’s goal can be divided into a number of equivalence classes called partitions, so that the goal is replaced with partitions in the user action model, the observation model and so on. In summary, all these factored models are obtained by decomposing the user’s state space in different ways.

Compared with the many studies on POMDP-based dialogue management, research on affective dialogue management models is relatively scarce. Bui proposed a factored POMDP model for affective dialogue management [28, 29], in which the user’s state space was divided into four parts (goal, affective state, action, and grounding state) and the observation set was factored into two parts (observations of the user’s action and of the user’s emotion). A single-slot route navigation example was given to demonstrate that the model helps improve dialogue management performance when the user’s affect influences their behavior. However, this model did not consider how to implement affective interaction between the user and the system. For this purpose, we argue that the factorization should not be confined to the user’s state space and the system’s observation space; the system’s action space should also be divided into several parts as needed. In this way, the system can not only respond to the user’s goals, but also make an appropriate emotion response according to the user’s emotion.

In this paper, a novel factored POMDP model is proposed to describe the affective dialogue management module of an affective dialogue system, which seeks to respond to both the user’s goals and emotions. First, unlike previous models, the user’s state space is divided into two components: goals (intentions or needs, including the user’s action) and emotions. The system’s action space is also factored into two parts, goal response and emotion response, in order to respond to the user’s intention and emotion respectively. Second, the transition functions, the observation functions, and the reward functions of the general POMDP model are updated and simplified using conditional independence assumptions. Moreover, an intelligent music player example is given to illustrate the application of the new factored POMDP model in detail. The experimental results show the relationships among the average returns, the recognition error rates, and other parameters, and also demonstrate the feasibility and validity of the proposed model in theory.

The rest of this paper is organized as follows. Section 2 provides a brief overview of the general POMDP model. Section 3 presents a novel factored model for affective dialogue management, including the user’s goal model, the user’s emotion model, and the system’s observation model, etc. In Section 4, an example of intelligent music player is given to describe the new factored POMDP model in detail. Section 5 analyzes the impact of several parameters on the system performance by four experiments. Finally, we conclude our work in Section 6.

2 Overview of POMDP model

A general POMDP model is formally defined as an 8-tuple {S, A, T, R, O, Z, b₀, γ}, where

S = {s₁, s₂, ⋯} is a set of the environment’s states (often called the state space).

A = {a₁, a₂, ⋯} is a set of the agent’s actions (often called the action space).

T : S × A × S → [0, 1] is the state transition function, where

$T (s, a, s^{'}) = P (s^{'} | s, a), \sum_{s^{'} \in S} T (s, a, s^{'}) = 1$ (1)

denotes the probability that the environment transitions to the state s′ for a given state s and action a.

$R : S \times A \to ℝ$ is the reward function, where R (s, a) represents the immediate reward which the agent will receive from the environment for a given state s and action a.

O = { o₁, o₂, ⋯ } is a set of the agent’s observations (often called the observation space).

Z : A × S × O → [0, 1] is the observation function, where

$Z (a, s^{'}, o^{'}) = P (o^{'} | a, s^{'}), \sum_{o^{'} \in O} Z (a, s^{'}, o^{'}) = 1$ (2)

defines the probability of the next observation o′ for a given action a and next state s′.

In the POMDP model, the current state s of the environment is not directly observed, and the probability distribution over all states is called a belief state:

$b = (b (s_{1}), b (s_{2}), \dots, b (s_{| S |})), \sum_{i} b (s_{i}) = 1,$ (3)

where b (s_i) = P {s = s_i} is the probability that the environment is in state s_i. The initial belief state is denoted by b₀.

The POMDP-based dialogue management model can be depicted as an influence diagram, as shown in Fig. 2. It operates as follows. At each time-step t, based on the current belief state b_t = b, the agent (or system) selects an action a and sends it to the environment (or user), then receives an observation o_{t+1} = o′ and an immediate reward

$R (b, a) = \sum_{s \in S} R (s, a) b (s) .$ (4)

Moreover, the belief state b_t = b is updated to the next belief state b_{t+1} = b′ by Bayes’ theorem according to the current belief state b, the action a and the observation o′.

$\begin{aligned} b^{'} (s^{'}) &= b_{a}^{o^{'}} (s^{'}) = P (s^{'} | b, a, o^{'}) \\ &= \frac{P (o^{'} | a, s^{'}) \sum_{s \in S} P (s^{'} | s, a) b (s)}{P (o^{'} | b, a)} \\ &= k \cdot \underbrace{P (o^{'} | a, s^{'})}_{Z} \sum_{s \in S} \underbrace{P (s^{'} | s, a)}_{T} b (s), \end{aligned}$ (5)

where

$k = \frac{1}{P (o^{'} | b, a)} = \frac{1}{\sum_{s^{'} \in S} P (o^{'} | a, s^{'}) \sum_{s \in S} P (s^{'} | s, a) b (s)}$ (6)

is independent of the state s′, so it can be regarded as a normalization factor.
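As an illustration of Equations (4) to (6), the following minimal Python sketch performs one Bayes update of the belief state. The array layouts `T[a, s, s2]`, `Z[a, s2, o]` and `R[s, a]`, the function names, and the toy two-state numbers are assumptions made for this example only; they are not taken from the experiments in this paper.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """One Bayes update of the belief state, Equation (5).

    Assumed layouts: T[a, s, s2] = P(s2 | s, a), Z[a, s2, o] = P(o | a, s2).
    """
    predicted = b @ T[a]                    # sum_s P(s2 | s, a) b(s)
    unnormalised = Z[a, :, o] * predicted   # numerator of Equation (5)
    p_obs = unnormalised.sum()              # P(o | b, a), i.e. 1 / k in Equation (6)
    if p_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalised / p_obs

def expected_reward(b, a, R):
    """Immediate reward of Equation (4), with R[s, a] as the assumed layout."""
    return float(b @ R[:, a])

# Toy model: two states, one action, two observations (made-up numbers).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
Z = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
R = np.array([[1.0],
              [-1.0]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, Z=Z)
```

The normalization factor k never has to be computed separately: dividing by the sum of the unnormalised vector is the same operation.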

The goal of the agent is to select actions that fulfill its task as well as possible, i.e., to maximize the cumulative, infinite-horizon, discounted reward, which is called the return

$\sum_{t = 0}^{\infty} γ^{t} R (b_{t}, a_{t}) = \sum_{t = 0}^{\infty} γ^{t} \sum_{s \in S} R (s, a_{t}) b_{t} (s),$ (7)

where γ (0 < γ < 1) is a discount factor. A policy

$π : B \to A, π (b) \to a, a \in A$ (8)

can be viewed as a function from the belief state space B to the set of actions. An optimal policy π^* is a policy that maximizes the expected return

$π^{*} = \underset{π}{\arg\max} E \Big( \sum_{t = 0}^{\infty} γ^{t} R (b_{t}, π (b_{t})) \Big) .$ (9)

It specifies for each b the optimal action to execute at the current step, assuming the agent will also act optimally at future time steps. The value of the optimal policy π^* is given by the optimal value function V^* (b), which can be approximated by value iteration as follows.

Specifically, a value function is defined as

$V_{n} : B \to ℝ, n = 1, 2, \dots,$ (10)

where V_n (b) is the maximum expected return over n steps,

$V_{n} (b) = \max_{π} E \Big( \sum_{t = 0}^{n - 1} γ^{t} R (b_{t}, π (b_{t})) \Big) .$ (11)

It is easy to see that

$V_{1} (b) = \max_{a \in A} R (b, a) = \max_{a \in A} \sum_{s \in S} R (s, a) b (s),$ (12)

$V_{n + 1} (b) = \max_{a \in A} [R (b, a) + γ \sum_{o \in O} P (o | a, b) V_{n} (b_{a}^{o})],$ (13)

where $b_{a}^{o}$ is the belief state of the environment after the agent takes an action a and receives an observation o. Under value iteration, V_n (b) converges to V^* (b) as n → ∞, that is,

$V^{*} (b) = \max_{a \in A} [R (b, a) + γ \sum_{o \in O} P (o | a, b) V^{*} (b_{a}^{o})] .$ (14)

Moreover, it has been proven that V_n (b) is a piecewise linear and convex (PWLC) function, which can be represented by a collection of hyperplanes Γ_n. The coefficients of Γ_n are usually denoted by vectors $α_{n}^{i}, i = 1, \dots, | Γ_{n} |$, hence

$V_{n} (b) = \max_{\{α_{n}^{i}\}_{i = 1}^{| Γ_{n} |}} b \cdot α_{n}^{i} .$ (15)

Applying Equations (4), (6) and (15) to Equation (13) gives

$\begin{aligned} V_{n + 1} (b) &= \max_{a \in A} \Big[ \sum_{s \in S} R (s, a) b (s) + γ \sum_{o \in O} \Big( \sum_{s^{'} \in S} p (o | a, s^{'}) \sum_{s \in S} p (s^{'} | s, a) b (s) \Big) \max_{\{α_{n}^{i}\}_{i}} \sum_{s^{'} \in S} b_{a}^{o} (s^{'}) α_{n}^{i} (s^{'}) \Big] \\ &= \max_{a \in A} \sum_{o \in O} \max_{\{α_{n}^{i}\}_{i}} \sum_{s \in S} b (s) \Big[ \frac{R (s, a)}{| O |} + γ \sum_{s^{'} \in S} p (o | a, s^{'}) p (s^{'} | s, a) α_{n}^{i} (s^{'}) \Big] \end{aligned}$ (16)

Then we can update the value function by updating the $α_{n}^{i}$ -vectors as follows.

$\begin{aligned} Γ_{n + 1}^{a, o} &\leftarrow α_{n + 1}^{a, o} (s) = \frac{R (s, a)}{| O |} + γ \sum_{s^{'} \in S} p (o | a, s^{'}) p (s^{'} | s, a) α_{n}^{i} (s^{'}), \\ Γ_{n + 1}^{a} &= \bigoplus_{o \in O} Γ_{n + 1}^{a, o}, \qquad Γ_{n + 1} = \bigcup_{a \in A} Γ_{n + 1}^{a} \end{aligned}$ (17)

Unfortunately, about |A| · |Γ_n|^|O| vectors $α_{n + 1}^{i}$ are generated at each stage. This means that computing optimal planning solutions for the POMDP model is intractable for any reasonably sized task, which calls for approximate solution techniques. Point-based value iteration (PBVI) is a popular family of approximate methods for POMDP models. Perseus, a randomized PBVI algorithm proposed in [30], is one of the state-of-the-art approximate algorithms. Some other typical works on PBVI can be found in [26, 32].
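To make the point-based idea concrete, the sketch below performs a single backup of Equations (16) and (17) at one belief point, keeping only the best α-vector for that point instead of enumerating all |A| · |Γ_n|^|O| candidates. It is a generic point-based backup under assumed array layouts (`T[a, s, s2]`, `Z[a, s2, o]`, `R[s, a]`), not the Perseus implementation used in the experiments; the function name and the toy numbers are hypothetical. The reward is added once per action, which is equivalent to summing the R (s, a) / |O| terms of Equation (17) over all observations.

```python
import numpy as np

def point_backup(b, Gamma, T, Z, R, gamma):
    """One point-based backup at belief b.

    Gamma has shape (num_vectors, |S|). Returns the new alpha-vector that is
    maximal at b and the action attached to it.
    """
    num_actions, num_obs = T.shape[0], Z.shape[2]
    best_alpha, best_action, best_value = None, None, -np.inf
    for a in range(num_actions):
        alpha_a = R[:, a].astype(float).copy()
        for o in range(num_obs):
            # g[i, s] = sum_{s2} P(o | a, s2) P(s2 | s, a) alpha_i(s2)
            g = (Gamma * Z[a, :, o]) @ T[a].T
            # keep, for this observation, the vector that is best at b
            alpha_a += gamma * g[np.argmax(g @ b)]
        value = float(alpha_a @ b)
        if value > best_value:
            best_alpha, best_action, best_value = alpha_a, a, value
    return best_alpha, best_action

# Toy model: two states, two actions, two observations (made-up numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
b = np.array([0.6, 0.4])
Gamma0 = np.zeros((1, 2))            # V_0 = 0, so the first backup yields V_1
alpha1, action1 = point_backup(b, Gamma0, T, Z, R, gamma=0.95)
```

Repeating this backup over a fixed set of sampled belief points, and collecting the resulting vectors into the next Γ, gives the basic structure shared by PBVI-style algorithms.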

3 New factored POMDP model

In this section, we try to construct a new factored model for affective dialogue management of dialogue system, which attempts to respond to both the users’ goals and emotions. For this purpose, the user’s state is divided into two parts: goals (intentions or needs) and emotions. The system’s action is also factored into two components: goal response and emotion response.

Compared to the general POMDP model, the new factored model is introduced as follows.

(i) User states set S = G_u × E_u, where $G_{u} = \{g_{u1}, g_{u2}, \dots, g_{um}\}, E_{u} = \{e_{u1}, e_{u2}, \dots, e_{un}\}$ (18) are the sets of user’s goals and emotions respectively. In total, there are mn user states s_u = (g_u, e_u) ∈ S.

(ii) System actions set A = A_G × A_E, where $A_{G} = \{a_{g1}, a_{g2}, \dots, a_{gk}\}, A_{E} = \{a_{e1}, a_{e2}, \dots, a_{el}\}$ (19) are the sets of actions or responses that respond to the user’s goals and emotions respectively. In total, there are kl system actions a = (a_gi, a_ej) ∈ A.

(iii) Transition functions $\begin{matrix} P (s^{'} | s, a) = P (g_{u}^{'}, e_{u}^{'} | g_{u}, e_{u}, a_{g}, a_{e}) \\ = P (g_{u}^{'} | g_{u}, e_{u}, a_{g}, a_{e}) \cdot P (e_{u}^{'} | g_{u}^{'}, g_{u}, e_{u}, a_{g}, a_{e}) . \end{matrix}$ (20)

In the first product-term, the user’s next goal $g_{u}^{'}$ can be assumed to depend only on the current goal g_u and goal response a_g, that is $P (g_{u}^{'} | g_{u}, e_{u}, a_{g}, a_{e}) = P (g_{u}^{'} | g_{u}, a_{g}) .$ (21)

This can be called the user goal model, indicating how the user’s goal changes at each time-step.

In the second product-term, the next emotion $e_{u}^{'}$ can be considered to depend on the current emotion e_u, goal response a_g and emotion response a_e, then $P (e_{u}^{'} | g_{u}^{'}, g_{u}, e_{u}, a_{g}, a_{e}) = P (e_{u}^{'} | e_{u}, a_{g}, a_{e}) .$ (22)

This can be called the user emotion model, indicating how the user’s emotion changes at each time-step.

Substituting Equations (21) and (22) into Equation (20) gives

$P (s^{'} | s, a) = P (g_{u}^{'} | g_{u}, a_{g}) \cdot P (e_{u}^{'} | e_{u}, a_{g}, a_{e}) .$ (23)

(iv) Reward functions

$\begin{aligned} R (s, a) &= R (g_{u}, e_{u}, a_{g}, a_{e}) \\ &= w_{g} R (g_{u}, a_{g}) + w_{e} R (e_{u}, a_{e}), \end{aligned}$ (24)

where R (g_u, a_g) and R (e_u, a_e) are the rewards of the goal response a_g and the emotion response a_e respectively, and w_g and w_e denote the weight coefficients of the two rewards, with w_g + w_e = 1.
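The factorizations in Equations (23) and (24) can be assembled mechanically from their component tables. The sketch below is one possible way to do so with NumPy; the function names, the index layouts (`Pg[ag, g, g2]`, `Pe[ag, ae, e, e2]`, `Rg[g, ag]`, `Re[e, ae]`) and the randomly generated toy tables are assumptions for illustration only.

```python
import numpy as np

def factored_transition(Pg, Pe):
    """Equation (23): T[ag, ae, g, e, g2, e2] = P(g2 | g, ag) * P(e2 | e, ag, ae)."""
    return np.einsum("axy,abuv->abxuyv", Pg, Pe)

def factored_reward(Rg, Re, w_g=0.5, w_e=0.5):
    """Equation (24): R[g, e, ag, ae] = w_g * Rg[g, ag] + w_e * Re[e, ae]."""
    return w_g * Rg[:, None, :, None] + w_e * Re[None, :, None, :]

# Toy sizes: 3 goals, 2 emotions, 2 goal responses, 2 emotion responses.
rng = np.random.default_rng(0)
Pg = rng.dirichlet(np.ones(3), size=(2, 3))       # shape (2, 3, 3), rows sum to 1
Pe = rng.dirichlet(np.ones(2), size=(2, 2, 2))    # shape (2, 2, 2, 2)
T_joint = factored_transition(Pg, Pe)

Rg = 2.0 * np.eye(2) - 1.0                        # +1 on the diagonal, -1 elsewhere
Re = 2.0 * np.eye(2) - 1.0
R_joint = factored_reward(Rg, Re)
```

Because each factor is a proper conditional distribution, the joint table sums to one over (g2, e2) for every (ag, ae, g, e), which is a cheap check that the tables were assembled correctly.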

(v) Observations set O = O_G × O_E, where $O_{G} = \{o_{g1}, o_{g2}, \dots, o_{gm}\}, O_{E} = \{o_{e1}, o_{e2}, \dots, o_{en}\}$ (25) are the sets of the observations of the user’s goals and emotions respectively. It is generally supposed that O_G = G_u and O_E = E_u, although the number of goals and emotions that the agent can observe may be fewer than the actual number.

(vi) Observation functions $\begin{matrix} P (o^{'} | a, s^{'}) = P (o_{g}^{'}, o_{e}^{'} | g_{u}^{'}, e_{u}^{'}, a_{g}, a_{e}) \\ = P (o_{g}^{'} | g_{u}^{'}, e_{u}^{'}, a_{g}, a_{e}) \cdot P (o_{e}^{'} | o_{g}^{'}, g_{u}^{'}, e_{u}^{'}, a_{g}, a_{e}) . \end{matrix}$ (26)

Suppose that the observations $o_{g}^{'}$ and $o_{e}^{'}$ depend only on the states $g_{u}^{'}$ and $e_{u}^{'}$ respectively, that is $\begin{matrix} P (o_{g}^{'} | g_{u}^{'}, e_{u}^{'}, a_{g}, a_{e}) = P (o_{g}^{'} | g_{u}^{'}), \\ P (o_{e}^{'} | o_{g}^{'}, g_{u}^{'}, e_{u}^{'}, a_{g}, a_{e}) = P (o_{e}^{'} | e_{u}^{'}) . \end{matrix}$ (27)

From Equations (26) and (27), we have $P (o^{'} | a, s^{'}) = P (o_{g}^{'} | g_{u}^{'}) \cdot P (o_{e}^{'} | e_{u}^{'}) .$ (28)

This can be called the system observation model, indicating the observation probabilities of the system on user’s goals and emotions.

Finally, based on the above definitions, the belief state at each time-step can be updated by substituting Equations (23) and (28) into Equation (5), which gives

$\begin{aligned} b^{'} (s^{'}) = b^{'} (g_{u}^{'}, e_{u}^{'}) &= k \cdot P (o_{g}^{'} | g_{u}^{'}) P (o_{e}^{'} | e_{u}^{'}) \cdot \sum_{s_{u} \in S} P (g_{u}^{'} | g_{u}, a_{g}) P (e_{u}^{'} | e_{u}, a_{g}, a_{e}) b (g_{u}, e_{u}) \\ &= k \cdot \underbrace{P (o_{g}^{'} | g_{u}^{'}) P (o_{e}^{'} | e_{u}^{'})}_{\text{system observation model}} \cdot \sum_{g_{u} \in G_{u}} \underbrace{P (g_{u}^{'} | g_{u}, a_{g})}_{\text{user goal model}} \sum_{e_{u} \in E_{u}} \underbrace{P (e_{u}^{'} | e_{u}, a_{g}, a_{e})}_{\text{user emotion model}} b (g_{u}, e_{u}) . \end{aligned}$ (29)
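Equation (29) can be evaluated without ever building the joint transition table, because the two sums separate into matrix products. The following sketch shows this under assumed layouts (`b[g, e]`, `Pg[ag, g, g2]`, `Pe[ag, ae, e, e2]`, `Zg[g2, og]`, `Ze[e2, oe]`); the function name and the randomly generated tables are hypothetical and serve only to exercise the code.

```python
import numpy as np

def factored_belief_update(b, ag, ae, og, oe, Pg, Pe, Zg, Ze):
    """Belief update of Equation (29) on a belief stored as a matrix b[g, e]."""
    # sum_g P(g2 | g, ag) sum_e P(e2 | e, ag, ae) b(g, e), as two matrix products
    predicted = Pg[ag].T @ b @ Pe[ag, ae]
    # system observation model P(og | g2) P(oe | e2)
    unnormalised = np.outer(Zg[:, og], Ze[:, oe]) * predicted
    total = unnormalised.sum()          # equals 1 / k
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalised / total

# Toy sizes: m goals, n emotions, k goal responses, l emotion responses.
rng = np.random.default_rng(0)
m, n, k, l = 3, 2, 3, 2
Pg = rng.dirichlet(np.ones(m), size=(k, m))       # shape (k, m, m)
Pe = rng.dirichlet(np.ones(n), size=(k, l, n))    # shape (k, l, n, n)
Zg = rng.dirichlet(np.ones(m), size=m)            # shape (m, m)
Ze = rng.dirichlet(np.ones(n), size=n)            # shape (n, n)
b = np.full((m, n), 1.0 / (m * n))
b_next = factored_belief_update(b, ag=1, ae=0, og=2, oe=1,
                                Pg=Pg, Pe=Pe, Zg=Zg, Ze=Ze)
```

The result agrees with the unfactored update of Equation (5) applied to the flattened state space, while the cost per update drops from O((mn)²) to O(m²n + mn²).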

Figure 3 below indicates the influence diagram depiction of our factored POMDP model for affective dialogue management. It not only clearly shows these dependencies, but also can be used to make comparisons with the general POMDP model (Fig. 2) and other factored POMDP representations.

4 Example: Intelligent music player

It is well known that music and emotion are closely related. In daily life, many people like to relax and adjust their mood by listening to different kinds of music. In this section, we take an intelligent music player as an example application to illustrate our model more specifically. It can be developed as a service of intelligent robots which can not only cope with the user’s goals, but also respond to the user’s emotions.

For this example, the symbols and functions in section 3 are given in sequence as follows.

(i) User states set S = G_u × E_u.

For simplicity, we briefly list five major goals of the user in a dialogue (see Table 1).

It is difficult to list all the emotions that users may show in a dialogue, but the set of user emotion states E_u can usually be divided into

positive set P_U (including happy, expect, ...),

neutral e_N (or calm),

negative set N_U (including sad, angry, disgust, ...).

For simplicity, we suppose that the user’s emotions include the following four types in our example:

P_U = {e_u1 = Happy},

e_N = e_u2 = Neutral,

N_U = {e_u3 = Angry, e_u4 = Disgust} .

In this condition, there are 20 states belonging to the set S = G_u × E_u. It should be noted that some of these states may not exist, because some goals and emotions are impossible to coexist. For example, it seems unlikely to show an angry emotion e_u3 under the goal g_u3, that is, the user state (g_u3, e_u3) may not exist. Moreover, an additional state “end” needs to be added into S to describe the user’s current state after the system performs the finish action.

(ii) System actions set A = A_G × A_E.

It is a complex problem to design what actions the system should take to respond to the user’s goals and emotions. For each user’s goal g_ui and emotion e_ui, we assume that the system would have an optimal response respectively. For brevity, we only collect and list these optimal responses in the actions set A_G and A_E, as shown in Tables 2 and 3 respectively.

In this condition, there are 20 actions belonging to the system actions set A = A_G × A_E, and one of them will be selected to respond to the user’s state at each time-step. Moreover, an additional action “greeting” needs to be added into A to describe the system’s initial action. Note that all the emotion responses can be expressed by an intelligent robot through multiple forms, such as facial expressions, gestures, words, and so on.

(iii) Transition functions.

In the user goal model $P (g_{u}^{'} | g_{u}, a_{g})$, it is reasonable to assume that the user’s goal will transfer to each of the possible follow-on goals with equal probability if the action a_g is correct. Otherwise, the user’s goal remains the same with probability p₁. Specifically,

$P (g_{u}^{'} | g_{u}, a_{g}) = \begin{cases} \frac{1}{2} & \text{if } g_{u} = g_{u1}, a_{g} = a_{g1}, \text{ and } g_{u}^{'} \in \{g_{u3}, g_{u4}\}, \\ p_{1} & \text{if } g_{u} = g_{u1}, a_{g} \neq a_{g1}, \text{ and } g_{u}^{'} = g_{u1}, \\ \frac{1 - p_{1}}{2} & \text{if } g_{u} = g_{u1}, a_{g} \neq a_{g1}, \text{ and } g_{u}^{'} \in \{g_{u2}, g_{u5}\}, \\ \frac{1}{2} & \text{if } g_{u} = g_{u2}, a_{g} = a_{g2}, \text{ and } g_{u}^{'} \in \{g_{u3}, g_{u4}\}, \\ p_{1} & \text{if } g_{u} = g_{u2}, a_{g} \neq a_{g2}, \text{ and } g_{u}^{'} = g_{u2}, \\ \frac{1 - p_{1}}{2} & \text{if } g_{u} = g_{u2}, a_{g} \neq a_{g2}, \text{ and } g_{u}^{'} \in \{g_{u1}, g_{u5}\}, \\ \frac{1}{3} & \text{if } g_{u} = g_{u3}, a_{g} = a_{g3}, \text{ and } g_{u}^{'} \in \{g_{u1}, g_{u2}, g_{u5}\}, \\ p_{1} & \text{if } g_{u} = g_{u3}, a_{g} \neq a_{g3}, \text{ and } g_{u}^{'} = g_{u3}, \\ \frac{1 - p_{1}}{3} & \text{if } g_{u} = g_{u3}, a_{g} \neq a_{g3}, \text{ and } g_{u}^{'} \in \{g_{u1}, g_{u2}, g_{u5}\}, \\ \frac{1}{2} & \text{if } g_{u} = g_{u4}, a_{g} = a_{g4}, \text{ and } g_{u}^{'} \in \{g_{u1}, g_{u2}\}, \\ p_{1} & \text{if } g_{u} = g_{u4}, a_{g} \neq a_{g4}, \text{ and } g_{u}^{'} = g_{u4}, \\ \frac{1 - p_{1}}{2} & \text{if } g_{u} = g_{u4}, a_{g} \neq a_{g4}, \text{ and } g_{u}^{'} \in \{g_{u2}, g_{u5}\}, \\ 1 & \text{if } g_{u} = g_{u5}, a_{g} = a_{g5}, \text{ and } g_{u}^{'} = \text{end}, \\ 1 & \text{if } g_{u} = g_{u5}, a_{g} \neq a_{g5}, \text{ and } g_{u}^{'} = g_{u5}, \\ 0 & \text{otherwise.} \end{cases}$ (30)
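Equation (30) can be tabulated directly, which also makes it easy to confirm that every row is a valid probability distribution. The sketch below is one possible encoding: goals g_u1 to g_u5 are mapped to indices 0 to 4 and the additional "end" state to index 5. The index mapping, the function name, and the choice to make "end" absorbing are assumptions of this example; Equation (30) itself does not specify the transitions out of "end".

```python
import numpy as np

END = 5  # index of the additional "end" state
# Follow-on goals when the goal response is correct / wrong, read off Equation (30).
ON_CORRECT = {0: [2, 3], 1: [2, 3], 2: [0, 1, 4], 3: [0, 1], 4: [END]}
ON_WRONG = {0: [1, 4], 1: [0, 4], 2: [0, 1, 4], 3: [1, 4]}

def goal_model(p1):
    """Return Pg[ag, g, g2] = P(g2 | g, ag) following Equation (30)."""
    Pg = np.zeros((5, 6, 6))
    for ag in range(5):
        for g in range(5):
            if ag == g:                       # correct goal response
                nxt = ON_CORRECT[g]
                Pg[ag, g, nxt] = 1.0 / len(nxt)
            elif g == 4:                      # g_u5 persists until a_g5 is taken
                Pg[ag, g, g] = 1.0
            else:                             # wrong goal response
                nxt = ON_WRONG[g]
                Pg[ag, g, g] = p1
                Pg[ag, g, nxt] = (1.0 - p1) / len(nxt)
        Pg[ag, END, END] = 1.0                # assumption: "end" is absorbing
    return Pg

Pg = goal_model(p1=0.3)
```

With this table, `Pg.sum(axis=2)` equals 1 for every action and state, so the cases of Equation (30) are exhaustive for the five goals.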

In the user emotion model $P (e_{u}^{'} | e_{u}, a_{g}, a_{e})$ , we suppose that the transfer probability varies depending on the user’s current emotion state e_u. Specifically, there are three situations:

If the current emotion is positive, the next emotion would be positive if either response a_g or a_e is correct, and would be neutral if a_g and a_e are both wrong.

If the current emotion is neutral, both correct or both wrong responses will cause a next positive or negative emotion respectively.

If the current emotion is negative, both correct responses will lead to a neutral state, otherwise remain negative.

Moreover, we assume that a correct observation will result in a correct action, and denote the average recognition error rates of the user’s goals and emotions by p_og and p_oe respectively. That is, p_og and p_oe also represent the average error rates of the system’s actions a_g and a_e respectively. Based on these analyses, the user emotion model is defined as follows, where |P_U| = 1 and |N_U| = 2.

$P (e_{u}^{'} | e_{u}, a_{g}, a_{e}) = \begin{cases} p_{og} p_{oe} & \text{if } e_{u} \in P_{U}, \text{ and } e_{u}^{'} = e_{N}, \\ \frac{1 - p_{og} p_{oe}}{| P_{U} |} & \text{if } e_{u} \in P_{U}, \text{ and } e_{u}^{'} \in P_{U}, \\ 0 & \text{if } e_{u} \in P_{U}, \text{ and } e_{u}^{'} \in N_{U}, \\ \frac{(1 - p_{og}) (1 - p_{oe})}{| P_{U} |} & \text{if } e_{u} = e_{N}, \text{ and } e_{u}^{'} \in P_{U}, \\ (1 - p_{og}) p_{oe} + (1 - p_{oe}) p_{og} & \text{if } e_{u} = e_{N}, \text{ and } e_{u}^{'} = e_{N}, \\ \frac{p_{og} p_{oe}}{| N_{U} |} & \text{if } e_{u} = e_{N}, \text{ and } e_{u}^{'} \in N_{U}, \\ (1 - p_{og}) (1 - p_{oe}) & \text{if } e_{u} \in N_{U}, \text{ and } e_{u}^{'} = e_{N}, \\ \frac{p_{og} + p_{oe} - p_{og} p_{oe}}{| N_{U} |} & \text{if } e_{u} \in N_{U}, \text{ and } e_{u}^{'} \in N_{U}, \\ 0 & \text{if } e_{u} \in N_{U}, \text{ and } e_{u}^{'} \in P_{U} . \end{cases}$ (31)

(iv) Reward functions.

Generally, for each goal g_ui and emotion e_ui, we suppose that a positive reward of +1 is received for the optimal action and a negative reward of -1 for any other action. In particular, if the user’s goal is to finish the dialogue (g_u5), the correct action a_g5 receives a larger reward of 2, and every incorrect action receives a reward of -2. R (g_u, a_g) and R (e_u, a_e) in Equation (24) are defined as follows.

$R (g_{ui}, a_{gj}) = \begin{cases} 1, & i = j, i, j \in \{1, 2, 3, 4\} \\ - 1, & i \neq j, i, j \in \{1, 2, 3, 4\} \\ 2, & i = 5, j = 5 \\ - 2, & i = 5, j \neq 5 \end{cases}$ (32)

$R (e_{ui}, a_{ej}) = \begin{cases} 1, & i = j, i, j \in \{1, 2, 3, 4\} \\ - 1, & i \neq j, i, j \in \{1, 2, 3, 4\} \end{cases}$ (33)

(v) Observations set O = O_G × O_E.

As mentioned earlier in Equation (25), O_G and O_E are the two sets of observations of the user’s goals and emotions respectively, with O_G = G_u and O_E = E_u.

(vi) Observation functions.

$P (o_{g}^{'} | g_{u}^{'}) = P (o_{g} | g_{u}) = \begin{cases} 1 - p_{og} & \text{if } o_{g} = g_{u}, \\ \frac{p_{og}}{| G_{u} | - 1} & \text{if } o_{g} \neq g_{u}, \end{cases}$ (34)

$P (o_{e}^{'} | e_{u}^{'}) = P (o_{e} | e_{u}) = \begin{cases} 1 - p_{oe} & \text{if } o_{e} = e_{u}, \\ \frac{p_{oe}}{| E_{u} | - 1} & \text{if } o_{e} \neq e_{u}, \end{cases}$ (35)

where |G_u| = 5 and |E_u| = 4 denote the sizes of the two sets respectively, and p_og and p_oe are the average recognition error rates introduced with Equation (31).
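The observation functions of Equations (34) and (35) are confusion matrices with the correct label on the diagonal and the error mass spread uniformly over the other labels. A minimal sketch, assuming the layout `Z[state, observation]` and example error rates p_og = 0.2 and p_oe = 0.3 chosen only for illustration:

```python
import numpy as np

def observation_model(num_states, p_err):
    """Z[s, o] = 1 - p_err if o == s, else p_err / (num_states - 1)."""
    Z = np.full((num_states, num_states), p_err / (num_states - 1))
    np.fill_diagonal(Z, 1.0 - p_err)
    return Z

Zg = observation_model(5, p_err=0.2)   # Equation (34), |G_u| = 5
Ze = observation_model(4, p_err=0.3)   # Equation (35), |E_u| = 4

# Joint observation model of Equation (28) over the 20 (goal, emotion) pairs,
# with the pair (g, e) stored at flat index g * 4 + e.
Z_joint = np.kron(Zg, Ze)
```

Each row of `Zg`, `Ze` and `Z_joint` sums to one, so the uniform-error assumption yields a proper distribution for any error rate between 0 and 1.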

Finally, a dialogue example is given in Table 4 to show the affective dialogue process between a user and the player.
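As a sanity check on the user emotion model of Equation (31), the sketch below tabulates it for the four emotions of this example and lets one verify that each row sums to one. The index mapping (0 = Happy, 1 = Neutral, 2 = Angry, 3 = Disgust), the function name and the example error rates are assumptions of this illustration. Because Equation (31) is written in terms of the error rates rather than the individual responses, the table depends only on p_og and p_oe.

```python
import numpy as np

def emotion_model(p_og, p_oe):
    """Return Pe[e, e2] = P(e2 | e) as tabulated in Equation (31)."""
    P_U, e_N, N_U = [0], 1, [2, 3]              # positive, neutral, negative
    both_right = (1.0 - p_og) * (1.0 - p_oe)    # both responses correct
    both_wrong = p_og * p_oe                    # both responses wrong
    one_right = (1.0 - p_og) * p_oe + (1.0 - p_oe) * p_og
    Pe = np.zeros((4, 4))
    for e in P_U:                               # currently positive
        Pe[e, P_U] = (1.0 - both_wrong) / len(P_U)
        Pe[e, e_N] = both_wrong
    Pe[e_N, P_U] = both_right / len(P_U)        # currently neutral
    Pe[e_N, e_N] = one_right
    Pe[e_N, N_U] = both_wrong / len(N_U)
    for e in N_U:                               # currently negative
        Pe[e, e_N] = both_right
        Pe[e, N_U] = (1.0 - both_right) / len(N_U)
    return Pe

Pe = emotion_model(p_og=0.2, p_oe=0.3)
```

The negative rows use 1 − (1 − p_og)(1 − p_oe), which expands to p_og + p_oe − p_og p_oe, so the two forms are interchangeable.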

5 Experimental results and discussions

In the initial state, the user’s goal should be either g_u1 or g_u2, and the user’s emotion could be any one of the four emotions. That is, the initial state of the user should be one of the states $(g_{ui}, e_{uj}), i = 1, 2; j = 1, 2, 3, 4 .$ (36)

So the initial belief state is defined as

$b_{0} = (1 / 8, 1 / 8, \dots, 1 / 8, 0, 0, \dots, 0)$ (37)

in all experiments. The weight coefficients w_g and w_e of the reward function are both set to 0.5 in the first three experiments, meaning that the two kinds of responses are treated as equally important to the user in the dialogue.
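The initial belief of Equation (37) can be written out explicitly once a state ordering is fixed. The sketch below assumes that the pair (g_ui, e_uj) is stored at flat index (i − 1) · 4 + (j − 1) and that the additional "end" state comes last; this ordering is an assumption of the example, chosen so that the eight non-zero entries come first as in Equation (37).

```python
import numpy as np

NUM_GOALS, NUM_EMOTIONS = 5, 4
END = NUM_GOALS * NUM_EMOTIONS          # index of the additional "end" state

b0 = np.zeros(NUM_GOALS * NUM_EMOTIONS + 1)
for g in (0, 1):                        # initial goal is g_u1 or g_u2
    for e in range(NUM_EMOTIONS):       # any of the four emotions
        b0[g * NUM_EMOTIONS + e] = 1.0 / 8.0
```

The vector has 21 entries: probability 1/8 on each of the eight admissible initial states of Equation (36) and zero elsewhere, including the "end" state.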

The following six figures show the relationships among the average returns, discount factors, goal and emotion recognition error rates, transition probability, and weight coefficients. The experimental results are acquired by the point-based value iteration algorithm Perseus.

The first experiment examines the impact of the discount factor on the average return. Figure 4 shows that the return increases with the discount factor γ, especially when γ > 0.9. In fact, the discount factor is usually set above 0.9 in other POMDP experiments. The discount factor is set to 0.95 in all the following experiments.

The second experiment examines the relationship between the average return and the state transition probability p₁ of the user goal model. Fig. 5 shows that a smaller p₁ (< 0.4) yields a higher return. Overall, however, p₁ has little influence on the return values (which only range from 4.9 to 5.5), which means that system performance depends only weakly on whether the user changes goals during the dialogue. It also suggests that the user goal model is valid to some extent.

The third experiment is designed to reveal the relationship between the average return and the emotion recognition error rate, as well as that between the average return and the goal recognition error rate. The results are shown in Figs. 6 and 7. As expected, a higher error rate leads to a smaller return in both figures. Moreover, the average return falls much more in Fig. 6 than in Fig. 7. This indicates that the emotion recognition error rate p_oe has more impact on system performance. In other words, good performance depends mainly on a low emotion recognition error rate.

The final experiment further analyzes the influence of the two recognition error rates on system performance under three different settings of the weight coefficients. Figure 8 shows that the average returns all decrease with increasing p_oe for the three groups of weight coefficients. By contrast, for the same emotion recognition error rate p_oe (< 0.7), the larger the weight w_e, the larger the average return. This is easy to understand: when w_e is relatively large, the reward function in Equation (24) depends more on the second part w_e R (e_u, a_e). Meanwhile, a smaller p_oe yields a larger reward R (e_u, a_e), based on the previous assumption that a correct observation will result in a correct action.

Similarly, Fig. 9 indicates that the average returns all decrease with increasing goal recognition error rate p_og for the same three groups of weight coefficients. As expected, the returns decrease faster when w_g is larger. Compared with Fig. 8, the range of the decline is relatively small. This indicates that the goal recognition error rate p_og has less impact on system performance, which is consistent with the third experiment.

In a practical application, we can select different weight coefficients and adjust the reward function according to the requirement of the designed system. For example, w_g should be bigger than w_e in a dialogue system which is mainly devoted to information inquiry and supplemented by affective interaction.

6 Conclusion and future work

In this paper, a novel factored POMDP model was proposed for the design of the affective dialogue management module, which aims to respond to both the user’s intention and emotion in a dialogue. Unlike previous factored POMDP models, we not only separated the user’s state space, but also divided the system’s action space. The transition function, the observation function, and the reward function were all updated accordingly. An intelligent music player example was given to illustrate the new factored model concretely. It should be noted that this is just one application example of the factored model, which can also be applied to other similar tasks. Moreover, we evaluated the influence of key parameters on system performance through the return function, and the experimental results showed that the new factored model is reasonable and feasible.

Our future work mainly includes two parts. The first part is to improve and evaluate our factored POMDP model by further separating the user’s state space into more components, including user’s action and dialogue history. The second part is to develop a useful affective dialogue system with intelligent player and other similar functions for our robot based on the improved model.

Acknowledgments

This research has been partially supported by the National Natural Science Foundation of China under Grant Nos. 61432004 and 61472117, JSPS KAKENHI Grant No. 15H01712, and the Natural Science Fund of the Education Department of Anhui Province under Grant No. KJ2015B1105916.

References

1. Walker M.A., Rudnicky A.I., Aberdeen J.S., et al., DARPA communicator evaluation: Progress from 2000 to 2001, Interspeech (2002), 273–276.

2. Mariani and Lamel, An overview of EU programs related to conversational/interactive systems, Proceedings of DARPA Broadcast News Transcription and Understanding Workshop (1998), pp. 247–253.

3. Lopatovska and Arapakis, Theories, methods and current research on emotions in library and information science, information retrieval and human–computer interaction, Information Processing and Management 47 (2011), 575–592.

4. Picard R.W., Affective Computing, The MIT Press, 1997.

5. André, Dybkjær, Minker and Heisterkamp, Affective Dialogue Systems: Tutorial and Research Workshop, Lecture Notes in Artificial Intelligence, Springer, 2004.

6. Skowron, Theunis, Rank, et al., Affect and social processes in online communication-experiments with an affective dialog system, IEEE Transactions on Affective Computing (2013), 1–14.

7. Ben Ammar, Neji, Alimi A.M. and Gouardères, The affective tutoring system, Expert Systems with Applications 37 (2010), 3013–3023.

8. Calvo R.A. and D’Mello, Affect detection: An inter-disciplinary review of models, methods, and their applications, IEEE Transactions on Affective Computing 1 (2010), 18–37.

9. Ren F.J., Affective information processing and recognizing human emotion, Electronic Notes in Theoretical Computer Science 25 (2009), 39–50.

10. Ren F.J. and Kang, Employing hierarchical Bayesian networks in simple and complex emotion topic analysis, Computer Speech & Language 27 (2013), 943–968.

11. Ren F.J. and Wu, Predicting user-topic opinions in twitter with social and topical context, IEEE Transactions on Affective Computing 4 (2013), 412–424.

12. Kulic and Croft, Affective state estimation for human robot interaction, IEEE Transactions on Robotics 23 (2007), 991–1000.

13. Pittermann, Pittermann and Minker, Emotion recognition and adaptation in spoken dialogue systems, Int J Speech Technol 13 (2010), 49–60.

14. López-Cózar, Silovsky and Kroul, Enhancement of emotion detection in spoken dialogue systems by combining several information sources, Speech Communication 53 (2011), 1210–1228.

15. Han M.J., Lin C.H. and Song K.T., Robotic emotional expression generation based on mood transition and personality model, IEEE Transactions on Cybernetics 43 (2013), 1290–1303.

16. Lee, Jung, Kim, et al., Recent approaches to dialog management for spoken dialog systems, Journal of Computing Science and Engineering 4 (2010), 1–22.

17. Young, Using POMDPs for dialog management, Proc of IEEE Workshop on Spoken Language Technology (SLT), 2006, pp. 8–13.

18. Young, Gasic, Thomson, et al., POMDP-based statistical spoken dialog systems: A review, Proceedings of the IEEE 101(5) (2013), 1160–1179.

19.

Crook

P.A.

, Keizer

, Wang

, et al., Real user evaluation of a POMDP spoken dialogue system using automatic belief compression, Computer Speech & Language28 (2014), 873–887.

20.

Kim

, Kim

J.H.

and Kim

K.E.

, Robust performance evaluation of POMDP-based dialogue systems, IEEE Transactions on Audio, Speech, and Language Processing19 (2011), 1029–1040.

21.

Jurčíček

, Thomson

and Young

, Reinforcement learning for parameter estimation in statistical spoken dialogue systems, Computer Speech & Language26 (2012), 168–192.

22.

Williams

J.D.

and Young

, Partially observable Markov decision processes for spoken dialogue systems, Computer Speech and Language21 (2007), 393–422.

23.

Roy

, Pineau

and Thrun

, Spoken dialogue management using probabilistic reasoning, pp, Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL) (2000), 93–100.

24.

Zhang

, Cai

, Mao

and Guo

, Spoken dialog management as planning and acting under uncertainty, Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH), (2001), pp. 2169–2121.

25.

Williams

J.D.

, Poupart

and Young

, Factored partially observable Markov decision processes for dialogue management, Proceedings of the 4th Workshop on Knowledge and Reasoning in Practical Dialog Systems, (2005), pp. 76–82.

26.

Williams

J.D.

and Young

, Scaling POMDPs for spoken dialog management, IEEE Transactions on Audio, Speech, and Language Processing15(7) (2007), 2116–2129.

27.

Young

, Gasić

, Keizer

, Mairesse

, Schatzmann

, et al., The hidden information state model: A practical framework for pomdp-based spoken dialogue management, Computer Speech and Language24(2) (2010), 150–174.

28.

Bui

T.H.

, Poel

, Nijholt

, et al., A tractable DDN-POMDP approach to affective dialogue modeling for general probabilistic frame-based dialogue systems, Natural Language Engineering15(2) (2007), 273–307.

29.

Bui

T.H.

, Zwiers

, Poel

and Nijholt

, Affective dialogue management using factored POMDPS, Interactive Collaborative Information Systems (2010), 207–236.

30.

Spaan

M.T.J.

and Vlassis

N.A.

, Perseus: Randomized point based value iteration for POMDPs, Journal of Artificial Intelligence Research24 (2005), 195–220.

31.

Hauskrecht

, Value-function approximations for partially observable markov decision processes, Journal of Artificial Intelligence Research13 (2000), 33–94.

32.

Zhang

N.L.

and Zhang

, Speeding up the convergence of value iteration in partially observable Markov decision processes, Journal of Artificial Intelligence Research14 (2001), 29–51.