Abstract
The partially observable Markov decision process (POMDP) model has repeatedly been shown to be well suited to robust spoken dialogue management. Recently, several factored representations of the POMDP model have been designed for specific dialogue tasks. This paper proposes a novel factored POMDP model for a new application, affective dialogue management. Unlike existing models, both the user’s state space and the system’s observation space are divided into two distinct components: goal and emotion. Moreover, the system’s action space is, for the first time, factored into two parts, goal response and emotion response, and the reward function is accordingly defined as a weighted sum of the two corresponding rewards. An intelligent music player is used as an example to explain how the new model can be applied to build an affective dialogue system. Four experiments are designed to reveal the influence of key parameters on system performance. The simulation results demonstrate the rationality and feasibility of the proposed model.
Introduction
Spoken dialogue systems were designed and developed for human-computer interaction in the 1990s [1, 2]; they attempt to provide a convenient inquiry service for users through dialogue. In the last decade, making machines capable of emotional interaction with humans has become a new and challenging subject in the human-computer interaction field [3]. The concept of affective computing was first proposed by Picard [4]; it aims to enable computers to perceive, recognize and understand human emotions, and to respond to them in an intelligent, sensitive and friendly way. Against this background, affective dialogue systems, which perceive the user’s emotion and respond to it appropriately, have also come under consideration [5, 6], in order to meet the requirements of application areas such as nursing home robots and intelligent tutoring systems [7].
To date, most research on affective computing has focused on detecting and recognizing the user’s emotional state through various modalities [8], such as speech, facial expressions, posture and text [9–12]. Correspondingly, much of the research on affective dialogue systems has concentrated on improving the accuracy of speech emotion recognition in spoken dialogue systems [13, 14] and on generating emotional expressions for robots [15]. However, these works, although necessary for emotional interaction, are not sufficient to build a useful speech-based affective dialogue system, as shown in Fig. 1. The dialogue management module determines the control policy and therefore contributes directly to system performance [16]. A new kind of affective dialogue management mechanism is thus indispensable for connecting the emotion recognition and emotion expression modules.
Partially observable Markov decision process (POMDP) models have recently drawn significant interest in the theoretical study of dialogue management because of their robust performance [17–20]. It has been demonstrated many times that the POMDP model performs well in generating effective dialogue strategies through reinforcement learning algorithms [21], and that it outperforms the Markov decision process (MDP) model and some handcrafted managers [22, 23]. Roy [23] first applied the POMDP model to the dialogue management problem in a nursing home robot application, and demonstrated that a POMDP-based dialogue manager makes fewer mistakes than an MDP-based one under the same noisy conditions. Zhang [24] extended the general POMDP model by using a factored representation of the dialogue state space, including the user’s intentions and hidden system states. Williams [22, 25] further extended Zhang’s model by separating the state into three components: the user’s goal, the user’s action, and the dialogue history. Based on this factored model, a slot-filling POMDP model was further studied [26], in which each of these three components was subdivided into W slots. Moreover, a novel optimization technique called composite summary point-based value iteration was presented to scale the model to a realistic size. Another improved model, the hidden information state (HIS) model proposed by Young [27], assumes that the user’s goal can be divided into a number of equivalence classes called partitions, so that the goal is replaced with partitions in the user action model, the observation model and so on. In summary, all these factored models are obtained by decomposing the user’s state space in different ways.
Compared with the many studies on POMDP-based dialogue management, research on affective dialogue management models is relatively scarce. Bui proposed a factored POMDP model for affective dialogue management [28, 29], in which the user’s state space was divided into four parts (goal, affective state, action, and grounding state) and the observation set was factored into two parts (the observations of the user’s action and of the user’s emotion). A single-slot route navigation example was given to demonstrate that their model helps improve the performance of dialogue management, given that the user’s affect influences their behavior. However, this model did not consider how to implement an affective interaction between the user and the system. For this purpose, we argue that the division should not be confined to the user’s state space and the system’s observation space; the system’s action space should also be divided into several parts as needed. In this way, the system can not only respond to the user’s goals, but also respond appropriately to the user’s emotion.
In this paper, a novel factored POMDP model is proposed to describe the affective dialogue management module of an affective dialogue system, which seeks to respond to both the user’s goals and emotions. First, unlike previous models, the user’s state space is divided into two components: goals (intentions or needs, including the user’s action) and emotions. The system’s action space is also factored into two parts, goal response and emotion response, in order to respond to the user’s intention and emotion respectively. Second, the transition, observation and reward functions of the general POMDP model are updated and simplified using conditional independence assumptions. Moreover, an intelligent music player example is given to illustrate the application of the new factored POMDP model in detail. The experimental results not only show the relationships among the average returns, recognition error rates and other parameters, but also demonstrate the feasibility and validity of the proposed model in theory.
The rest of this paper is organized as follows. Section 2 provides a brief overview of the general POMDP model. Section 3 presents a novel factored model for affective dialogue management, including the user’s goal model, the user’s emotion model, and the system’s observation model. In Section 4, an intelligent music player example is given to describe the new factored POMDP model in detail. Section 5 analyzes the impact of several parameters on system performance through four experiments. Finally, we conclude our work in Section 6.
Overview of POMDP model
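A general POMDP model is formally defined as an 8-tuple $\{S, A, T, R, O, Z, \gamma, b_0\}$. $S = \{s_1, s_2, \cdots\}$ is the set of the environment’s states (often called the state space). $A = \{a_1, a_2, \cdots\}$ is the set of the agent’s actions (often called the action space). $T : S \times A \times S \to [0, 1]$ is the state transition function, where $T(s, a, s') = P(s' \mid s, a)$ is the probability of moving to state $s'$ after taking action $a$ in state $s$. $R : S \times A \to \mathbb{R}$ is the reward function, where $R(s, a)$ represents the immediate reward which the agent receives from the environment for a given state $s$ and action $a$. $O = \{o_1, o_2, \cdots\}$ is the set of the agent’s observations (often called the observation space). $Z : A \times S \times O \to [0, 1]$ is the observation function, where $Z(a, s', o') = P(o' \mid a, s')$ is the probability of observing $o'$ after taking action $a$ and arriving in state $s'$. $\gamma \in [0, 1]$ is the discount factor and $b_0$ is the initial belief state.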
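In the POMDP model, the current state $s$ of the environment is not directly observed. The agent instead maintains a probability distribution over all states, called the belief state $b$, where $b(s)$ is the probability that the environment is in state $s$ and $\sum_{s \in S} b(s) = 1$.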
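The POMDP-based dialogue management model can be depicted as an influence diagram, as shown in Fig. 2. It operates as follows. At each time-step $t$, based on the current belief state $b_t$, the system selects an action $a_t$, receives a reward $r_t$, and the environment moves to an unobserved state $s_{t+1}$. The system then receives an observation $o_{t+1}$, which depends on $s_{t+1}$ and $a_t$.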
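Moreover, the belief state is updated at each time-step from the previous belief state $b$, the action $a$ and the new observation $o'$:
$$b'(s') = \eta \, Z(a, s', o') \sum_{s \in S} T(s, a, s') \, b(s) \tag{5}$$
where $\eta = 1 / P(o' \mid b, a)$ is a normalization constant.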
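The goal of the agent is to select actions which fulfill its task as well as possible, i.e., to maximize the expected cumulative, infinite-horizon, discounted reward, which is called the return:
$$\sum_{t=0}^{\infty} \gamma^{t} r_t$$
where $r_t$ is the reward received at time-step $t$ and $\gamma$ is the discount factor.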
Specifically, a value function is defined as
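Moreover, it has been proven that $V_n(b)$ is piecewise linear and convex in the belief state, so it can be represented by a finite set of $\alpha$-vectors $\Gamma_n$:
$$V_n(b) = \max_{\alpha \in \Gamma_n} \sum_{s \in S} \alpha(s) \, b(s)$$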
Substituting Equations (4), (6) and (15) into Equation (13), then
Then we can update the value function by updating the $\alpha$-vectors as follows.
Unfortunately, about $|A| \cdot |\Gamma_n|^{|O|}$ vectors are generated at each stage. This means that computing optimal planning solutions for the POMDP model is intractable for any reasonably sized task, which calls for approximate solution techniques. Point-based value iteration (PBVI) is a popular family of approximate algorithms for POMDP models. Perseus, a randomized PBVI algorithm, was proposed in [30] and is one of the state-of-the-art approximate algorithms. Some other typical works on PBVI can be found in [26, 32].
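As a rough illustration of why point-based methods are attractive, the sketch below evaluates a value function stored as α-vectors, counts the vectors an exact backup would enumerate, and performs one backup at a single belief point. It is a generic point-based backup written with NumPy, not the Perseus algorithm of [30]; the array layouts `T[a, s, s2]`, `Z[a, s2, o]`, `R[s, a]` and all function names are assumptions made for this example.

```python
import numpy as np


def value(belief, alphas):
    """V(b) = max over alpha-vectors of the dot product alpha . b."""
    return max(float(alpha @ belief) for alpha in alphas)


def exact_backup_size(n_actions, n_vectors, n_obs):
    """Number of candidate vectors enumerated by one exact backup stage."""
    return n_actions * n_vectors ** n_obs


def point_based_backup(belief, alphas, T, Z, R, gamma):
    """Return the single alpha-vector that is best at `belief`.

    Assumed layouts (not from the paper):
    T[a, s, s2] = P(s2 | s, a), Z[a, s2, o] = P(o | a, s2), R[s, a].
    """
    n_actions = T.shape[0]
    n_obs = Z.shape[2]
    best_vec, best_val = None, -np.inf
    for a in range(n_actions):
        vec = R[:, a].astype(float).copy()
        for o in range(n_obs):
            # g[k, s] = sum_s2 T[a, s, s2] * Z[a, s2, o] * alphas[k][s2]
            g = np.array([T[a] @ (Z[a, :, o] * alpha) for alpha in alphas])
            # Keep only the projected vector that is best at this belief.
            vec += gamma * g[np.argmax(g @ belief)]
        val = float(vec @ belief)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec
```

Because only one vector is kept per belief point, the size of the vector set is bounded by the number of sampled belief points instead of growing exponentially in $|O|$.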
In this section, we construct a new factored model for affective dialogue management in a dialogue system, which attempts to respond to both the user’s goals and emotions. For this purpose, the user’s state is divided into two parts: goals (intentions or needs) and emotions. The system’s action is also factored into two components: goal response and emotion response.
Compared with the general POMDP model, the components of the new factored model are defined as follows.
(i) User states set $S = G_u \times E_u$, where $G_u$ is the set of the user’s goals and $E_u$ is the set of the user’s emotions.
(ii) System actions set $A = A_G \times A_E$, where $A_G$ is the set of goal responses and $A_E$ is the set of emotion responses.
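(iii) Transition functions
$$T(s, a, s') = P(g_u', e_u' \mid g_u, e_u, a_g, a_e) = P(g_u' \mid g_u, e_u, a_g, a_e) \, P(e_u' \mid g_u', g_u, e_u, a_g, a_e) \tag{20}$$
In the first product term, the user’s next goal can be assumed to depend only on the current goal $g_u$ and the goal response $a_g$, that is
$$P(g_u' \mid g_u, e_u, a_g, a_e) = P(g_u' \mid g_u, a_g) \tag{21}$$
This can be called the user goal model, indicating how the user’s goal changes at each time-step.
In the second product term, the next emotion can be considered to depend on the current emotion $e_u$, the goal response $a_g$ and the emotion response $a_e$, so that
$$P(e_u' \mid g_u', g_u, e_u, a_g, a_e) = P(e_u' \mid e_u, a_g, a_e) \tag{22}$$
This can be called the user emotion model, indicating how the user’s emotion changes at each time-step.
Substituting Equations (21) and (22) into Equation (20) gives
$$T(s, a, s') = P(g_u' \mid g_u, a_g) \, P(e_u' \mid e_u, a_g, a_e) \tag{23}$$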
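The product form of the factored transition function can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the user goal model and the user emotion model are stored as dense arrays; the array layouts and the function name are hypothetical and not part of the original model description.

```python
import numpy as np


def factored_transition(goal_model, emotion_model):
    """Combine the two sub-models into the joint transition function.

    Assumed layouts (hypothetical):
    goal_model[ag, g, g2]        = P(g2 | g, ag)
    emotion_model[ag, ae, e, e2] = P(e2 | e, ag, ae)
    Returns T[ag, ae, g, e, g2, e2] = P(g2 | g, ag) * P(e2 | e, ag, ae).
    """
    # a = goal response, b = emotion response, g/h = goal, e/f = emotion.
    return np.einsum("agh,abef->abgehf", goal_model, emotion_model)
```

Storing the two sub-models separately needs far fewer parameters than the joint table, which is the practical benefit of the conditional independence assumptions.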
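(iv) Reward functions
$$R(s, a) = w_g R(g_u, a_g) + w_e R(e_u, a_e) \tag{24}$$
where $w_g$ and $w_e$ are the weight coefficients of the goal reward and the emotion reward.
(v) Observations set
$$O = O_G \times O_E \tag{25}$$
where $O_G$ and $O_E$ are the sets of observations of the user’s goals and emotions respectively.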
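(vi) Observation functions
Suppose that the observations $o_g'$ and $o_e'$ depend only on the states $g_u'$ and $e_u'$ respectively, that is
$$P(o_g' \mid g_u', e_u', a_g, a_e) = P(o_g' \mid g_u') \tag{26}$$
$$P(o_e' \mid g_u', e_u', a_g, a_e) = P(o_e' \mid e_u') \tag{27}$$
From Equations (26) and (27), we have
$$Z(a, s', o') = P(o_g', o_e' \mid g_u', e_u', a_g, a_e) = P(o_g' \mid g_u') \, P(o_e' \mid e_u') \tag{28}$$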
This can be called the system observation model, indicating the observation probabilities of the system on user’s goals and emotions.
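Finally, based on the above definitions, the belief state at each time-step can be updated by substituting Equations (23) and (28) into Equation (5), which gives
$$b'(g_u', e_u') = \eta \, P(o_g' \mid g_u') \, P(o_e' \mid e_u') \sum_{g_u \in G_u} \sum_{e_u \in E_u} P(g_u' \mid g_u, a_g) \, P(e_u' \mid e_u, a_g, a_e) \, b(g_u, e_u)$$
where $\eta$ is a normalization constant.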
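The factored belief update can be sketched as follows. This is a minimal NumPy illustration under assumed array layouts (a belief matrix indexed by goal and emotion, and one table per sub-model); the function and parameter names are hypothetical and are not taken from the paper.

```python
import numpy as np


def factored_belief_update(b, goal_model, emotion_model,
                           goal_obs, emotion_obs, ag, ae, og, oe):
    """Update a belief b[g, e] after actions (ag, ae) and observations (og, oe).

    Assumed layouts (hypothetical):
    goal_model[ag, g, g2]        = P(g2 | g, ag)
    emotion_model[ag, ae, e, e2] = P(e2 | e, ag, ae)
    goal_obs[g2, og]             = P(og | g2)
    emotion_obs[e2, oe]          = P(oe | e2)
    """
    # Prediction: sum over (g, e) of P(g2|g,ag) * P(e2|e,ag,ae) * b(g, e).
    predicted = goal_model[ag].T @ b @ emotion_model[ag, ae]
    # Correction: weight each next state by the two observation likelihoods.
    likelihood = goal_obs[:, og][:, None] * emotion_obs[:, oe][None, :]
    unnormalized = likelihood * predicted
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total
```

The two matrix products replace the double sum over goals and emotions, so the update never has to build the joint transition table explicitly.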
Figure 3 shows the influence diagram of our factored POMDP model for affective dialogue management. It not only makes these dependencies explicit, but also allows comparison with the general POMDP model (Fig. 2) and other factored POMDP representations.
It is well known that music and emotion are closely related. In daily life, many people relax and adjust their mood by listening to different kinds of music. In this section, we take an intelligent music player as an example application to illustrate our model more specifically. It can be developed as a service of intelligent robots that not only handle the user’s goals but also respond to the user’s emotions.
For this example, the symbols and functions in Section 3 are given in sequence as follows.
(i) User states set $S = G_u \times E_u$.
For simplicity, we briefly list five major goals of the user in a dialogue (see Table 1).
It is difficult to list all the emotions that users may show in a dialogue, but the set of users’ emotion states $E_u$ can usually be divided into a positive set $P_U$ (including happiness, expectation, ...), a neutral state $e_N$ (or calm), and a negative set $N_U$ (including sadness, anger, disgust, ...).
For simplicity, we suppose that the user’s emotions include the following four types in our example: $P_U = \{e_{u1} = \text{Happy}\}$, $e_N = e_{u2} = \text{Neutral}$, $N_U = \{e_{u3} = \text{Angry}, e_{u4} = \text{Disgust}\}$.
In this case, the set $S = G_u \times E_u$ contains 20 states. Note that some of these states may not exist, because some goals and emotions cannot coexist. For example, it seems unlikely that the user would show the angry emotion $e_{u3}$ under the goal $g_{u3}$, so the user state $(g_{u3}, e_{u3})$ may not exist. Moreover, an additional state “end” needs to be added to $S$ to describe the user’s state after the system performs the finish action.
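The construction of the state set can be sketched as a Cartesian product with optional pruning. This is only an illustration: the goal labels stand in for the entries of Table 1, and which pairs count as impossible is left to the designer; the names `GOALS`, `EMOTIONS` and `build_state_space` are hypothetical.

```python
from itertools import product

# Placeholder labels for the five goals; emotion labels follow the example.
GOALS = [f"gu{i}" for i in range(1, 6)]
EMOTIONS = {"eu1": "Happy", "eu2": "Neutral", "eu3": "Angry", "eu4": "Disgust"}


def build_state_space(impossible=()):
    """Return all (goal, emotion) pairs except the excluded ones, plus 'end'."""
    excluded = set(impossible)
    states = [(g, e) for g, e in product(GOALS, EMOTIONS) if (g, e) not in excluded]
    states.append("end")  # absorbing state reached after the finish action
    return states
```

With no exclusions this yields the 20 product states plus “end”; passing `[("gu3", "eu3")]` removes the unlikely state mentioned above.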
(ii) System actions set $A = A_G \times A_E$.
Designing the actions that the system should take in response to the user’s goals and emotions is a complex problem. For each user goal $g_{ui}$ and emotion $e_{ui}$, we assume that the system has one optimal response. For brevity, we list only these optimal responses in the action sets $A_G$ and $A_E$, as shown in Tables 2 and 3 respectively.
In this case, the system actions set $A = A_G \times A_E$ contains 20 actions, one of which is selected to respond to the user’s state at each time-step. Moreover, an additional action “greeting” needs to be added to $A$ to describe the system’s initial action. Note that all the emotion responses can be expressed by an intelligent robot in multiple forms, such as facial expressions, gestures and words.
(iii) Transition functions.
In the user goal model, it is reasonable to assume that, if the action $a_g$ is correct, the user’s goal moves to one of the possible follow-on goals with equal probability. Otherwise, the user’s goal remains the same with probability $p_1$, specifically as follows.
In the user emotion model, we suppose that the transition probability depends on the user’s current emotion state $e_u$. Specifically, there are three situations. If the current emotion is positive, the next emotion is positive if either response $a_g$ or $a_e$ is correct, and neutral if $a_g$ and $a_e$ are both wrong. If the current emotion is neutral, two correct responses lead to a positive emotion and two wrong responses lead to a negative emotion. If the current emotion is negative, two correct responses lead to the neutral state; otherwise the emotion remains negative.
Moreover, we assume that a correct observation results in a correct action, and denote the average recognition error rates of the user’s goals and emotions by $p_{og}$ and $p_{oe}$ respectively. That is, $p_{og}$ and $p_{oe}$ also represent the average error rates of the system’s actions $a_g$ and $a_e$ respectively. Based on these analyses, the user emotion model is defined as follows, in which $|P_U| = 1$ and $|N_U| = 2$.
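One possible reading of the three situations described above can be sketched as a 4 × 4 transition matrix averaged over whether each response is correct. This is an illustrative interpretation, not the paper’s exact definition: it assumes that the two responses err independently with rates $p_{og}$ and $p_{oe}$, that probability mass is split evenly within $P_U$ and $N_U$, that a neutral user stays neutral when exactly one response is correct, and that a negative user otherwise keeps the same negative emotion. All names are hypothetical.

```python
import numpy as np

# Index layout from the example: e_u1 positive, e_u2 neutral, e_u3/e_u4 negative.
POSITIVE, NEUTRAL, NEGATIVE = [0], 1, [2, 3]


def emotion_transition_matrix(p_og, p_oe):
    """Return M[e, e2] = P(e2 | e), averaged over action correctness."""
    both_correct = (1.0 - p_og) * (1.0 - p_oe)
    both_wrong = p_og * p_oe
    M = np.zeros((4, 4))
    for e in POSITIVE:
        # Stays positive if either response is correct, else drops to neutral.
        M[e, POSITIVE] = (1.0 - both_wrong) / len(POSITIVE)
        M[e, NEUTRAL] = both_wrong
    # Neutral: both correct -> positive, both wrong -> negative.
    M[NEUTRAL, POSITIVE] = both_correct / len(POSITIVE)
    M[NEUTRAL, NEGATIVE] = both_wrong / len(NEGATIVE)
    # Assumption: with exactly one correct response the user stays neutral.
    M[NEUTRAL, NEUTRAL] = 1.0 - both_correct - both_wrong
    for e in NEGATIVE:
        M[e, NEUTRAL] = both_correct
        # Assumption: otherwise the user keeps the same negative emotion.
        M[e, e] = 1.0 - both_correct
    return M
```

Each row sums to one by construction, and with perfect recognition ($p_{og} = p_{oe} = 0$) a negative user reaches the positive state after two time-steps, passing through neutral.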
(iv) Reward functions.
Generally, for each goal $g_{ui}$ and emotion $e_{ui}$, we suppose that a positive reward of +1 is received for the optimal action and a negative reward of -1 for any other action. In particular, if the user’s goal is to finish the dialogue ($g_{u5}$), the correct action $a_{g5}$ receives a larger reward of +2, and every other action receives a reward of -2. $R(g_u, a_g)$ and $R(e_u, a_e)$ in Equation (24) are defined as follows.
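The reward scheme just described can be written as a short sketch. It assumes that the optimal response to goal $g_{ui}$ is the action with the same index $a_{gi}$, and likewise for emotions, which matches the $g_{u5}$ and $a_{g5}$ pairing in the text but is otherwise an assumption, since Tables 2 and 3 are not reproduced here. The function names and the weight values in the usage note are hypothetical.

```python
def goal_reward(goal_index, action_index, finish_goal=5):
    """R(g_u, a_g): +1/-1 in general, +2/-2 when the goal is to finish."""
    correct = goal_index == action_index  # assumed index-matched optimum
    if goal_index == finish_goal:
        return 2.0 if correct else -2.0
    return 1.0 if correct else -1.0


def emotion_reward(emotion_index, action_index):
    """R(e_u, a_e): +1 for the matching emotion response, -1 otherwise."""
    return 1.0 if emotion_index == action_index else -1.0


def total_reward(goal, emotion, ag, ae, w_g, w_e):
    """Weighted sum of the goal reward and the emotion reward."""
    return w_g * goal_reward(goal, ag) + w_e * emotion_reward(emotion, ae)
```

For instance, with equal weights of 0.5, correctly finishing the dialogue while also matching the user’s emotion gives `total_reward(5, 1, 5, 1, 0.5, 0.5)`, which evaluates to 1.5.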
(v) Observations set $O = O_G \times O_E$.
As mentioned earlier in Equation (25), $O_G$ and $O_E$ are the sets of observations of the user’s goals and emotions respectively, with $O_G = G_u$ and $O_E = E_u$.
(vi) Observation functions.
Finally, a dialogue example is given in Table 4 to show the affective dialogue process between a user and the player.
In the initial state, the user’s goal should be either $g_{u1}$ or $g_{u2}$, and the user’s emotion could be any one of the four emotions. That is, the initial state of the user should be one of the eight states $(g_{ui}, e_{uj})$ with $i \in \{1, 2\}$ and $j \in \{1, 2, 3, 4\}$.
So the initial belief state is defined as
The following six figures (Figs. 4–9) show the relationships among the average returns, discount factors, goal and emotion recognition error rates, transition probability, and weight coefficients. The experimental results were obtained with the point-based value iteration algorithm Perseus.
The first experiment examines the impact of the discount factor on the average return. Figure 4 shows that the return increases with the discount factor $\gamma$, especially when $\gamma > 0.9$. In fact, the discount factor is usually set above 0.9 in other POMDP experiments. The discount factor is set to 0.95 in all the following experiments.
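The sensitivity of the return to $\gamma$ can be illustrated with a small helper that computes the discounted sum of a reward sequence. This is a generic sketch, not the simulation used for Fig. 4; the function name and the constant reward sequence are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last reward back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For a constant reward of +1 per turn, the return approaches $1 / (1 - \gamma)$, which is 10 for $\gamma = 0.9$ and 20 for $\gamma = 0.95$. This steep growth near 1 is consistent with returns rising sharply once $\gamma$ exceeds 0.9.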
The second experiment further examines the relationship between the average return and the state transition probability $p_1$ of the user’s goal model. From Fig. 5 we can see that a smaller $p_1$ (< 0.4) yields a higher return. Overall, however, $p_1$ has little influence on the return (which only ranges from 5.5 to 4.9), meaning that system performance depends little on whether the user changes goals during the dialogue. It also suggests that the user goal model is valid to some extent.
The third experiment is designed to reveal the relationship between the average return and the emotion recognition error rate, as well as that between the average return and the goal recognition error rate. The results are shown in Figs. 6 and 7. As expected, a higher error rate leads to a smaller return in both figures. Moreover, the average return falls much more in Fig. 6 than in Fig. 7. This reflects that the emotion recognition error rate $p_{oe}$ has more impact on system performance. In other words, good performance depends mainly on a low emotion recognition error rate.
The final experiment further analyzes the influence of the two recognition error rates on system performance under three different sets of weight coefficients. Figure 8 shows that the average returns all decrease as $p_{oe}$ increases for the three sets of weight coefficients. Moreover, for the same emotion recognition error rate $p_{oe}$ (< 0.7), the larger the weight $w_e$, the larger the average return. This is because, when $w_e$ is relatively large, the reward function in Equation (24) depends more on its second part, $w_e R(e_u, a_e)$. Meanwhile, a smaller $p_{oe}$ yields a larger reward $R(e_u, a_e)$, given the earlier assumption that a correct observation results in a correct action.
Similarly, Fig. 9 indicates that the average returns all decrease as the goal recognition error rate $p_{og}$ increases for the same three sets of weight coefficients. As expected, the returns decrease faster when $w_g$ is larger. Compared with Fig. 8, the range of the decline is relatively small. This reflects that the goal recognition error rate $p_{og}$ has less impact on system performance, which is consistent with the third experiment.
In a practical application, we can select different weight coefficients and adjust the reward function according to the requirements of the designed system. For example, $w_g$ should be larger than $w_e$ in a dialogue system that is mainly devoted to information inquiry and only supplemented by affective interaction.
In this paper, a novel factored POMDP model was proposed for designing the affective dialogue management module, which aims to respond to both the user’s intention and emotion in a dialogue. Unlike previous factored POMDP models, we not only separated the user’s state space but also divided the system’s action space. The transition function, the observation function and the reward function were all updated accordingly. An intelligent music player example was given to illustrate the new factored model concretely. It should be noted that this is just one application example of the factored model, which can also be applied to other similar tasks. Moreover, we evaluated the influence of key parameters on system performance through the return, and the experimental results showed that the new factored model is reasonable and feasible.
Our future work consists of two parts. The first is to improve and evaluate our factored POMDP model by further separating the user’s state space into more components, including the user’s action and the dialogue history. The second is to develop a useful affective dialogue system, with the intelligent player and other similar functions, for our robot based on the improved model.
Acknowledgments
This research has been partially supported by the National Natural Science Foundation of China under Grant Nos. 61432004 and 61472117, JSPS KAKENHI Grant No. 15H01712, and the Natural Science Fund of the Education Department of Anhui Province under Grant No. KJ2015B1105916.
