Multi-source information fusion model in rule-based Gaussian-shaped fuzzy control inference system incorporating Gaussian density function

Abstract

An increasing number of applications require the integration of data from various disciplines, which leads to problems with the fusion of multi-source information. In this paper, a special information structure formalized in terms of three indices (the central presentation, population or scale, and density function) is proposed. Single and mixed Gaussian models are used for single source information and their fusion results, and a parameter estimation method is also introduced. Furthermore, fuzzy similarity computing is developed for solving the fuzzy implications under a Mamdani model and a Gaussian-shaped density function. Finally, an improved rule-based Gaussian-shaped fuzzy control inference system is proposed in combination with a nonlinear conjugate gradient and a Takagi-Sugeno (T-S) model, which demonstrated the effectiveness of the proposed method as compared to other fuzzy inference systems.

Keywords

Gaussian density function; IF-THEN rule; multi-source information fusion; similarity computing; fuzzy control inference system

1 Introduction

The human brain obtains information from different sources; it then merges this information to form concepts and finally outputs natural language (NL), which is powerful and versatile enough to describe the real world. NL can be regarded as the fusion of disparate information; it is vague, ambiguous, and uncertain. The quantitative calculation and qualitative analysis of NL is the ultimate goal of artificial intelligence. There are two strands of research linking the initial information acquisition with NL: (1) how to simplify the presentation of NL and (2) how to form NL from multi-source information. Usually, humans express emotions about certain objects by using sentences and affective words, but they cannot fully express their intuitive perception of an object simply by separating these terms. Natural Language Processing (NLP) was developed to solve this problem; however, many difficulties remain in this field. Computing with Words (CW) was also introduced to decrease the complexity related to linguistic variables [16–18]. This has allowed for a more exact expression of what a human is thinking about and has provided a feasible direction for NLP under weakened conditions. Zadeh introduced a framework for this phenomenon of uncertainty using Fuzzy Sets (FS) in 2005 [19]. FS theory has also been used to describe objects at a coarse-grained level. Herrera and Martínez [5] introduced a 2-tuple fuzzy linguistic representation model for CW without any loss of information. Furthermore, Lawry [13, 14] proposed Label Semantics (LS) for vague concept modeling and reasoning techniques so as to formalize uncertainty in presentation theory. Subsequently, Lawry and Tang [12, 34, 35] proposed a new semantic understanding model: the Prototype Theory (PT). These works discovered the connection between fuzzy presentation technology and high-level semantics.
In engineering fields, linguistic representation models combined with affective words have had some applications, such as fuzzy decision making [21, 31] and KANSEI Engineering (KE). Fuzzy inference methodologies have also been shown to be effective in our previous work on Rough Sets [7] and Fuzzy Support Vector Machines (SVMs) [6].

However, it has been regarded as more feasible to focus on multi-source information fusion rather than on NL itself. Moreover, it is important to discover the mechanics of integrating multi-source information in the human brain. Due to the modular and vague appearance of multi-source information, uncertainty reasoning methods and their associated mathematical tools are thought to offer more interpretability and a much stronger generalization capability [24]. Yager developed the theoretical foundation for multi-source information fusion techniques based on set measure and possibility theories [25, 26]. Normally, single-source information consists of steady features that are more easily formalized and parameterized. In previous studies, the sum, product, max/min, and Weighted Arithmetic Mean (WAM) were used to combine single-source information, and each output represented an independent source of information that could be treated separately [15].

Relative to mathematical research and understanding the phenomenon of uncertainty, the integration of information using fuzzy inference techniques pervades many scientific disciplines, such as multivariate and type-2 fuzzy sets; bipolar models [10, 11]; and probability and possibility issues [9, 27]. Information fusion is the merging of information from disparate sources with differing conceptual, contextual, and typographical representations. It has been successfully applied in data mining and the consolidation of data from unstructured or semi-structured resources, and it has also led to many achievements in various fields [1, 8]. Fusion methods include product fusion (such as the Bayes posterior probability model), linear fusion (SVM classifiers), and nonlinear fusion (super-kernel integration) [23]. Recent developments and applications of fuzzy information fusion can be found in pattern classification, image analysis, decision-making, man-made structures, and medicine [30, 32]. Furthermore, over the past several years, there have been a number of successful applications of fuzzy integrals in decision-making and pattern recognition that have employed multiple information sources [3, 20].

In this paper, we formalize multi-source information as a multivariable group and describe each information structure as a special kind of triple, I = < P, d, ρ >, where P denotes a typical point of positive examples relative to the information structure I, d is a distance measurement that represents the population of information, and ρ is a Probability Density Function (PDF). The basic idea of this formalized information structure is to assume that the neighborhood radius of each information structure is uncertain, which is limited by PDF-ρ. Thus, we will calculate the value of P relative to an information structure under a given level. An information fusion technique was developed by formalizing this special information structure; furthermore, information fusion employing fuzzy sets was applied in this paper. A Single Gaussian Model (SGM) was applied to single-source information, and a Gaussian Mixed Model (GMM) was applied to the fusion of this information by incorporating probabilistic and statistical methods [28, 36].

The remainder of this paper proceeds as follows. In Section 2, we propose an information structure that incorporates a definition of the information kernel, boundary, and Gaussian PDF. An improved algorithm for parameter estimation is also introduced. Section 3 introduces fuzzy similarity relations and IF-THEN rules for this special information structure. These are helpful for calculating the possibilities in a rule-based fuzzy inference system (FIS). Section 4 develops a rule-based information fusion model using a conjugate gradient and Takagi–Sugeno (T-S) model under a rule-based Gaussian-shaped fuzzy inference system (RGS-FIS). A time-series analysis using natural disaster datasets is also introduced using RGS-FIS, and we demonstrate the effectiveness of our method in comparison to other methodologies. Finally, in Section 5, we give our conclusions and ideas for future work.

2 Information fusion models by using probability density function

2.1 Definitions

Definitions for our information structure and kernel computing method were established as follows.

Definition 1. Assume object Ω is described by the multi-source information set I = {I_k|k = 1, 2, ⋯ , m} and that measure set V = {v_k|k = 1, 2, ⋯ , m} is a set of information structures corresponding to set I. For ∀v_k ∈ V, we define v_k =< P_k, d_k, ρ_k >, where P_k is a typical point as the kernel of I_k. Moreover, d_k is a metric of the information structure v_k related to the population or scale of information and will be used for boundary computing. Lastly, ρ_k is a density function on the threshold of v_k.
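The triple of Definition 1 can be pictured as a small record type. The following is only an illustrative sketch: the names `InfoStructure` and `spherical_gaussian` are hypothetical, and the isotropic Gaussian with its standard normalization is an assumed choice for ρ_k rather than something fixed by the definition.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class InfoStructure:
    """Hypothetical container for v_k = <P_k, d_k, rho_k>."""
    kernel: Sequence[float]                      # P_k: typical (kernel) point
    scale: float                                 # d_k: population or scale metric
    density: Callable[[Sequence[float]], float]  # rho_k: density function


def spherical_gaussian(mu: Sequence[float], sigma: float) -> Callable[[Sequence[float]], float]:
    """Return an isotropic Gaussian density centred at mu (assumed form of rho_k)."""
    n = len(mu)
    norm = (2.0 * math.pi) ** (n / 2.0) * sigma ** n

    def rho(x: Sequence[float]) -> float:
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        return math.exp(-sq / (2.0 * sigma ** 2)) / norm

    return rho


# One information structure in R^2 with unit scale.
v1 = InfoStructure(kernel=(0.0, 0.0), scale=1.0,
                   density=spherical_gaussian((0.0, 0.0), 1.0))
peak = v1.density(v1.kernel)  # density evaluated at the kernel point
```

Keeping the density as a callable means the same record can later hold a single Gaussian or a mixture without changing its shape.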

Definition 2. Let the fusion operator be ⊕ so that Ω can be formalized as: $I_{1} + I_{2} + \dots + I_{m} = v_{1} \oplus v_{2} \oplus \dots \oplus v_{m}$ (1) where ⊕ is a minimum operator.

Definition 3. For all P_k, Q_k in the n-dimensional Euclidean space Rⁿ, write P_k = [P_k1, P_k2, ⋯ , P_kn] and Q_k = [Q_k1, Q_k2, ⋯ , Q_kn]. Let d = ∥ · ∥ be the Euclidean norm, which has the following properties:

$d (P_{k}) = ∥ P_{k} ∥ = \sqrt{\sum_{i} P_{ki}^{2}}$

d (P_k ± Q_k) = ∥ P_k ± Q_k ∥, ∀P_k, Q_k ∈ Rⁿ

For ∀α, β ∈ R and P_k, Q_k ∈ Rⁿ, we have d (αP_k ± βQ_k) = ∥ αP_k ± βQ_k ∥; in addition, $d (α P_{k} + β Q_{k}) \leq | α | d (P_{k}) + | β | d (Q_{k}) .$
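As a quick numerical illustration of Definition 3, the sketch below assumes that d is the Euclidean norm and checks the inequality for one arbitrary pair of points. The helper names `norm` and `combine` and the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def norm(p: Sequence[float]) -> float:
    """Euclidean norm ||p|| = sqrt(sum_i p_i^2)."""
    return math.sqrt(sum(x * x for x in p))


def combine(alpha: float, p: Sequence[float],
            beta: float, q: Sequence[float]) -> List[float]:
    """Return the linear combination alpha * p + beta * q."""
    return [alpha * a + beta * b for a, b in zip(p, q)]


P = [3.0, 4.0]
Q = [1.0, -2.0]
alpha, beta = 2.0, -0.5

lhs = norm(combine(alpha, P, beta, Q))              # d(alpha P + beta Q)
rhs = abs(alpha) * norm(P) + abs(beta) * norm(Q)    # |alpha| d(P) + |beta| d(Q)
```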

Definition 4. For sample points $\{P_{k}^{l} | l = 1, 2, \dots\}$ , the statistics-based kernel point is calculated as: $P_{k} = \sum_{l} P_{k}^{l} = [\sum_{l} P_{k 1}^{l}, \sum_{l} P_{k 2}^{l}, \dots, \sum_{l} P_{kn}^{l}]$ (2) where $P_{ki}^{l}$ denotes the value of the i-th dimension of the l-th sample point in the k-th information source.

The boundary of v_k gives the scale of the neighborhood of all elements in this special information structure. This is defined below.

Definition 5. For ∀P_k ∈ Rⁿ, there exists a neighborhood, $N_{P_{k}}^{ɛ} = {X | ∥ P_{k} - X ∥ < ɛ, X \in R^{n}}$ (3)

Definition 6. For calculating the boundary of I, two sets were defined as:

- The Upper Approximation Boundary (UAB) ${UP}_{B} = {P_{l} | P_{l} \in N_{P_{K}}^{u}}$ (4)

- The Lower Approximation Boundary (LAB) ${LP}_{B} = {P_{l} | P_{l} \in N_{P_{K}}^{t}}$ (5)

Therefore, the boundary is P_B = UP_B ∖ LP_B = B (u, t). Thus, we have P_B = P_K + λ (P_B - P_K), λ ∈ [0, 1], which exhibits fuzziness attributes at the boundary.
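Definitions 5 and 6 can be read as a ring-shaped region around the kernel point. The sketch below assumes t < u (the text does not state the ordering) and uses hypothetical helper names; it keeps the sample points that lie in the u-neighborhood but not in the t-neighborhood.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(p: Sequence[float], x: Sequence[float]) -> float:
    """Euclidean distance ||p - x||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, x)))


def in_neighborhood(kernel: Point, x: Point, eps: float) -> bool:
    """Membership test for the neighborhood of Eq. (3): ||P_k - X|| < eps."""
    return dist(kernel, x) < eps


def boundary(kernel: Point, points: List[Point], u: float, t: float) -> List[Point]:
    """Points in the upper set (radius u) that are not in the lower set (radius t)."""
    return [p for p in points
            if in_neighborhood(kernel, p, u) and not in_neighborhood(kernel, p, t)]


kernel = (0.0, 0.0)
samples = [(0.1, 0.0), (0.5, 0.5), (1.5, 0.0), (3.0, 0.0)]
ring = boundary(kernel, samples, u=2.0, t=1.0)
```

With these sample values only the point at distance 1.5 from the kernel falls between the two radii.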

2.2 Probability density function

- Single Gaussian Model for single-source information

The Gaussian distribution is a continuous probability distribution with a bell-shaped PDF in one-dimensional space: $f (x, μ, σ^{2}) = \frac{1}{\sqrt{2 π} σ} e^{- \frac{1}{2} (\frac{x - μ}{σ})^{2}}$ (6)

The parameter μ is the mean or expectation, and σ² is the variance. The SGM is applied to induct the density function of the proposed information structure I, and we define: $δ (X, μ, Φ) = \frac{1}{\sqrt{(2 π)^{n} | Φ |}} e^{- \frac{1}{2} (X - μ)^{T} Φ^{- 1} (X - μ)}$ (7) where X is a vector in n-dimensional space, Φ is the covariance matrix, and μ is the mean value of the density function. The density function’s properties are determined by (Φ, μ), so this is a parameter estimation problem [29]. For any point P_i ∈ Rⁿ, its probability density function is δ (P_i, μ, Φ), and if, for any information structure v_k, each P_i in v_k is regarded as an independent event, then the PDF of v_k is: $δ_{k} = δ (v_{k}, μ, Φ) = \prod_{i}^{m} δ (P_{i}, μ, Φ)$ (8)

The maximum likelihood estimation can be used to estimate the parameters (Φ, μ) under (8). Taking the logarithm of (8), we have: $\begin{matrix} O (μ, Φ) = ln (\prod_{i}^{m} δ (P_{i}, μ, Φ)) \\ = \sum_{i}^{m} ln (δ (P_{i}, μ, Φ)) \\ = \sum_{i}^{m} [- \frac{n}{2} ln (2 π) - \frac{1}{2} ln | Φ | - \frac{1}{2} (P_{i} - μ)^{T} Φ^{- 1} (P_{i} - μ)] \\ = - \frac{nm}{2} ln (2 π) - \frac{m}{2} ln | Φ | - \frac{1}{2} \sum_{i}^{m} (P_{i} - μ)^{T} Φ^{- 1} (P_{i} - μ) \end{matrix}$ (9)

Taking the partial derivative w.r.t. μ of O (μ, Φ) and setting it to 0, we obtain the following: $\begin{matrix} \partial_{μ} (O (μ, Φ)) = - \frac{1}{2} \sum_{i}^{m} [- 2 Φ^{- 1} (P_{i} - μ)] \\ = Φ^{- 1} \sum_{i}^{m} (P_{i} - μ) \\ = Φ^{- 1} [\sum_{i}^{m} P_{i} - m μ] \\ = 0 \end{matrix}$ (10)

This gives $\hat{μ} = \frac{1}{m} \sum_{i} P_{i}$ , where m is the number of sample points. Similarly, for Φ, we can obtain $\hat{Φ} = \frac{1}{m - 1} \sum_{i} (P_{i} - \hat{μ}) (P_{i} - \hat{μ})^{T}$ (the strict maximum likelihood solution divides by m; the factor m - 1 is the usual bias-corrected form). Thus, if the density of each point in v_k is $δ (P, \hat{μ}, \hat{Φ})$ , then our estimation of the parameter μ is: $\hat{μ} = (\frac{1}{m} \sum_{i} e_{1 i}, \frac{1}{m} \sum_{i} e_{2 i}, \dots, \frac{1}{m} \sum_{i} e_{ni})$ (11) where e_ji is the j-th coordinate of P_i in Rⁿ.

The covariance $\hat{Φ}$ is converted to $\begin{matrix} \hat{Φ} = \frac{1}{m - 1} \sum_{i} [e_{1 i} - {\hat{μ}}_{1}, e_{2 i} - {\hat{μ}}_{2}, \dots, e_{ni} - {\hat{μ}}_{n}] [\begin{matrix} e_{1 i} - {\hat{μ}}_{1} \\ e_{2 i} - {\hat{μ}}_{2} \\ \dots \\ e_{ni} - {\hat{μ}}_{n} \end{matrix}] \\ = \frac{1}{m - 1} \sum_{j = 1}^{n} \sum_{i = 1}^{m} (e_{ji} - {\hat{μ}}_{j})^{2} \end{matrix}$ (12)
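The estimators for the single Gaussian model can be sketched as follows. This is a minimal illustration that assumes m sample points in Rⁿ, the sample mean for μ, and the full outer-product covariance for Φ; the function names and the `unbiased` switch between the m - 1 and m normalizations are hypothetical.

```python
from typing import List, Sequence


def estimate_mean(points: Sequence[Sequence[float]]) -> List[float]:
    """Sample mean: average each coordinate over the m points."""
    m, n = len(points), len(points[0])
    return [sum(p[j] for p in points) / m for j in range(n)]


def estimate_covariance(points: Sequence[Sequence[float]],
                        unbiased: bool = True) -> List[List[float]]:
    """Sum of outer products (P_i - mu)(P_i - mu)^T divided by m - 1 (or m)."""
    m, n = len(points), len(points[0])
    mu = estimate_mean(points)
    denom = m - 1 if unbiased else m
    cov = [[0.0] * n for _ in range(n)]
    for p in points:
        d = [p[j] - mu[j] for j in range(n)]
        for a in range(n):
            for b in range(n):
                cov[a][b] += d[a] * d[b]
    return [[c / denom for c in row] for row in cov]


samples = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]
mu_hat = estimate_mean(samples)
phi_hat = estimate_covariance(samples)
```

Summing the diagonal of `phi_hat` gives the scalar form of the covariance when only a single spread parameter is needed.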

- Gaussian Mixed Model and parameter estimation

For multi-source information fusion, we need to calculate all of I_k’s density functions as well as calculate the new density function. For m multi-source information structures, let $I_{fusion} = \sum_{i = 1}^{l} α_{i} δ (P, μ_{i}, Φ_{i})$ for a normalized weight parameter α: i.e., ∑_iα_i = 1. To calculate and simplify the covariance matrix Φ, let $Φ = [\begin{matrix} σ^{2} & 0 & \dots & 0 \\ 0 & σ^{2} & \dots & 0 \\ 0 & \dots & \dots & 0 \\ 0 & 0 & \dots & σ^{2} \end{matrix}] = σ^{2} \vec{I}$ (13)

From the SGM, we have that $δ (P, μ, σ^{2} \vec{I}) = \frac{1}{\sqrt{(2 π)^{n}}} σ^{- 1} e^{- \frac{(P - μ)^{T} (P - μ)}{2 σ^{2}}}$ (14)

Calculate: $\begin{matrix} \partial_{μ} [δ (P, μ, σ^{2} \vec{I})] = \frac{1}{\sqrt{(2 π)^{n}}} \partial_{μ} (σ^{- 1} e^{- \frac{(P - μ)^{T} (P - μ)}{2 σ^{2}}}) \\ = \frac{1}{\sqrt{(2 π)^{n}}} σ^{- 1} e^{- \frac{(P - μ)^{T} (P - μ)}{2 σ^{2}}} \partial_{μ} (- \frac{(P - μ)^{T} (P - μ)}{2 σ^{2}}) \\ = δ (P, μ, σ^{2} \vec{I}) (\frac{P - μ}{σ^{2}}) \end{matrix}$

and $\begin{matrix} \partial_{σ} [δ (P, μ, σ^{2} \vec{I})] = \frac{1}{\sqrt{(2 π)^{n}}} ((- 1) σ^{- 2} e^{- \frac{(P - μ)^{T} (P - μ)}{2 σ^{2}}}) \\ + \frac{1}{\sqrt{(2 π)^{n}}} σ^{- 1} e^{- \frac{(P - μ)^{T} (P - μ)}{2 σ^{2}}} [\frac{(P - μ)^{T} (P - μ)}{σ^{3}}] \\ = δ (P, μ, σ^{2} \vec{I}) (\frac{(P - μ)^{T} (P - μ)}{σ^{3}} - \frac{1}{σ^{2}}) \end{matrix}$

Then, for $Φ = c \vec{I}$ , c ∈ R, the GMM is defined as G (P) = ∑_iα_iδ (P, μ_i, σ_i²), i = 1, 2, ⋯ , l. The number of parameters for estimation is 3l. If we let $θ = [α_{1}, α_{2}, \dots, α_{l}, μ_{1}, μ_{2}, \dots, μ_{l}, σ_{1}^{2}, σ_{2}^{2}, \dots, σ_{l}^{2}]$ , the objective is:

$\begin{matrix} L (θ) = ln [\prod_{i} G (P_{i})] = \sum_{i} ln (G (P_{i})) \\ = \sum_{i} ln (\sum_{j = 1}^{l} α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})) \end{matrix}$ (15) which can be differentiated w.r.t. μ_j and σ_j. Thus, we have that: $\partial_{μ_{j}} (L (θ)) = \sum_{i} \frac{α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})}{\sum_{j = 1}^{l} α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})} \frac{P_{i} - μ_{j}}{σ_{j}^{2}}$ (16)

Let $φ_{j} (P_{i}) = \frac{α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})}{\sum_{j = 1}^{l} α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})}$ , so that: $\partial_{μ_{j}} (L (θ)) = \sum_{i} φ_{j} (P_{i}) (\frac{P_{i} - μ_{j}}{σ_{j}^{2}})$ (17)

Similarly, we can find: $\begin{array}{l} \partial_{σ_{j}} (L (θ)) = \sum_{i} \frac{α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})}{\sum_{j = 1}^{l} α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})} [\frac{(P_{i} - μ_{j})^{T} (P_{i} - μ_{j})}{σ_{j}^{3}} - \frac{1}{σ_{j}^{2}}] \\ = \sum_{i} φ_{j} (P_{i}) [\frac{(P_{i} - μ_{j})^{T} (P_{i} - μ_{j})}{σ_{j}^{3}} - \frac{1}{σ_{j}^{2}}] \end{array}$ (18)

Setting the above two equations equal to 0, we have ${\hat{μ}}_{j} = \frac{\sum_{i} φ_{j} (P_{i}) P_{i}}{\sum_{i} φ_{j} (P_{i})}$ (19) and ${\hat{σ}}_{j}^{2} = \frac{1}{3} \frac{\sum_{i} φ_{j} (P_{i}) (P_{i} - μ_{j})^{T} (P_{i} - μ_{j})}{\sum_{i} φ_{j} (P_{i})}$ (20)

For α_j, under the constraint ∑_jα_j = 1, we use Lagrange multipliers to redefine the objective as: $\begin{matrix} J = L (θ) + λ (1 - \sum_{j = 1}^{l} α_{j}) \\ = \sum_{i} ln (\sum_{j} α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})) + λ (1 - \sum_{j = 1}^{l} α_{j}) \end{matrix}$ (21)

Differentiating this new objective w.r.t. α_j, we have that:

$\begin{matrix} \partial_{α_{j}} J = \sum_{i} \frac{δ (P_{i}, μ_{j}, σ_{j}^{2})}{\sum_{j = 1}^{l} α_{j} δ (P_{i}, μ_{j}, σ_{j}^{2})} - λ \\ = \frac{1}{α_{j}} \sum_{i} φ_{j} (P_{i}) - λ = 0 \end{matrix}$ (22)

$[{\hat{α}}_{1}, {\hat{α}}_{2}, \dots, {\hat{α}}_{l}] = [\frac{1}{λ} \sum_{i} φ_{1} (P_{i}), \frac{1}{λ} \sum_{i} φ_{2} (P_{i}), \dots, \frac{1}{λ} \sum_{i} φ_{l} (P_{i})]$ (23)

${\hat{α}}_{1} + {\hat{α}}_{2} + \dots + {\hat{α}}_{l} = \frac{1}{λ} \sum_{i} (φ_{1} (P_{i}) + φ_{2} (P_{i}) + \dots + φ_{l} (P_{i})) = 1$ (24)

Furthermore, we know λ = l, so: $[{\hat{α}}_{1}, {\hat{α}}_{2}, \dots, {\hat{α}}_{l}] = [\frac{1}{l} \sum_{i} φ_{1} (P_{i}), \frac{1}{l} \sum_{i} φ_{2} (P_{i}), \dots, \frac{1}{l} \sum_{i} φ_{l} (P_{i})]$ (25) where φ is also a function of the parameters, and we can resolve this using the following iteration:

Step 1: Let

$θ = [α_{1}, α_{2}, \dots α_{l}, μ_{1}, μ_{2}, \dots μ_{l}, σ_{1}^{2}, σ_{2}^{2}, \dots σ_{l}^{2}]$

Assign θ an initial value; to help the iteration converge, the initial means μ₁, μ₂, ⋯ , μ_l may be obtained by a clustering method.

Step 2: Calculate φ_j (P_i).

Step 3: Calculate ${\tilde{μ}}_{j} = \frac{\sum_{i} φ_{j} (P_{i}) P_{i}}{\sum_{i} φ_{j} (P_{i})}$ .

Step 4: Calculate $σ_{j}^{2} = \frac{1}{l} \frac{\sum_{i} φ_{j} (P_{i}) (P_{i} - {\tilde{μ}}_{j})^{T} (P_{i} - {\tilde{μ}}_{j})}{\sum_{i} φ_{j} (P_{i})} .$

Step 5: Calculate $α_{j} = \frac{1}{l} \sum_{i} φ_{j} (P_{i})$ .

Step 6: Let

$\hat{θ} = [{\hat{α}}_{1}, {\hat{α}}_{2}, \dots, {\hat{α}}_{l}, {\hat{μ}}_{1}, {\hat{μ}}_{2}, \dots, {\hat{μ}}_{l}, {\hat{σ}}_{1}^{2}, {\hat{σ}}_{2}^{2}, \dots, {\hat{σ}}_{l}^{2}]$ . If $∥ θ - \hat{θ} ∥ < δ$ for a given threshold δ, then stop the process; otherwise, set $θ = \hat{θ}$ and return to Step 2.
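The iteration in Steps 1 to 6 has the shape of an expectation-maximization loop, and a minimal sketch is given below. It makes several assumptions that should be read as such: the covariance of each component is σ_j² times the identity, the density uses the standard σ⁻ⁿ normalization, the variance update divides by the dimension n, and the weight update divides by the number of sample points. These are the textbook normalizations and may differ from the constants printed above. The function names are hypothetical, and the initial means are passed in directly rather than obtained by clustering.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(p: Vector, q: Vector) -> float:
    """Squared Euclidean distance (p - q)^T (p - q)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def density(p: Vector, mu: Vector, var: float) -> float:
    """Isotropic Gaussian density with covariance var * I (assumed normalization)."""
    n = len(p)
    return math.exp(-sq_dist(p, mu) / (2.0 * var)) / ((2.0 * math.pi * var) ** (n / 2.0))


def em_spherical_gmm(points: List[Vector], init_means: List[Vector],
                     tol: float = 1e-8, max_iter: int = 500
                     ) -> Tuple[List[float], List[List[float]], List[float]]:
    n, l = len(points[0]), len(init_means)
    alphas = [1.0 / l] * l                      # Step 1: initial theta
    means = [list(m) for m in init_means]
    variances = [1.0] * l
    for _ in range(max_iter):
        # Step 2: responsibilities phi_j(P_i)
        resp = []
        for p in points:
            w = [alphas[j] * density(p, means[j], variances[j]) for j in range(l)]
            total = sum(w)
            resp.append([x / total for x in w])
        new_alphas, new_means, new_vars = [], [], []
        for j in range(l):
            mass = sum(r[j] for r in resp)
            # Step 3: weighted mean
            mu = [sum(r[j] * p[d] for r, p in zip(resp, points)) / mass
                  for d in range(n)]
            # Step 4: weighted variance (divided by the dimension n)
            var = sum(r[j] * sq_dist(p, mu) for r, p in zip(resp, points)) / (n * mass)
            new_means.append(mu)
            new_vars.append(max(var, 1e-12))    # guard against a collapsed component
            # Step 5: mixing weight (divided by the number of points)
            new_alphas.append(mass / len(points))
        # Step 6: stop when theta changes by less than tol
        change = math.sqrt(
            sum((a - b) ** 2 for a, b in zip(alphas, new_alphas))
            + sum(sq_dist(a, b) for a, b in zip(means, new_means))
            + sum((a - b) ** 2 for a, b in zip(variances, new_vars)))
        alphas, means, variances = new_alphas, new_means, new_vars
        if change < tol:
            break
    return alphas, means, variances


data = [(-0.2,), (0.0,), (0.2,), (9.8,), (10.0,), (10.2,)]
alphas, means, variances = em_spherical_gmm(data, init_means=[(1.0,), (9.0,)])
```

On this well-separated toy data the loop settles on two equally weighted components centred near 0 and 10.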

In practice, the density function of the fused information under this special structure is the product of the SGM densities of its sources. For all information structures v_k and their SGM densities δ (v_k), $δ (I_{fusion}) = \prod_{k} δ (v_{k})$ ; therefore, we have that: $\begin{matrix} \prod_{k} δ (v_{k}) = \prod_{k} \frac{1}{\sqrt{(2 π)^{n}} σ_{k}} e^{- \frac{1}{2 σ_{k}^{2}} (P - μ)^{T} (P - μ)} \\ = \frac{1}{\sqrt{(2 π)^{nk}} \prod_{k} σ_{k}} e^{\sum_{k} - \frac{1}{2 σ_{k}^{2}} (P - μ)^{T} (P - μ)} \\ = \frac{1}{\sqrt{(2 π)^{nk}} \prod_{k} σ_{k}} e^{α P^{2} + β P + γ} \\ = \frac{1}{\sqrt{(2 π)^{nk}} \cdot C \cdot \prod_{k} σ_{k}} e^{- \frac{1}{2 σ_{k}^{2}} (P^{'} - μ^{'})^{T} (P^{'} - μ^{'})} \\ = C^{'} δ \end{matrix}$ (26)

where C is an undetermined constant.

In particular, in a one-dimensional space with σ = 1, we have that: $\begin{matrix} δ (I_{fusion}) = \frac{1}{\sqrt{(2 π)^{nk}} \prod_{k} σ_{k}} e^{α x^{2} + β x + γ} \\ = \frac{1}{\sqrt{(2 π)^{nk}}} e^{α x^{2} + β x + γ} \\ = \frac{1}{\sqrt{(2 π)^{nk}}} e^{x'^{2} + β^{'} x^{'} + γ^{'}} \\ = C \frac{1}{\sqrt{(2 π)^{n}}} e^{(x^{'} - μ)^{2}} \end{matrix}$ (27)

This is a linear transformation of the basic Gaussian function. Thus, for any two information structures v_i, v_j, the fusion result is v_ij =< αP_ij, βd_ij, γδ_ij > where α, β, and γ are undetermined coefficients.
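The claim that a product of Gaussian densities is again a scaled Gaussian can be checked numerically in one dimension. The sketch below relies on the standard product identity for two normal densities, which is assumed here rather than taken from the text; the function names and sample parameters are hypothetical.

```python
import math
from typing import Tuple


def gaussian(x: float, mu: float, sigma: float) -> float:
    """One-dimensional Gaussian density as in Eq. (6)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def fuse_gaussians(mu1: float, s1: float, mu2: float, s2: float) -> Tuple[float, float, float]:
    """Return (mu, sigma, scale) such that N1(x) * N2(x) = scale * N(x; mu, sigma)."""
    var = 1.0 / (1.0 / s1 ** 2 + 1.0 / s2 ** 2)       # precisions add
    mu = var * (mu1 / s1 ** 2 + mu2 / s2 ** 2)        # precision-weighted mean
    scale = gaussian(mu1, mu2, math.sqrt(s1 ** 2 + s2 ** 2))
    return mu, math.sqrt(var), scale


mu, sigma, scale = fuse_gaussians(0.0, 1.0, 2.0, 1.0)
```

Here `scale` plays the role of the undetermined constant in front of the fused density.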

3 Fuzzy implications of information structure under IF-THEN rules

3.1 Fuzzy implications of information structures under IF-THEN rules

In fuzzy sets, the rule “IF x is $\bar{A}$ , THEN y is $\bar{B}$ ” indicates a fuzzy implication between $\bar{A}$ and $\bar{B}$ as denoted by $\bar{A} \to \bar{B}$ . If we let x, y ∈ [0, 1] be the memberships of $\bar{A}$ and $\bar{B}$ , respectively, the Mamdani model computes the membership as ∀x, y ∈ [0, 1], F (x, y) = Min {x, y}.

We construct a fuzzy membership based on a new fuzzy implication and inference system. We also derive a similarity relationship and apply this to the Gaussian density function-based fuzzy rule inference system. For δ_k in a rule-based IF-THEN inference system, suppose that the rule set is:

IF I₁ is v₁, THEN I_o is v_o, ω₁; IF I₂ is v₂, THEN I_o is v_o, ω₂.

We can integrate these rules as:

IF I₁ is v₁ and I₂ is v₂, THEN I_o is v_o, $ω_{ij} = δ (I_{fusion}) = δ (v_{ij})$ (28)

The Mamdani model for δ_k in two-dimensional space (x, y) will be:

$Min (\frac{1}{\sqrt{2 π} σ_{11} σ_{12}} e^{- \frac{1}{2} [\frac{(x - μ_{11})^{2} + (y - μ_{12})^{2}}{σ_{11}^{2} + σ_{12}^{2}}]}, \frac{1}{\sqrt{2 π} σ_{21} σ_{22}} e^{- \frac{1}{2} [\frac{(x - μ_{21})^{2} + (y - μ_{22})^{2}}{σ_{21}^{2} + σ_{22}^{2}}]})$ (29)

In particular, if σ₁₁ = σ₁₂ = σ₂₁ = σ₂₂ = 1 and μ₁₁ = μ₁₂ = μ₂₁ = μ₂₂ = 0, we have that: $M (x, y) = \frac{1}{\sqrt{2 π}} e^{- [\frac{x^{2} + y^{2}}{4}]}$ (30)

Thus, for any other implication operators, the function of rules will have the form: $M (x, y) = \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} ({Ax}^{2} + {By}^{2} + Cx + Dy + E)} .$ (31)
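A small sketch makes the special case in Eq. (30) concrete. It evaluates the two arguments of Eq. (29) exactly as printed, takes their minimum, and compares the result with the closed form for unit deviations and zero means. The function names and the tuple layout of a rule are hypothetical.

```python
import math
from typing import Tuple

Rule = Tuple[float, float, float, float]  # (mu_x, mu_y, sigma_x, sigma_y)


def rule_density(x: float, y: float, rule: Rule) -> float:
    """One argument of Eq. (29), written as printed in the text."""
    mu_x, mu_y, s_x, s_y = rule
    expo = -0.5 * ((x - mu_x) ** 2 + (y - mu_y) ** 2) / (s_x ** 2 + s_y ** 2)
    return math.exp(expo) / (math.sqrt(2.0 * math.pi) * s_x * s_y)


def mamdani(x: float, y: float, rule1: Rule, rule2: Rule) -> float:
    """Mamdani implication: minimum of the two rule densities."""
    return min(rule_density(x, y, rule1), rule_density(x, y, rule2))


def closed_form(x: float, y: float) -> float:
    """Eq. (30): unit deviations and zero means."""
    return math.exp(-(x * x + y * y) / 4.0) / math.sqrt(2.0 * math.pi)


unit: Rule = (0.0, 0.0, 1.0, 1.0)
value = mamdani(0.3, -1.2, unit, unit)
```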

4 Applications

4.1 Mamdani model-based fuzzy control inference system using nonlinear conjugate gradient

In Section 2, information was formalized as v_k =< P_k, d_k, ρ_k > where P_k is a central point in Rⁿ, d_k ∈ R, and ρ_k is a Gaussian density function. P_k and d_k operate as fuzzy numbers using the fuzzy logical operations of Section 2. In our fuzzy rule-based inference system, if different information (multi-source information) implies the same conclusion, then this information is integrated. Supposing that the multi-source information structure v_k will conclude with a particular assertion at the ρ_k level, we have that: $\text{IF } I_{1} \text{ is } v_{1} \text{ and } I_{2} \text{ is } v_{2} \text{ and } \dots \text{ and } I_{m} \text{ is } v_{m} \text{ THEN } I_{o} \text{ is } v_{o}, ϖ$ (32)

This can be simplified to $\text{IF } v_{1} \text{ and } v_{2} \text{ and } \dots \text{ and } v_{m} \text{ THEN } ϖ (v_{1}, v_{2}, \dots, v_{m})$

and

$\text{IF } P_{1} \text{ and } P_{2} \text{ and } \dots \text{ and } P_{m} \text{ THEN } P_{1} \land P_{2} \land \dots \land P_{m}$

$\text{IF } d_{1} \text{ and } d_{2} \text{ and } \dots \text{ and } d_{m} \text{ THEN } d_{1} \land d_{2} \land \dots \land d_{m}$

$\text{IF } ρ_{1} \text{ and } ρ_{2} \text{ and } \dots \text{ and } ρ_{m} \text{ THEN } ϖ (ρ_{1}, ρ_{2}, \dots, ρ_{m}) .$

Now, we only discuss the density function δ_k (here ρ_k) in our FIS. Letting Θ = [δ₁, δ₂, ⋯ , δ_n], we have that: $\text{IF } Θ \text{ THEN } ϖ (Θ) .$ (33)

From the previous section, we know that ϖ (Θ) is a Gaussian density function, so the rule is re-labeled as IF X THEN f (X). For this rule set, we have $R_{i} : \text{IF } X \text{ THEN } f_{i} (X)$

However, as f (X) is a nonlinear function, it is difficult to find its minimum point under the Mamdani model, so we need to linearize f (X) and use the nonlinear conjugate gradient algorithm to optimize the parameters of f (X).

If we suppose that f (X) = [Ax - b]^T [Ax - b], then the gradient is ∇_xf (x) = 2A^T (Ax - b), and the objective is to find x such that ∇_xf (x) = 0. The nonlinear conjugate gradient method requires f to be twice differentiable; since f is a Gaussian function, it is infinitely differentiable, so this condition holds. Starting in the steepest-descent direction Δx₀ = - ∇_xf (x₀) with step size α, we have that: $α_{0} = arg min_{α} f (x_{0} + α Δ x_{0})$ (34) $x_{1} = x_{0} + α_{0} Δ x_{0}$ (35)

This is the first iteration in the direction of Δx₀, and by setting the initial conjugate direction s₀ = Δx₀, the following steps will calculate Δx_n:

Step 1: Calculate Δx_n = - ∇ _xf (x_n).

Step 2: Calculate $β_{n} = \frac{Δ x_{n}^{T} (Δ x_{n} - Δ x_{n - 1})}{Δ x_{n - 1}^{T} Δ x_{n - 1}}$ (Polak–Ribière).

Step 3: Update the conjugate direction $s_{n} = Δ x_{n} + β_{n} s_{n - 1}$ .

Step 4: Calculate $α_{n} = arg min_{α} f (x_{n} + α s_{n})$ .

Step 5: Update $x_{n + 1} = x_{n} + α_{n} s_{n}$ .
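The five steps can be sketched for the quadratic f (x) = [Ax - b]^T [Ax - b] used above. This is an illustration under stated assumptions, not the implementation behind the reported results: the line search is the closed-form minimizer that exists only for this quadratic, β_n is clipped at zero (a common restart safeguard that the text does not mention), and all function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def residual(A: Matrix, x: Vector, b: Vector) -> Vector:
    """Return A x - b."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]


def neg_gradient(A: Matrix, x: Vector, b: Vector) -> Vector:
    """Delta x = -grad f(x) = -2 A^T (A x - b)."""
    r = residual(A, x, b)
    columns = [list(col) for col in zip(*A)]  # rows of A^T
    return [-2.0 * dot(col, r) for col in columns]


def exact_step(A: Matrix, x: Vector, b: Vector, s: Vector) -> float:
    """argmin over alpha of f(x + alpha s), available in closed form for this f."""
    As = matvec(A, s)
    return -dot(As, residual(A, x, b)) / dot(As, As)


def polak_ribiere_cg(A: Matrix, b: Vector, x0: Vector,
                     tol: float = 1e-18, max_iter: int = 100) -> Vector:
    x = list(x0)
    dx = neg_gradient(A, x, b)      # initial steepest-descent direction
    s = list(dx)                    # initial conjugate direction s_0
    for _ in range(max_iter):
        if dot(dx, dx) < tol:       # squared gradient norm is small enough
            break
        alpha = exact_step(A, x, b, s)                      # Step 4
        x = [xi + alpha * si for xi, si in zip(x, s)]       # Step 5
        dx_new = neg_gradient(A, x, b)                      # Step 1
        diff = [gn - go for gn, go in zip(dx_new, dx)]
        beta = max(0.0, dot(dx_new, diff) / dot(dx, dx))    # Step 2
        s = [gn + beta * si for gn, si in zip(dx_new, s)]   # Step 3
        dx = dx_new
    return x


A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 3.0]
x_min = polak_ribiere_cg(A, b, x0=[0.0, 0.0])
```

For a general nonlinear f, `exact_step` would be replaced by a numerical line search.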

The algorithm is based on the quadratic function that we use to normalize the Gaussian function f (x) in order to speed up the iterations. Considering a simplified Mamdani model and from formula (31), we know that: $M (x, y) = \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} ({Ax}^{2} + {By}^{2} + Cx + Dy + E)}$ (36)

Using the nonlinear conjugate gradient, we obtain the results given in Fig. 1 and Table 1 by comparing with other special functions. From Table 1, we know that the Gaussian density function will be approximated in just a few steps by the nonlinear conjugate gradient algorithm, which is the reason we selected the Gaussian distribution as the density function of this special structure. We also compare other forms of density function, which appear to require more steps under the nonlinear conjugate gradient algorithm.

4.2 Takagi–Sugeno model in RGS-FIS

Takagi and Sugeno [18] proposed a fuzzy IF-THEN rules system as the local input–output relations of a nonlinear system to scale the population of rules under a multi-dimensional fuzzy inference system, known as the T-S model [21]. The normal rules for the T-S model under the special information structure proposed for our information fusion method are: $\begin{matrix} R_{T - S} : \text{IF INPUT-1 is } I_{1}, \text{ INPUT-2 is } I_{2}, \dots, \\ \text{INPUT-}n \text{ is } I_{n} \text{ THEN } I_{f} = f (I_{1}, I_{2}, \dots, I_{n}) . \end{matrix}$

The T-S model outputs a linear, non-constant function that will reduce the population of rules.

From rule set R_T-S, we can simplify $I_{f} = \sum_{i = 1}^{n} (a_{i} δ_{i} + b_{i} d_{i})$ , in which a_i and b_i are undetermined constants. Setting σ = 1 and μ = 0 in Equation (6) gives $I_{f} = \sum_{i = 1}^{n} (\frac{a_{i}}{\sqrt{2 π}} e^{- \frac{x^{2}}{2}} + b_{i} x)$ . The first part of I_f is a GMM that can be estimated by the procedure in Section 2.2, and the second part of I_f is a linear function (see Fig. 2).
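The simplified T-S output can be evaluated directly. The sketch below assumes that the Gaussian factor applies only to the a_i terms, with σ = 1 and μ = 0, and that every rule shares the same scalar input x; the function name and coefficient values are hypothetical.

```python
import math
from typing import Sequence


def ts_output(x: float, a: Sequence[float], b: Sequence[float]) -> float:
    """Sum over rules of a_i * N(x; 0, 1) + b_i * x."""
    gauss = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return sum(ai * gauss + bi * x for ai, bi in zip(a, b))


a_coeffs = [1.0, 2.0]
b_coeffs = [3.0, 4.0]
at_zero = ts_output(0.0, a_coeffs, b_coeffs)   # only the Gaussian part contributes
at_far = ts_output(10.0, a_coeffs, b_coeffs)   # the linear part dominates
```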

Furthermore, the nonlinear conjugate gradient proposed in Section 4.1 required 100 steps and 301 gradient evaluations to find the minimum point (the Mamdani model). As a result, we can simplify this in RGS-FIS under the T-S model to output three linear membership functions. Suppose that the inputs are Gaussian-shaped rules, and the outputs are linear functions. Let the membership functions of INPUT 1 and INPUT 2 be Gaussian functions, and let the OUTPUT be composed of three linear functions [33]. This gives the RGS-FIS system under the T-S model (see Fig. 3).

5 Concluding remarks and future works

This paper proposed a novel information structure applicable to a Gaussian-shaped FIS. We developed the RGS-FIS approach using the nonlinear conjugate gradient algorithm and a T-S model. However, there are two problems with RGS-FIS: one is that new fusion operator parameters depend on a complex estimation process, and the other is that all data variables are supposed to be independent (r = 0). The model selection for similarity computing under rule-based fuzzy implication operations should also be improved.

Future work will focus on the pre-processing of datasets as well as the estimation of model parameters. Pre-processing will tune the parameters of the model to display a simpler mathematical presentation and assure a robust inference process. Furthermore, the fusion operator needs to be improved so that it does not solely depend on fuzzy implications. Although similarity computing is the key factor for calculating the possibility of IF-THEN rules, it is not clear whether a feasible algorithm can be developed for this. Hence, the possibility of the IF-THEN rules also needs to be calculated and improved.

Acknowledgments

The authors would like to thank the editors, the anonymous reviewers, and Dr. Thayer El-Dajjani for their most constructive comments and suggestions to improve the quality of this paper. This work is supported by the Zhejiang Provincial Natural Science Fund under Grant No. LY13H180012.

References

1. Taleb-Ahmed, Gautier (2002). On information fusion to improve segmentation of MRI sequences. Information Fusion 3, 103–117.

2. Luo, Chen, Fang (2015). Gaussian successive fuzzy integral for sequential multi-decision making. International Journal of Fuzzy Systems 17(2), 321–336.

3. Kutsenko, Sinyuk (2015). Inference methods for systems with many fuzzy inputs. Journal of Computer and Systems Sciences International 54(3), 375–383.

4. Zervas, Mpimpoudis, Anagnostopoulos, Sekkas, Hadjiefthymiades (2011). Multisensor data fusion for fire detection. Information Fusion 12, 150–159.

5. Herrera, Martínez (2000). A 2-tuple fuzzy linguistic representation model for computing with words. IEEE Trans Fuzzy Systems 8(6), 746–752.

6. Shi (2012). Emotional cellular-based multi-class fuzzy support vector machines on product’s KANSEI extraction. Appl Math Inf Sci 6(1), 41–49.

7. Shi, Sun (2012). Employing rough sets and association rule mining in KANSEI knowledge extraction. Inform Sci 196, 118–128.

8. Pavlin, de Oude, Maris, Nunnink, Hood (2010). A multi-agent systems approach to distributed Bayesian information fusion. Information Fusion 11, 267–282.

9. Couso, Sánchez (2011). Upper and lower probabilities induced by a fuzzy random variable. Fuzzy Sets and Systems 165, 1–23.

10. Aisbett, Rickard, Morgenthaler (2011). Multivariate modeling and type-2 fuzzy sets. Fuzzy Sets and Systems 163, 78–95.

11. Lawry, Tang (2012). On truth-gaps, bipolar belief and the assertability of vague propositions. Artificial Intelligence 191-192, 20–41.

12. Lawry, Tang (2009). Uncertainty modelling for vague concepts: A prototype theory approach. Artificial Intelligence 173, 1539–1558.

13. Lawry (1994). Inexact reasoning, 66–80. PhD Thesis, University of Manchester.

14. Lawry (2005). Modelling and Reasoning with Vague Concepts, Studies in Computational Intelligence, 3–20. Springer-Verlag, Berlin.

15. Kuncheva (2003). ‘Fuzzy’ vs ‘Non-fuzzy’ in combining classifiers designed by boosting. IEEE Trans on Fuzzy Systems 11(6), 729–741.

16. Zadeh (1975). The concept of a linguistic variable and its application to approximate reasoning. Inform Sci 8, 199–249.

17. Zadeh (1975). The concept of a linguistic variable and its application to approximate reasoning-I. Inform Sci 8, 249–299.

18. Zadeh (1975). The concept of a linguistic variable and its application to approximate reasoning-II. Inform Sci 8, 301–357.

19. Zadeh (2005). Toward a generalized theory of uncertainty (GTU)-an outline. Inform Sci 172, 1–40.

20. Sugeno (1974). Theory of fuzzy integrals and its applications. PhD thesis, Tokyo Institute of Technology.

21. Sozhamadevi, Sathiyamoorthy (2015). A probabilistic fuzzy inference system for modeling and control of nonlinear process. Arabian Journal for Science and Engineering 40(6), 1777–1791.

22. McCauley Bush, Wang (1997). Fuzzy linear regression models for assessing risks of cumulative trauma disorders. Fuzzy Sets and Systems 92, 317–340.

23. Yager, Liu (2008). Classic Works of the Dempster-Shafer Theory of Belief Functions. Dempster, Shafer. Springer-Verlag, Heidelberg, Germany.

24. Yager (2012). Conditional approach to possibility-probability fusion. IEEE Trans Fuzzy Systems 20(1), 46–56.

25. Yager (2012). Entailment principle for measure-based uncertainty. IEEE Trans Fuzzy Systems 20(3), 526–535.

26. Yager (2011). Set measure directed multi-source information fusion. IEEE Trans Fuzzy Systems 19(6), 1031–1039.

27. Destercke, Dubois, Chojnacki (2009). Possibilistic information fusion using maximal coherent subsets. IEEE Trans Fuzzy Systems 17(1), 79–92.

28. Nefti-Meziani, Oussalah, Soufian (2015). On the use of inclusion structure in fuzzy clustering algorithm in case of Gaussian membership functions. Journal of Intelligent and Fuzzy Systems 28(4), 1477–1493.

29. Denoeux (2011). Maximum likelihood estimation from fuzzy data using the EM algorithm. Fuzzy Sets and Systems 183, 72–91.

30. Pham (2011). Fuzzy posterior-probabilistic fusion. Pattern Recognition 44, 1023–1030.

31. Ahram, McCauley-Bush, Karwowski (2010). Estimating intrinsic dimensionality using the multi-criteria decision weighted model and the average standard estimator. Inform Sci 180, 2845–2855.

32. Dou, Ruan, Chen, Bloyet, Constans J-M (2007). A framework of fuzzy information fusion for the segmentation of brain tumor tissues on MR images. Image and Vision Computing 25, 164–171.

33. Tung, Quek (2009). A Mamdani-Takagi-Sugeno based linguistic neural-fuzzy inference system for improved interpretability-accuracy representation. Proc IEEE-FUZZY, 367–372. Korea.

34. Tang, Lawry (2009). Linguistic modelling and information coarsening based on prototype theory and label semantics. Int J Approx Reasoning 50, 1177–1198.

35. Tang (2010). A prototype based rule inference system incorporating linear functions. Fuzzy Sets and Systems 161, 2831–2853.

36. Zhang (2014). Gaussian mixture reduction based on fuzzy ART for extended target tracking. Signal Processing 97, 232–241.