Improving the dendritic lattice neural network by utilizing a fuzzy inclusion measure

Abstract

The dendritic lattice neural network (DLNN) has been shown to offer fast calculation, freedom from convergence problems, and a superior capacity to store information. However, experiments on several datasets have shown that the DLNN still suffers from low classification accuracy. This paper proposes that the main reason for this problem is that the original DLNN cannot classify samples that fall outside all of the hyperboxes. To solve this problem, a fuzzy inclusion measure is introduced to improve the DLNN model’s testing algorithm. The improved testing algorithm consists of two parts: (1) the classification of samples covered by a hyperbox with the DLNN model, and (2) the classification of samples outside all of the hyperboxes based on the principle of maximum membership degree. Four standard datasets were employed to evaluate the effectiveness of the improved DLNN in comparison with the original DLNN. Experimental results show that, on both the training and testing samples, the improved DLNN achieves higher classification accuracy than the original DLNN.

Keywords

Dendritic lattice neural network, fuzzy inclusion measure, hyperbox, maximum membership degree

1 Introduction

The dendritic lattice neural network (DLNN) [9] was first proposed by introducing a model of single-neuron computation into artificial neural networks (ANNs). This combination has been shown to compute a perfect approximation to any data distribution [5, 9]. The DLNN and its related models [2, 10–12] have been adopted quickly and used widely for nonlinear problems such as the double helix [9] and N-bit parity [7], as well as for disease detection [4, 5], pattern recognition [6, 13], and image processing [8]. Their popularity stems from the following advantages, among others: efficient training, fast calculation, easy hardware implementation, freedom from convergence problems, and superior information storage capacity.

In our previous studies, we conducted a series of experiments on classical datasets to evaluate the DLNN model. However, the experimental results showed that the DLNN still suffered from low classification accuracy problems. Our current study demonstrates that the DLNN cannot accurately classify the testing samples that fall outside of the hyperboxes.

To address this shortcoming, we improved the DLNN model by utilizing a fuzzy inclusion measure to increase its classification accuracy. When a testing sample is confirmed to be outside all of the hyperboxes, we compute its fuzzy inclusion measure with respect to the first hyperbox of each class, and we then assign to the testing sample the class label of the hyperbox with the maximum fuzzy inclusion measure. The original DLNN is still used to classify the samples covered by hyperboxes. The experimental results on four standard datasets indicate that the improved DLNN (IDLNN) outperforms the original DLNN on both training and testing datasets.

The rest of this paper is structured as follows: Section 2 offers a brief introduction on the basic theories of the DLNN model; Section 3 describes the improvement in the DLNN via the use of a fuzzy inclusion measure; Section 4 shows our experimental results on four standard datasets; and Section 5 summarizes the study’s conclusions.

2 DLNN: Basic theories

The DLNN model consists of input neurons N₁, N₂, …, N_n and output neurons M with a dendritic structure. Neuron N_i (i = 1, 2, …, n) sends input information x_i through its synaptic branches to the dendritic trees of the output neurons. The symbol $ω_{ik}^{l}$ denotes the connection weight between a synaptic branch of N_i and the kth dendrite of an output neuron M. The superscript l ∈ {0, 1} distinguishes whether the synaptic branch causes excitation (l = 1) or inhibition (l = 0) on the dendrite. The response of the kth dendrite is given by $τ_{k} (x) = p_{k} \underset{i \in I (k)}{\land} \underset{l \in L (i)}{\land} (- 1)^{(1 - l)} (x_{i} + ω_{ik}^{l})$ (1) where x = (x₁, x₂, ⋯ , x_n)′, I (k) ⊆ {1, ⋯ , n} with I (k) ≠ ∅ is the set of all input neurons N_i that connect with the kth dendrite of M, and L (i) ⊆ {0, 1} with L (i) ≠ ∅ is the set of synapse types that the ith neuron N_i has on the kth dendrite. In the present study, each input neuron has at most two axonal branches terminating on a given dendrite of M [9]. The value p_k ∈ {1, -1} denotes whether the kth dendrite responds to the input it receives with excitation (p_k = 1) or inhibition (p_k = -1).

For all dendrites D = {D₁, D₂, ⋯ , D_K}, where K denotes the total number of dendrites, the total input received by neurons M is given by $τ (x) = \land_{k = 1}^{K} τ_{k} (x)$ (2)

The activation function used in the single layer morphological perceptron (SLMP) with dendrites is the hard limiter: $f (τ) = \begin{cases} 1, & \text{if } τ \geq 0 \\ 0, & \text{if } τ < 0 \end{cases}$ (3)

The total state computation of neurons M is given by $y (x) = f (\land_{k = 1}^{K} [p_{k} \underset{i \in I (k)}{\land} \underset{l \in L (i)}{\land} (- 1)^{(1 - l)} (x_{i} + ω_{ik}^{l})])$ (4)
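Equations (1)–(4) can be illustrated with a minimal Python sketch. It assumes, for simplicity, that every input has both an excitatory and an inhibitory synapse on each dendrite (I (k) = {1, …, n} and L (i) = {0, 1}); the function names and the example weights are hypothetical and are not taken from the original model description.

```python
import numpy as np

def dendrite_response(x, w1, w0, p=1):
    """Response tau_k(x) of one dendrite, following Eq. (1).

    Assumes every input has both an excitatory (l = 1) and an
    inhibitory (l = 0) synapse on the dendrite.
    """
    x = np.asarray(x, dtype=float)
    excitatory = x + np.asarray(w1, dtype=float)      # l = 1: +(x_i + w^1_ik)
    inhibitory = -(x + np.asarray(w0, dtype=float))   # l = 0: -(x_i + w^0_ik)
    return p * min(excitatory.min(), inhibitory.min())

def hard_limiter(tau):
    """Activation of Eq. (3)."""
    return 1 if tau >= 0 else 0

def neuron_output(x, dendrites):
    """Output y(x) of Eq. (4); dendrites is a list of (w1, w0, p) tuples."""
    tau = min(dendrite_response(x, w1, w0, p) for w1, w0, p in dendrites)
    return hard_limiter(tau)

# Illustrative weights (not from the paper): one excitatory dendrite
# enclosing [0, 1] x [0, 1] and one inhibitory dendrite carving out
# [0.4, 0.6] x [0.4, 0.6].
dendrites = [
    ([0.0, 0.0], [-1.0, -1.0], 1),
    ([-0.4, -0.4], [-0.6, -0.6], -1),
]
```

With weights ω¹ = -(lower corner) and ω⁰ = -(upper corner), an excitatory dendrite responds with τ_k (x) ≥ 0 exactly when x lies inside the corresponding hyperbox, which is why the trained network can be read as a collection of hyperboxes.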

In the DLNN model, the lattice structure is not determined in advance. Rather, morphological neurons generate new dendrites that connect with the synapses of input neurons based on need during the training process. The training algorithms of DLNN are given below.

Training algorithms of DLNN
Training set T = { (x^ξ, c_ξ) : c_ξ ∈ {0, 1}, ξ = 1, 2, ⋯ , m }, $x^{ξ} = (x_{1}^{ξ}, x_{2}^{ξ}, \dots, x_{n}^{ξ})$, C₁ = {ξ : c_ξ = 1}, C₂ = {ξ : c_ξ = 0}.
1. Let k = 1, P = {1, …, m}, I = {1, …, n}, L = {0, 1}, and set the weights of the first dendrite from the samples of class C₁:
$ω_{ik}^{1} = - \underset{c_{ξ} = 1}{\land} x_{i}^{ξ}, \quad ω_{ik}^{0} = - \underset{c_{ξ} = 1}{\lor} x_{i}^{ξ}, \quad \forall i \in I$
2. Compute the response τ_k (x^ξ) of the current dendrite with $p_{k} = (- 1)^{sgn (k - 1)}$ and the total response τ (x^ξ) of the output neuron M:
$τ_{k} (x^{ξ}) = p_{k} \underset{i \in I}{\land} \underset{l \in L}{\land} (- 1)^{(1 - l)} (x_{i}^{ξ} + ω_{ik}^{l}), \quad \forall ξ \in P$
$τ (x^{ξ}) = \land_{j = 1}^{k} τ_{j} (x^{ξ}), \quad \forall ξ \in P$
3. If $f (τ (x^{ξ})) = c_{ξ}, \ \forall ξ \in P$, the algorithm ends here with perfect classification of the training set. Otherwise, perform the following steps.
4. Add a new dendrite to M, let k = k + 1, D = C₁, I = I′ = X = E = H = ∅.
5. Select a misclassified pattern x^γ from class C₂, i.e., with c_γ = 0 and f (τ (x^γ)) = 1.
6. $μ = \underset{ξ \neq γ}{\land} \{ \lor_{i = 1}^{n} | x_{i}^{γ} - x_{i}^{ξ} | : x^{ξ} \in D \}$.
7. $I^{'} = \{ i : | x_{i}^{γ} - x_{i}^{ξ} | = μ, x^{ξ} \in D, ξ \neq γ \}$;
$X = \{ (i, x_{i}^{ξ}) : | x_{i}^{γ} - x_{i}^{ξ} | = μ, x^{ξ} \in D, ξ \neq γ \}$.
8. $\forall (i, x_{i}^{ξ}) \in X$: if $x_{i}^{ξ} < x_{i}^{γ}$, set $ω_{ik}^{1} = - x_{i}^{ξ}$, E = {1};
if $x_{i}^{ξ} > x_{i}^{γ}$, set $ω_{ik}^{0} = - x_{i}^{ξ}$, H = {0}.
9. Update index sets I and L: I = I ∪ I′, L = E ∪ H.
10. $D^{'} = \{ x^{ξ} \in D : \forall i \in I, - ω_{ik}^{1} < x_{i}^{ξ} \text{ and } x_{i}^{ξ} < - ω_{ik}^{0} \}$. If D′ = ∅, return to step 2. Otherwise, set D = D′ and loop back to step 6.

More detailed descriptions of the DLNN model can be found in [9].
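The first steps of the training algorithm can be sketched in Python as follows. This is a partial illustration only: it covers the initial hyperbox of Step 1, the check for misclassified class-0 patterns in Steps 2–3 for k = 1, and the distance μ of Step 6. The function names and the toy data are hypothetical, and the dendrite-growing loop of Steps 4–10 is not implemented here.

```python
import numpy as np

def initial_dendrite(X, c):
    """Step 1: weights of the first dendrite from the class-1 samples."""
    X1 = X[c == 1]
    w1 = -X1.min(axis=0)   # w^1_ik = -(minimum of x_i over class 1)
    w0 = -X1.max(axis=0)   # w^0_ik = -(maximum of x_i over class 1)
    return w1, w0

def misclassified_class0(X, c, w1, w0):
    """Steps 2-3 for k = 1: indices of class-0 patterns with f(tau) = 1."""
    tau = np.minimum(X + w1, -(X + w0)).min(axis=1)
    return np.flatnonzero((tau >= 0) & (c == 0))

def min_chebyshev_distance(x_gamma, D):
    """Step 6: mu, the smallest Chebyshev distance from x_gamma to D."""
    return np.abs(D - x_gamma).max(axis=1).min()

# Toy data (illustrative only): two class-1 and two class-0 patterns.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
c = np.array([1, 1, 0, 0])
w1, w0 = initial_dendrite(X, c)
wrong = misclassified_class0(X, c, w1, w0)
mu = min_chebyshev_distance(X[wrong[0]], X[c == 1])
```

In this toy case the first hyperbox spans [0, 1] × [0, 1], so the class-0 pattern (0.5, 0.5) is wrongly enclosed and would trigger the creation of an inhibitory dendrite in Step 4.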

3 Improvement on DLNN via the use of a fuzzy inclusion measure

3.1 Basic theories on fuzzy inclusion measure

If a crisp lattice L and a fuzzy membership function defined as μ_p : S = {(x, y) : x, y ∈ L} → [0, 1], where μ_p (x, y) denotes the degree of x contained in y, meet the condition that μ_p (x, y) =1 if and only if x ≤ y, the pair (L, μ_p (x, y)) is called a fuzzy lattice.

The inclusion measure was proposed to quantitatively describe the inclusion relation between two elements of a fuzzy lattice, and has been optimized in [13] as follows: $σ (x, u) = \frac{h (u)}{h (x \lor u)}$ (5) where h is a positive valuation function defined on the lattice.

When L is the Cartesian product of N lattices, i.e., L = L₁ × L₂ × ⋯ × L_N, the function h defined on L is given by $h (x_{1}, x_{2}, \dots, x_{N}) = h (x_{1}) + h (x_{2}) + \dots + h (x_{N})$ (6)

Suppose L is a complete lattice, and let τ (L) = { [a, b] : a, b ∈ L } be the lattice of intervals defined on L. The lattice meet and lattice join on τ (L) are defined as:

$\begin{matrix} [a, b] \land [c, d] = [a \lor c, b \land d] \\ [a, b] \lor [c, d] = [a \land c, b \lor d] \end{matrix}$ (7)

The corresponding partial ordering relation is given by $[a, b] \leq [c, d] \Leftrightarrow a \geq c \text{ and } b \leq d$ (8)

The partial ordering relation in L^α, the dual of a lattice L, is the converse of that in L: $a \leq b \text{ in } L \Leftrightarrow b \leq a \text{ in } L^{α}$ (9)

If an isomorphic function θ : L^α → L satisfies x ≤ y ⇔ θ (x) ≥ θ (y), then the mapping that transfers the inclusion measure from τ (L) to L × L is defined by ψ : [a, b] ∈ τ (L) → (θ (a), b) ∈ L × L. Therefore, following Equation (5), the inclusion measure in τ (L) is defined as: $σ ([a, b] \leq [c, d]) = \frac{h (θ (c)) + h (d)}{h (θ (a) \lor θ (c)) + h (b \lor d)}$ (10)

More in-depth descriptions of fuzzy lattices can be found in [16–19].

3.2 Improving DLNN based on a fuzzy inclusion measure

From the training algorithm of the DLNN model, it follows that a dendritic morphological neural network generates a series of hyperboxes for the samples of each class. To classify a testing sample, its total response to the first hyperbox of the kth class C_k is computed using Equation (2). If the response satisfies τ (x) ≥ 0, then f (τ) = 1, which means that the testing sample is assigned to C_k.

Figure 1 shows the corresponding classification hyperboxes for a two-class problem, in which “+” represents the samples of the first class C₁, “∘” represents the samples of the second class C₂, hyperboxes with a dark-colored background belong to C₁, and those with a light-colored background belong to C₂.

The point x₁ (x₁ ∈ C₁) is a testing sample that falls outside all the hyperboxes in Fig. 1. Classifying the sample x₁ according to the definitions outlined in Section 2, we obtain τ_{C₁} (x₁) < 0 and τ_{C₂} (x₁) < 0. This means that the sample x₁ is assigned neither to C₁ nor to C₂. This result confirms that the DLNN model cannot classify testing samples that fall outside all the hyperboxes.

In this section, the fuzzy lattice inclusion measures introduced in Section 3.1 are used as the membership values of the testing samples in the different hyperboxes. The testing sample is then assigned the class of the hyperbox with the maximum inclusion measure.

As an example, we classify the testing sample x₁ using the hyperboxes of the two classes in Fig. 1. The isomorphic function θ is defined by θ (x) = 2 - x, and x₁ = [0.1, 1.1], c₁ = [0.0070, 0.0014, 0.9892, 0.9797], c₂ = [0.6089, 0.6149, 1.5806, 1.5912], where c₁ (or c₂) denotes the first hyperbox of C₁ (or C₂). We then compute the inclusion measures of x₁ in hyperboxes c₁ and c₂ using Equation (10): $\begin{matrix} σ_{1} (I (x_{1}) \leq I (c_{1})) \\ = σ_{1} ([0.1, 0.1, 1.1, 1.1] \leq [0.0070, 0.0014, 0.9892, 0.9797]) \\ = σ_{1} ([1.9, 0.1, 0.9, 1.1] \leq [1.9930, 0.0014, 1.0108, 0.9797]) \\ = \frac{h (1.9930, 0.0014, 1.0108, 0.9797)}{h ((1.9, 0.1, 0.9, 1.1) \lor (1.9930, 0.0014, 1.0108, 0.9797))} \\ = \frac{3.9849}{4.2038} = 0.9479 \\ σ_{2} (I (x_{1}) \leq I (c_{2})) \\ = σ_{2} ([0.1, 0.1, 1.1, 1.1] \leq [0.6089, 0.6149, 1.5806, 1.5912]) \\ = σ_{2} ([1.9, 0.1, 0.9, 1.1] \leq [1.3911, 0.6149, 0.4194, 1.5912]) \\ = \frac{h (1.3911, 0.6149, 0.4194, 1.5912)}{h ((1.9, 0.1, 0.9, 1.1) \lor (1.3911, 0.6149, 0.4194, 1.5912))} \\ = \frac{4.0166}{5.0061} = 0.8023 \end{matrix}$

It is easily observable that the closer the sample is to a hyperbox, the larger the corresponding inclusion measure. Following the principle of maximum membership degree, the testing sample x₁ is correctly assigned to the first class C₁.
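The arithmetic of this worked example can be checked with a short Python sketch. It assumes that h is the plain sum of the coordinates and that θ is applied to the first and third coordinates of each four-element vector, as the substitution in the example suggests; the function names are hypothetical. Evaluating the stated quotient 4.0166/5.0061 directly gives approximately 0.8023.

```python
import numpy as np

def theta(v):
    """Isomorphic function theta(x) = 2 - x used in the worked example."""
    return 2.0 - v

def embed(box):
    """Apply theta to the first and third coordinates (assumed layout)."""
    v = np.array(box, dtype=float)
    v[[0, 2]] = theta(v[[0, 2]])
    return v

def inclusion(x_box, c_box):
    """sigma(x <= c) = h(c) / h(x v c), with h taken as the coordinate sum."""
    x, c = embed(x_box), embed(c_box)
    return c.sum() / np.maximum(x, c).sum()

x1 = [0.1, 0.1, 1.1, 1.1]
c1 = [0.0070, 0.0014, 0.9892, 0.9797]
c2 = [0.6089, 0.6149, 1.5806, 1.5912]

s1 = inclusion(x1, c1)   # numerator 3.9849, denominator 4.2038
s2 = inclusion(x1, c2)   # numerator 4.0166, denominator 5.0061
predicted_class = 1 if s1 > s2 else 2
```

Because s1 exceeds s2, the sketch reaches the same decision as the text: x₁ is assigned to C₁.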

3.3 Testing algorithm of IDLNN

The training algorithm of IDLNN is the same as that of the DLNN model, and the testing algorithm of IDLNN based on the principle of maximum membership degree is described as follows:

Step 1. Receive the testing sample x_i.

Step 2. Evaluate whether testing sample x_i is covered by the hyperboxes: is τ (x_i) ≥ 0?

If true, classify x_i with the DLNN model described in Section 2. If false, proceed to the next step.

Step 3. Compute the inclusion measures of x_i in the first hyperbox w_k of the kth class C_k: σ (x_i ≤ w_k), k = 1, 2, ⋯ , K, where K denotes the total number of classes.

Step 4. Obtain the maximum membership degree and the class C_j, j ∈ {1, 2, ⋯ , K}, of the corresponding hyperbox.

Step 5. Classify the testing sample x_i to C_j.
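Steps 1–5 can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the original description: each hyperbox is stored as a (lower corner, upper corner) pair, Step 2 is replaced by a plain containment test instead of a full evaluation of τ with inhibitory dendrites, h is the coordinate sum, and θ (x) = 2 - x is borrowed from the example in Section 3.2. All function names are hypothetical.

```python
import numpy as np

def covered(x, lower, upper):
    """True if x lies inside the hyperbox [lower, upper]."""
    return bool(np.all(x >= lower) and np.all(x <= upper))

def inclusion(x, lower, upper):
    """sigma([x, x] <= [lower, upper]) following Eq. (10), with h = sum."""
    theta = lambda v: 2.0 - np.asarray(v, dtype=float)
    x_emb = np.concatenate([theta(x), x])          # psi([x, x]) = (theta(x), x)
    u_emb = np.concatenate([theta(lower), upper])  # psi([lower, upper])
    return u_emb.sum() / np.maximum(x_emb, u_emb).sum()

def idlnn_classify(x, boxes):
    """boxes maps a class label to a list of (lower, upper) hyperboxes;
    the first entry of each list is treated as that class's first hyperbox."""
    x = np.asarray(x, dtype=float)
    # Step 2 (simplified stand-in for the DLNN response test)
    for label, class_boxes in boxes.items():
        if any(covered(x, lo, hi) for lo, hi in class_boxes):
            return label
    # Steps 3-5: maximum inclusion measure over the first hyperboxes
    scores = {label: inclusion(x, *class_boxes[0])
              for label, class_boxes in boxes.items()}
    return max(scores, key=scores.get)

# Hyperbox corners read from the example in Section 3.2 (assumed layout).
boxes = {
    1: [(np.array([0.0070, 0.0014]), np.array([0.9892, 0.9797]))],
    2: [(np.array([0.6089, 0.6149]), np.array([1.5806, 1.5912]))],
}
label = idlnn_classify([0.1, 1.1], boxes)
```

The fallback in Steps 3–5 guarantees that every testing sample receives a label, which is the behavior the original DLNN lacks for samples outside all hyperboxes.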

4 Experimental results on standard datasets

In this section, the IDLNN model is applied to four standard datasets to verify its validity. The DLNN classifier is used to provide a comparison. The classification accuracy referred to below is an average of 20 experiments.

4.1 Ripley dataset

The Ripley dataset is generated by a mixture of Gaussian distributions in a two-dimensional space [3]. It comprises a training set of 250 samples and a testing set of 1000 samples for a binary classification problem.

Figure 2 shows the hyperboxes obtained by training the IDLNN model. Table 1 lists the classification accuracy of DLNN and IDLNN on the training set and the testing set. As illustrated in Fig. 2, the hyperboxes cover most of the training samples. One particular hyperbox was comparable to the DLNN classifier. Ultimately, the IDLNN classifier achieves higher classification accuracy on both the training dataset and the testing dataset.

4.2 Spiral dataset

Spiral data [14] is a classical set for binary classification problems. It is generated by: $R_{i} = 0.4 (105 - n_{i}) / 104$ (11) $α_{i} = π (n_{i} - 1) / 16$ (12) $x_{i} = R_{i} \sin (α_{i} + q \, randn (1, 1) / 100) + 0.5$ (13) $y_{i} = R_{i} \cos (α_{i} + q \, randn (1, 1) / 100) + 0.5$ (14) where n_i = 0.5i (i = 1, 2, ⋯ , 200); randn (1, 1) is a random number drawn from the standard Gaussian distribution; and q (q = 5) is the parameter that adjusts the distance between classes. The training dataset, with a total of 400 samples, contains the first-class dataset consisting of the data pairs $(x_{i}, y_{i})_{i = 1}^{200}$ and the second-class dataset consisting of the data pairs $(1 - x_{i}, 1 - y_{i})_{i = 1}^{200}$. Similarly, the testing dataset with 400 samples is generated by Equations (11)–(14).
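Equations (11)–(14) translate directly into a short generator. The following Python sketch assumes that the noise term q · randn (1, 1)/100 is added to the angle, that independent random draws are used for x_i and y_i, and that the two classes are labeled 1 and 0; the function name and the seed argument are hypothetical.

```python
import numpy as np

def make_spiral(q=5.0, n_points=200, seed=None):
    """Generate the two-class spiral data of Eqs. (11)-(14)."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n_points + 1)
    n = 0.5 * i                                   # n_i = 0.5 i
    R = 0.4 * (105 - n) / 104                     # Eq. (11)
    alpha = np.pi * (n - 1) / 16                  # Eq. (12)
    x = R * np.sin(alpha + q * rng.standard_normal(n_points) / 100) + 0.5  # Eq. (13)
    y = R * np.cos(alpha + q * rng.standard_normal(n_points) / 100) + 0.5  # Eq. (14)
    class1 = np.column_stack([x, y])
    class2 = 1.0 - class1                         # (1 - x_i, 1 - y_i)
    X = np.vstack([class1, class2])
    labels = np.concatenate([np.ones(n_points, dtype=int),
                             np.zeros(n_points, dtype=int)])
    return X, labels

X_train, y_train = make_spiral(seed=0)
X_test, y_test = make_spiral(seed=1)
```

Using different seeds for the two calls yields training and testing sets that follow the same spiral but differ in their noise realizations.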

Figure 3 presents the hyperboxes obtained by training the IDLNN model on the spiral dataset; it reveals that most of the training data are covered by the hyperboxes. Table 2 lists the classification accuracies of the DLNN and IDLNN classifiers on the training dataset and the testing dataset. As shown, the IDLNN outperforms the DLNN on both training and testing datasets.

4.3 Iris dataset

The Iris dataset [15] consists of three classes of iris flowers, with each class containing 50 samples. One of the three classes is linearly separable from the other two, while the other two are not linearly separable from each other.

In each class, 25 samples were randomly selected for training, with the rest used as the testing dataset. Table 3 displays the classification accuracies of DLNN and IDLNN for this three-class problem. The DLNN performs more poorly than the IDLNN on both training and testing datasets.

4.4 Wine dataset

The Wine dataset [1] consists of three classes of wine, with 59, 71 and 48 samples per class, respectively. From these classes, 30, 36 and 24 samples, respectively, were randomly selected as the training dataset, and the remaining samples were used as the testing dataset. Table 4 shows the classification accuracies, which are much higher with IDLNN than with DLNN on both the training samples and the testing samples.

5 Conclusion

In this paper, we have analyzed some of the reasons behind the poor classification results of the dendritic lattice neural network (DLNN). We then improved the network based on a fuzzy inclusion measure. Four classical datasets (the Ripley dataset, Spiral dataset, Iris dataset, and Wine dataset) were used to verify the validity of the new IDLNN model and to compare it with the older DLNN model.

Experimental results have demonstrated that IDLNN achieves higher classification accuracy than DLNN. For the two binary classification problems, the Ripley dataset and the Spiral dataset, the DLNN classifier separated the training samples with 94% and 98% classification accuracy, respectively, lower than the 99.6% and 99% achieved by the IDLNN classifier. Meanwhile, the classification accuracy of the IDLNN model on testing samples is much higher than that of the DLNN model. For the two multi-class classification problems, the classification accuracy of the DLNN model on training samples and testing samples did not exceed 80.17%, much lower than that of the IDLNN.

References

1. Asuncion and Newman, UCI Repository of Machine Learning Databases, 2007, http://archive.ics.uci.edu/

2. Barmpoutis and Ritter G.X., Orthonormal basis lattice neural networks, Studies in Computational Intelligence 67 (2007), 45–58.

3. Ripley B.D., Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996, pp. 97–124.

4. Chyzhyk and Grana, Optimal hyperbox shrinking in dendritic computing applied to Alzheimer’s disease detection in MRI, Advances in Intelligent and Soft Computing 87 (2011), 543–550.

5. Chyzhyk, Grana, Savio and Maiora, Hybrid dendritic computing with kernel-LICA applied to Alzheimer’s disease detection in MRI, Neurocomputing 75 (2012), 72–77.

6. Ritter G.X., Urcid and Valdiviezo-N J.C., Two lattice metrics dendritic computing for pattern recognition, IEEE International Fuzzy Systems Conference Proceedings (2014), 45–52.

7. Urcid, Ritter G.X. and Iancu, Single layer morphological perceptron solution to the N-bit parity problem, Lecture Notes in Computer Science 3287 (2004), 171–178.

8. Urcid, Lara-Rodriguez L.D. and Lopez-Melendez, A dendritic lattice neural network for color image segmentation, Proceedings of SPIE 9599 (2015), 1–10.

9. Ritter G.X. and Urcid, Lattice algebra approach to single-neuron computation, IEEE Transactions on Neural Networks 14(2) (2003), 282–295.

10. Ritter G.X. and Iancu, Single layer feedforward neural network based on lattice algebra, Proceedings of the International Joint Conference on Neural Networks 4 (2003), 2887–2892.

11. Ritter G.X., Iancu and Schmalz M.S., New auto-associative memory based on lattice algebra, Lecture Notes in Computer Science 3287 (2004), 148–155.

12. Ritter G.X. and Urcid, Lattice neural networks with spike trains, Lecture Notes in Computer Science 6077 (2010), 367–374.

13. Sossa and Guevara, Efficient training for dendrite morphological neural networks, Neurocomputing 131 (2014), 132–142.

14. Lang K.J. and Witbrock D.J., Learning to tell two spirals apart, in: Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1989, pp. 52–59.

15. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics 7(2) (1936), 179–188.

16. Kaburlasos V.G. and Petridis, Fuzzy lattice neurocomputing (FLN) models, Neural Networks 13(10) (2000), 1145–1170.

17. Kaburlasos V.G., Towards A Unified Modeling and Knowledge-Representation Based on Lattice Theory, Studies in Computational Intelligence, vol. 27, Springer, Heidelberg, 2006, pp. 1–242.

18. Kaburlasos V.G., Athanasiadis I.N. and Mitkas P.A., Fuzzy lattice reasoning (FLR) classifier and its application for ambient ozone estimation, International Journal of Approximate Reasoning 45(1) (2007), 152–188.

19. Petridis and Kaburlasos V.G., Fuzzy lattice neural network (FLNN): A hybrid model for learning, IEEE Transactions on Neural Networks 9(5) (1998), 877–890.