The associative pattern classifier: Progress in theoretical understanding

Abstract

The Associative Pattern Classifier (APC) was designed as an associative memory focused on pattern classification. This means that the training memory is constructed in a single operation and pattern classification also occurs in a single step. The APC translates the input patterns through a translation vector, which is the average of all input patterns. Until now, there has been no theoretical framework to explain the inner workings of the APC. Its relevance is inferred from the fact that several studies have used it as a foundation. This paper seeks to provide a theoretical understanding of the APC’s operation to facilitate future enhancements. We found that the APC creates a system in static equilibrium through concurrent vectors at the origin (the translation vector), resulting in a balanced separation of patterns. However, the APC cannot achieve complete pattern separation because of the presence of a neutral region. The neutral region consists of all the points that lie on the separation hyperplanes; the APC cannot classify these points. Additionally, we discovered that the APC is unable to classify the translation vector, which could itself be one of the input patterns. Our previous research showed that the APC fails to achieve the linear separation of the AND function. In this research, we extend the examination of the AND function to show that linear separation is not feasible because the separation line is a neutral region. The APC demonstrated exceptional performance when tested with artificial datasets whose patterns were distributed over balanced regions, operating as an efficient multiclass and non-linear classifier. Nevertheless, its performance is lower when tested with real-world databases, where the restricted inner workings of the APC make it inaccurate.

Keywords

Classifier pattern associative memory class classification

1 Introduction

An Artificial Neural Network (ANN) with a single layer is referred to as an Associative Memory (AM) [1, 2]. In an AM, each input pattern is associated with an output pattern. This means that an AM can associate pairs of similar or different patterns, making it capable of both autoassociative and heteroassociative tasks. Classification is a form of pattern heteroassociation [1, 2]. Let’s consider a set of learning patterns divided into a predefined number of classes. In response to an input pattern, the AM provides a class vector expressed with discrete values. There are two distinct operational phases in an AM: the construction and the retrieval phases. During the construction phase, a matrix or memory is generated. In the retrieval phase, an input pattern is presented to the memory to obtain its corresponding associated output pattern or class. Within an AM, there are two modes of pattern retrieval: static and dynamic. In static retrieval, the output pattern is determined through a simple synchronous update step. In dynamic retrieval, the output pattern is determined through an iterative feedback process [3].

The first reported model of AM in the literature is the Lernmatrix [4, 5]. This model can function as a binary pattern classifier, associating each pattern with its corresponding class. A second proposed model of AM is the Linear Associator (LA) [6–8]. The Lernmatrix can only accept binary patterns as inputs for classification. When the necessary and sufficient conditions mentioned in [9–11] are not met, the Lernmatrix is not capable of recovering the fundamental set of associations. On the other hand, the LA imposes a strong constraint by requiring the orthogonality of input patterns [12]. For the LA, perfect retrieval of the fundamental set of associations is not achievable unless the number of stored patterns is small compared to the dimension n of the input patterns. Some researchers suggest that this small number of patterns should be between 0.1n and 0.2n [13–15]. In general, the purpose of AMs is pattern association rather than classification.

The Associative Pattern Classifier (APC) [16] is an AM with a specific purpose of pattern classification. The APC algorithm is proposed to address the drawbacks of the Lernmatrix and LA. This algorithm is primarily based on the training rule of the LA and the recovery rule of the Lernmatrix. In the APC, both the construction and recovery stages are performed statically. This means that the training memory is created in a single step, and pattern classification is also done in a single step. This algorithm enables real-number operations, overcoming the limitation of the Lernmatrix, which operates exclusively on binary numbers. It also removes the orthogonality constraint of the training set in the LA, as well as the restriction that the number of patterns in the training set must be small relative to the input pattern dimension. Furthermore, it maintains stable classification performance when trained with at least 10% of the total patterns from a given database [17].

Since the inception of the APC in 2003, a formal or theoretical framework explaining how it works has been lacking. Its accuracy has nevertheless been inferred from the fact that several studies have built on it. For example, in [18], the APC was employed to diagnose breast cancer, achieving an accuracy of 97.31% on the Breast Cancer Wisconsin database. In [19], an improvement that reduces the limitations of the original algorithm for multiclass processes was successfully implemented. In [20], the APC was integrated with an innovative coding technique and a voting procedure that enhanced the APC’s performance. Meanwhile, in [21] it was reported that the APC achieved the highest true positive rate in comparison to three traditional ANNs when tested on a collection of 58 real-world databases with imbalanced data. Aldape-Pérez [22] introduced a reinforcement phase following the training phase of the APC, specifically for medical diagnosis. Throughout all the tests, the proposed method consistently outperformed other algorithms in terms of average performance. In [23], a modification using fuzzy logic is presented for the diagnosis of diabetes mellitus. In that proposal, a degree of membership in each class is assigned to each training pattern before training. The proposed algorithm outperforms the others only in average sensitivity, and the author concludes that the proposal does not yield better results than the APC. In [24], the APC is applied to the diagnosis of diabetes mellitus, competing with today’s best classifiers and achieving results close to them. In [16], the author obtains better classification results on the Wine recognition database than three traditional machine learning algorithms.

Our research group has presented advancements related to the APC in two previous publications. In [18], we established the accuracy of the APC in handling the Breast Cancer Wisconsin database. Furthermore, in [19], we demonstrated that the APC functions as a bi-class algorithm, with no assurance of effectively solving multiclass problems. To overcome this limitation, we proposed a hierarchical tree-based approach. In the same publication, we highlighted that the APC struggles to solve even the simplest linear separation problem, the AND function. To address this issue, we introduced a method for calculating the boundary decision of the translation vector.

In contrast to our previous works [18, 19] and the other earlier works [16, 20–24], which relied on empirical evidence, the focus of this paper is to provide a theoretical understanding of the APC’s functioning for future enhancements. We discovered that the APC employs concurrent vectors at the translation vector to establish a static equilibrium system, a principle we substantiated through a theorem. While this equilibrium system proves accurate for balancing class representation in various regions, it poses a drawback since real-world pattern classes are often imbalanced, as observed in tests on real-world databases. Despite this disadvantage, the APC showcases exceptional performance in tests involving artificial datasets with balanced pattern distribution, serving as an effective multiclass and non-linear classifier. However, it falls short when aiming for complete pattern separation due to the presence of a neutral region. The neutral region comprises all points that lie on the separation hyperplanes; such points cannot be classified by the APC. This limitation is evident in the inability to solve the AND function. This paper also extends the analysis of solving the AND function initially discussed in [19]. Furthermore, we found that the APC cannot classify the translation vector itself.

The rest of this work is divided into the following sections. Section 2 presents the APC algorithm. Section 3 discusses general insights into how this classifier functions. In Section 4 a specific bi-class scenario is presented, offering further deductions. Section 5 introduces the specific case related to the AND function. Section 6 provides a rigorous and formal demonstration of the APC functioning as a static system and gives a formal demonstration that the APC cannot classify the translation vector. Section 7 details the design of experiments conducted to validate these observations statistically. Section 8 presents the conclusions.

2 The associative pattern classifier

An AM is a single-layer ANN that maps a set of input patterns x^k to a set of output patterns y^k in such a way that each pattern x^k is associated with a pattern y^k [25]. Here, x^k ∈ Xⁿ and y^k ∈ Y^q for all k = 1, 2, . . . , p, where k is an index representing a specific pair of associated patterns, and n and q are the dimensions of the vectors x^k and y^k, respectively. p is the cardinality of the pattern set, and X and Y are two arbitrary sets. An AM M can be represented using the schema: x^k → M → y^k, where M is a correlation matrix of p associations [3].

The APC proposed in [16] begins in the following manner:

1. A fundamental set of associations is defined as:
$S = \{ (x^{k}, y^{k}) \mid k = 1, 2, \dots, p \}$
(1)
where x^k ∈ ℝⁿ are the input patterns and y^k ∈ {0, 1}^q are the output patterns. Here, n is the dimension of x^k, q is the dimension of y^k, and p is the cardinality of the fundamental set.
2. The class c ∈ {1, 2, . . . , q} to which each input pattern x^k belongs is defined by:
$y_{j}^{k} = \begin{cases} 1 & \text{for } j = c \\ 0 & \text{for } j = 1, 2, \dots, c - 1, c + 1, \dots, q \end{cases}$
(2)
where $y_{j}^{k}$ is the j-th component of the class vector y^k, so that the position of the 1 gives the class index.
In the learning process of M, every pair (x^k, y^k) belonging to the set S is presented to the AM. The APC uses the learning or construction phase of the LA as follows:
1. A translation vector $\bar{x}$ is computed as:
$\bar{x} = \frac{1}{p} \sum_{k = 1}^{p} x^{k}$
(3)
2. Subsequently, all patterns x^k in the fundamental set S are translated with respect to the translation vector $\bar{x}$ using:
$x_{t}^{k} = x^{k} - \bar{x}$
(4)
3. The matrix M is computed using:
$M = \sum_{k = 1}^{p} y^{k} {(x_{t}^{k})}^{t}$
(5)
Just like the recovery phase in the Lernmatrix, the recovery phase of the APC involves presenting an input pattern $x^{ω} \in ℝ^{n}$ to the matrix M. The output of the APC is the class vector y^ω ∈ {0, 1}^q of the class to which the presented pattern belongs, obtained as follows:
1. The input pattern x^ω is translated with respect to the translation vector $\bar{x}$:
$x_{t}^{ω} = x^{ω} - \bar{x}$
(6)
2. The following product is calculated:
$z^{ω} = M x_{t}^{ω}$
(7)
3. The class vector y^ω is obtained as follows:
$y_{j}^{ω} = \begin{cases} 1 & \text{if } z_{j}^{ω} = ⋁_{h = 1}^{q} z_{h}^{ω} \\ 0 & \text{otherwise} \end{cases}$
(8)
where ⋁ denotes the maximum operator.
4. Finally, the class index of x^ω is the position j in the vector y^ω where $y_{j}^{ω} = 1$.
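The construction and recovery phases described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation from [16]: the function names `apc_train` and `apc_classify` are hypothetical, the operator ⋁ in Equation 8 is taken to be the maximum over the q components of z^ω, and the sample data are the three patterns used in the example of Subsection 3.1.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def apc_train(xs: Sequence[Sequence[float]],
              ys: Sequence[Sequence[int]]) -> Tuple[List[Vector], Vector]:
    """Construction phase (Equations 3-5). Returns (M, x_bar)."""
    p, n, q = len(xs), len(xs[0]), len(ys[0])
    # Equation 3: the translation vector is the mean of all input patterns.
    x_bar = [sum(x[d] for x in xs) / p for d in range(n)]
    # Equations 4-5: M = sum_k y^k (x^k - x_bar)^t, a q-by-n matrix.
    M = [[0.0] * n for _ in range(q)]
    for x, y in zip(xs, ys):
        for j in range(q):
            for d in range(n):
                M[j][d] += y[j] * (x[d] - x_bar[d])
    return M, x_bar


def apc_classify(M: List[Vector], x_bar: Vector,
                 x: Sequence[float]) -> Tuple[List[int], Vector]:
    """Recovery phase (Equations 6-8). Returns (y, z)."""
    x_t = [xi - bi for xi, bi in zip(x, x_bar)]              # Equation 6
    z = [sum(m * v for m, v in zip(row, x_t)) for row in M]  # Equation 7
    z_max = max(z)                                           # assumed meaning of the "or" operator
    y = [1 if zj == z_max else 0 for zj in z]                # Equation 8
    return y, z


# Patterns and class vectors taken from Equation 9.
M, x_bar = apc_train([[7, 3], [-1, 5], [3, -2]],
                     [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y, z = apc_classify(M, x_bar, [4, 5])
print(x_bar, M, y, z)
```

Read literally, Equation 8 marks every component that reaches the maximum, so a tie produces a vector y^ω with more than one entry equal to 1.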

3 Observations on the functioning of the APC

After defining the APC, we will proceed to illustrate the operational mechanism of the APC with various examples. In Subsection 3.1, we demonstrate how the APC establishes a system in static equilibrium. Moving on to Subsection 3.2, we explain how the APC generates separation regions using a set of hyperplanes in the translation plane. Subsection 3.3 shows how separation regions are formed in the original plane. In Subsection 3.4, we present the mathematical definition of how the Euclidean space is partitioned into different decision regions. Finally, in Subsection 3.5, we explore various behaviours of the APC’s performance during the recovery phase.

3.1 Concurrent vector system

Let’s assume that the following arbitrary patterns x^k are associated with their respective class vectors y^k:

x^{1} = \begin{pmatrix} 7 \\ 3 \end{pmatrix}, \; y^{1} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}; \quad x^{2} = \begin{pmatrix} -1 \\ 5 \end{pmatrix}, \; y^{2} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}; \quad x^{3} = \begin{pmatrix} 3 \\ -2 \end{pmatrix}, \; y^{3} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

(9)

where p = 3, n = 2, and q = 3.

Taking into account the construction phase of the APC, a translation vector $\bar{x}$ is computed:

\bar{x} = \frac{1}{3} \left[ \begin{pmatrix} 7 \\ 3 \end{pmatrix} + \begin{pmatrix} -1 \\ 5 \end{pmatrix} + \begin{pmatrix} 3 \\ -2 \end{pmatrix} \right] = \begin{pmatrix} 3 \\ 2 \end{pmatrix}

(10)

Figure 1 illustrates the distribution of input patterns in Equation 9, along with the translation vector in Equation 10, on the Euclidean plane.

Fig. 1

Distribution of input patterns and translation vector of the APC in the Euclidean plane.

The input patterns x^k are translated as follows:

\begin{aligned} x_{t}^{1} &= \begin{pmatrix} 7 \\ 3 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \end{pmatrix} \\ x_{t}^{2} &= \begin{pmatrix} -1 \\ 5 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} -4 \\ 3 \end{pmatrix} \\ x_{t}^{3} &= \begin{pmatrix} 3 \\ -2 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ -4 \end{pmatrix} \end{aligned}

(11)

The matrix M is constructed as:

M = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \begin{pmatrix} 4 & 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \begin{pmatrix} -4 & 3 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \begin{pmatrix} 0 & -4 \end{pmatrix} = \begin{pmatrix} 4 & 1 \\ -4 & 3 \\ 0 & -4 \end{pmatrix}

(12)

The learning phase is now complete. Next, we extract from the matrix M the weight vectors $m^{i} = {(m_{i1}, m_{i2}, \dots, m_{in})}^{t}$ associated with each class index i ∈ {1, 2, . . . , q}:

m^{1} = \begin{pmatrix} 4 \\ 1 \end{pmatrix}; \quad m^{2} = \begin{pmatrix} -4 \\ 3 \end{pmatrix}; \quad m^{3} = \begin{pmatrix} 0 \\ -4 \end{pmatrix}

(13)

The matrix M has been constructed from the translated input patterns. In Fig. 2, you can see the distribution of the vectors mⁱ in the translation plane, as opposed to the original plane in Fig. 1, where the patterns have not yet been translated. The vectors mⁱ can be viewed as a system of concurrent vectors at the origin. It’s important to note that these vectors satisfy the first law of Newton, which states that if the resultant force of a system of vectors is zero, $\sum_{i = 1}^{q} m^{i} = 0$ , then the system’s velocity will not change. If it is at rest, it will remain at rest, and if it is in motion, it will continue to move at a constant speed [26]. Therefore, it can be described as a system in static equilibrium. This implies that the matrix M creates a balanced separation of patterns.
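The zero resultant can be traced directly to Equations 2–5. The short derivation below is only an informal sketch of that step, not the formal theorem, and it assumes that every training pattern belongs to exactly one class, so that the components of each y^k sum to 1.

```latex
\sum_{i=1}^{q} m^{i}
  = \sum_{i=1}^{q} \sum_{k=1}^{p} y_{i}^{k}\, x_{t}^{k}
  = \sum_{k=1}^{p} \Big( \sum_{i=1}^{q} y_{i}^{k} \Big) x_{t}^{k}
  = \sum_{k=1}^{p} x_{t}^{k}
  = \sum_{k=1}^{p} x^{k} - p\,\bar{x}
  = 0
```

For the example, (4, 1)^t + (-4, 3)^t + (0, -4)^t = (0, 0)^t, as expected.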

Fig. 2

Concurrent and balanced vector distribution of the APC in the translation plane.

3.2 Distribution of regions on the translation plane

Without loss of generality, the vectors mⁱ can be regarded as points in the Euclidean plane. To determine the distribution of regions generated by the APC, a set of hyperplanes, passing through each pair of points mⁱ and m^j for i ≠ j, are drawn. These hyperplanes will be referred to as u_ij for i ≠ j. Continuing with the example, for the points in Equation 13, the hyperplane passing through the points m¹ and m² is:

u_{12} (x) = x_{2} + \frac{1}{4} x_{1} - 2 = 0

(14)

For points m² and m³, the hyperplane is:

u_{23} (x) = x_{2} + \frac{7}{4} x_{1} + 4 = 0

(15)

For points m³ and m¹, the hyperplane is:

u_{31} (x) = x_{2} - \frac{5}{4} x_{1} + 4 = 0

(16)

In Fig. 3, the geometric representation for Equations 14–16 can be observed.

Fig. 3

The hyperplanes v_ij that divide the translation plane are perpendicular to the hyperplanes u_ij of the APC.

Now, the hyperplanes that divide the translation plane are perpendicular to the hyperplanes u_ij, and they will be referred to as v_ij for i ≠ j. Continuing with the example, the hyperplanes u_ij from Equations 14–16, yield the following perpendicular hyperplanes passing through the origin:

v_{12} (x) = x_{2} - 4 x_{1} = 0

(17)

v_{23} (x) = x_{2} - \frac{4}{7} x_{1} = 0

(18)

v_{31} (x) = x_{2} + \frac{4}{5} x_{1} = 0

(19)

In Fig. 3, the geometric representation of Equations 17–19 can be observed.
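The perpendicular hyperplanes can also be obtained without drawing u_ij first. The sketch below assumes that v_ij is the line through the origin whose normal vector is m^i - m^j (the direction of u_ij); this assumption reproduces the slopes in Equations 17–19 for the example. The helper names `v_normal` and `slope_through_origin` are illustrative only.

```python
from fractions import Fraction

# Weight vectors from Equation 13.
m = {1: (4, 1), 2: (-4, 3), 3: (0, -4)}


def v_normal(mi, mj):
    """Normal (a1, a2) of v_ij, so that v_ij(x) = a1*x1 + a2*x2 = 0."""
    return (mi[0] - mj[0], mi[1] - mj[1])


def slope_through_origin(normal):
    """Slope s of the line x2 = s*x1 defined by a1*x1 + a2*x2 = 0 (a2 != 0)."""
    a1, a2 = normal
    return Fraction(-a1, a2)


slopes = {(i, j): slope_through_origin(v_normal(m[i], m[j]))
          for i, j in [(1, 2), (2, 3), (3, 1)]}
for pair, s in slopes.items():
    print(f"v_{pair[0]}{pair[1]}: x2 = ({s}) * x1")
```

Exact fractions are used so that the slopes can be compared with the equations above without rounding.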

3.3 Distribution of regions on the original plane

Up to this point, our work has focused on the translation plane; now the hyperplanes v_ij are shifted back to the original plane. These hyperplanes now serve as linear decision functions that separate the regions. These functions are denoted h_ij for i ≠ j and are obtained by translating the hyperplanes v_ij with respect to the translation vector $\bar{x}$ . Continuing with the example, the hyperplanes from Equations 17–19 are shifted with respect to the vector $\bar{x}$ in Equation 10, resulting in the following:

h_{12} (x) = x_{2} - 4 x_{1} + 10 = 0

(20)

h_{23} (x) = x_{2} - \frac{4}{7} x_{1} - \frac{2}{7} = 0

(21)

h_{31} (x) = x_{2} + \frac{4}{5} x_{1} - \frac{22}{5} = 0

(22)

In Fig. 4, you can observe the geometric representation of Equations 20–22.

Fig. 4

Regions and linear decision functions of the APC in the original plane.

In Fig. 4, it can be observed that the dashed portion of a linear function indicates that within that region, the function does not influence the classification process, as it only does so for the regions it separates. Therefore, the APC is capable of generating a non-linear decision function in a general way.
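The shift back to the original plane can be checked numerically. The following sketch assumes that each decision function is h_ij(x) = v_ij(x - x̄) with the normal of v_ij taken as m^i - m^j, normalised so that the coefficient of x₂ is 1; the helper name `h_coefficients` is hypothetical.

```python
from fractions import Fraction

x_bar = (3, 2)                            # Equation 10
m = {1: (4, 1), 2: (-4, 3), 3: (0, -4)}   # Equation 13


def h_coefficients(mi, mj, x_bar):
    """Return (b, c) such that h_ij(x) = x2 + b*x1 + c = 0."""
    a1, a2 = mi[0] - mj[0], mi[1] - mj[1]   # assumed normal of v_ij
    # a1*(x1 - xb1) + a2*(x2 - xb2) = 0, divided by a2 (requires a2 != 0).
    b = Fraction(a1, a2)
    c = -b * x_bar[0] - x_bar[1]
    return b, c


h12 = h_coefficients(m[1], m[2], x_bar)
h23 = h_coefficients(m[2], m[3], x_bar)
h31 = h_coefficients(m[3], m[1], x_bar)
print(h12, h23, h31)


def evaluate(h, x):
    """Value of h(x) = x2 + b*x1 + c at the point x = (x1, x2)."""
    b, c = h
    return x[1] + b * x[0] + c
```

With these coefficients, `evaluate(h31, (8, -2))` is zero, which is why that point is used later as an example of a pattern lying on h₃₁.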

3.4 Class separation

From the previous analysis, considering a set of q pattern classes $\{c_{i}\}_{i = 1}^{q}$ in $ℝ^{n}$ and a translated input pattern $x_{t}^{ω}$ , we have:

{(m^{i})}^{t} x_{t}^{ω} = \begin{cases} ⋁_{h = 1}^{q} [{(m^{h})}^{t} x_{t}^{ω}] & \forall x_{t}^{ω} \in c_{i} \\ ⋀_{h = 1}^{q} [{(m^{h})}^{t} x_{t}^{ω}] & \text{otherwise} \end{cases}

(23)

for 1 ≤ i ≤ q, where mⁱ represents the weight vector associated with the class set $\{c_{i}\}_{i = 1}^{q}$.

In this manner, the Euclidean space is divided into q decision regions as follows:

R_{i} = \left\{ x_{t}^{ω} \;\middle|\; {(m^{i})}^{t} x_{t}^{ω} = ⋁_{h = 1}^{q} [{(m^{h})}^{t} x_{t}^{ω}]; \; {(m^{j})}^{t} x_{t}^{ω} = ⋀_{h = 1}^{q} [{(m^{h})}^{t} x_{t}^{ω}], \; i \neq j \right\}, \quad 1 \leq i \leq q

(24)

Note that a neutral region exists, which means there is no absolute separation between the regions. The neutral region consists of all the points that lie on the separating hyperplanes; in other words, the APC cannot classify points on these hyperplanes. It’s also worth noting that the regions originate from the translation vector $\bar{x}$ . Therefore, the generation of the APC’s separation regions depends directly on the distribution of the patterns to be classified.
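The link between ties in the recovery phase and the hyperplanes can be made explicit. The lines below are an informal sketch in the notation of Equations 7 and 23, not a formal statement from the paper: two classes i and j tie exactly when the translated pattern lies on the line through the origin with normal m^i - m^j.

```latex
{(m^{i})}^{t} x_{t}^{ω} = {(m^{j})}^{t} x_{t}^{ω}
  \iff {(m^{i} - m^{j})}^{t} x_{t}^{ω} = 0
% Example with i = 3, j = 1 and the vectors of Equation 13:
% m^{3} - m^{1} = (-4, -5)^{t}, so -4 x_{1} - 5 x_{2} = 0,
% i.e. x_{2} + \frac{4}{5} x_{1} = 0, which is v_{31} in Equation 19.
```

A point on such a line belongs to the neutral region only where the tied value is also the maximum of z^ω; elsewhere a third class wins and the tie is harmless.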

3.5 Operation of the APC during the recovery phase

Continuing with the example, several unknown patterns are presented to the matrix M to observe the APC performance during the recovery phase.

Let’s assume a noisy pattern x^ω belonging to class c₁: $x^{ω} = {(4, 5)}^{t}$ . Following the recovery phase of the APC, these steps are performed:

1. The noisy pattern x^ω is translated with respect to the translation vector $\bar{x}$ :
$x_{t}^{ω} = \begin{pmatrix} 4 \\ 5 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$
(25)
2. The matrix M is multiplied by the translated pattern $x_{t}^{ω}$ :
$z^{ω} = \begin{pmatrix} 4 & 1 \\ -4 & 3 \\ 0 & -4 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 7 \\ 5 \\ -12 \end{pmatrix}$
(26)
3. The class vector y^ω is computed from z^ω:
$y^{ω} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$
(27)
4. Thus, the vector y^ω in Equation 27 equals the vector y¹ in Equation 9. Therefore, the noisy pattern x^ω was accurately classified into class c₁.

Now, let’s take a noisy vector that satisfies the equation of hyperplane h₃₁ (Equation 22): $x^{ω} = {(8, -2)}^{t}$ , which lies on the boundary between classes c₁ and c₃.

1. The pattern x^ω is translated with respect to the translation vector $\bar{x}$ :
$x_{t}^{ω} = \begin{pmatrix} 8 \\ -2 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ -4 \end{pmatrix}$
(28)
2. The matrix M is multiplied by the translated pattern $x_{t}^{ω}$ :
$z^{ω} = \begin{pmatrix} 4 & 1 \\ -4 & 3 \\ 0 & -4 \end{pmatrix} \begin{pmatrix} 5 \\ -4 \end{pmatrix} = \begin{pmatrix} 16 \\ -32 \\ 16 \end{pmatrix}$
(29)
3. The class vector y^ω is calculated from z^ω:
$y^{ω} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$
(30)
4. Thus, the vector y^ω is not equal to any of the vectors y^k in Equation 9. Therefore, the vector x^ω cannot be classified since it falls within the neutral region between classes c₁ and c₃, as mentioned in Subsection 3.4.

Now, let’s examine a pattern that lies on the same separation hyperplane h₃₁ (Equation 22), but this time on the side that doesn’t interfere with the division of regions (the dashed portion within the region of class c₂ in Fig. 4): $x^{ω} = {(2, 2.8)}^{t}$ .

1. The pattern x^ω is translated:
$x_{t}^{ω} = \begin{pmatrix} 2 \\ 2.8 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} -1 \\ 0.8 \end{pmatrix}$
(31)
2. The matrix M is multiplied by the translated pattern $x_{t}^{ω}$ :
$z^{ω} = \begin{pmatrix} 4 & 1 \\ -4 & 3 \\ 0 & -4 \end{pmatrix} \begin{pmatrix} -1 \\ 0.8 \end{pmatrix} = \begin{pmatrix} -3.2 \\ 6.4 \\ -3.2 \end{pmatrix}$
(32)
3. The class vector y^ω is computed from the vector z^ω:
$y^{ω} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$
(33)
4. Thus, the vector y^ω in Equation 33 equals the vector y² in Equation 9. Therefore, the vector x^ω was correctly classified as it falls within the region of class c₂.

In this latter case, note that in the vector z^ω of Equation 32, there are two values equal to -3.2, indicating some uncertainty as to whether the vector x^ω leans more towards class c₁ or class c₃. Nevertheless, this does not impact the classification in any way.
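
The three recovery cases of this subsection can be replayed with a short script. It is a sketch only: the helper `recover` is a hypothetical name, exact fractions replace the decimal 2.8 to avoid rounding, and a pattern is reported as falling in the neutral region whenever the maximum of z^ω is not unique, which is how the text above interprets the tie.

```python
from fractions import Fraction

M = [[4, 1], [-4, 3], [0, -4]]   # Equation 12
x_bar = [3, 2]                   # Equation 10


def recover(x):
    """Return (z, y, unique) for an input pattern x, following Equations 6-8."""
    x_t = [a - b for a, b in zip(x, x_bar)]
    z = [sum(m * v for m, v in zip(row, x_t)) for row in M]
    z_max = max(z)
    y = [1 if zj == z_max else 0 for zj in z]
    return z, y, sum(y) == 1     # unique is False when the maximum is tied


patterns = [[4, 5], [8, -2], [Fraction(2), Fraction(14, 5)]]
for x in patterns:
    z, y, unique = recover(x)
    label = "classified" if unique else "neutral region"
    print([str(v) for v in x], [str(v) for v in z], y, label)
```

The third pattern shows that a tie between two non-maximal components (here z₁ and z₃) does not prevent classification.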
4 An arbitrary bi-class case

Now, we will consider a specific case for two classes from which further observations will be derived. Let’s assume that we have the following set of arbitrary associations:

x^{1} = \begin{pmatrix} 6 \\ 5 \\ 2 \end{pmatrix}, \; y^{1} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}; \quad x^{2} = \begin{pmatrix} -4 \\ 11 \\ -8 \end{pmatrix}, \; y^{2} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}

(34)

where p = 2, n = 3, and q = 2.

Taking into account the construction phase of the APC, a translation vector $\bar{x}$ is computed:

\bar{x} = \frac{1}{2} \left[ \begin{pmatrix} 6 \\ 5 \\ 2 \end{pmatrix} + \begin{pmatrix} -4 \\ 11 \\ -8 \end{pmatrix} \right] = \begin{pmatrix} 1 \\ 8 \\ -3 \end{pmatrix}

(35)

The input patterns x^k in Equation 34 are translated:

x_{t}^{1} = \begin{pmatrix} 6 \\ 5 \\ 2 \end{pmatrix} - \begin{pmatrix} 1 \\ 8 \\ -3 \end{pmatrix} = \begin{pmatrix} 5 \\ -3 \\ 5 \end{pmatrix}; \quad x_{t}^{2} = \begin{pmatrix} -4 \\ 11 \\ -8 \end{pmatrix} - \begin{pmatrix} 1 \\ 8 \\ -3 \end{pmatrix} = \begin{pmatrix} -5 \\ 3 \\ -5 \end{pmatrix}

(36)

The matrix M is constructed:

M = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \begin{pmatrix} 5 & -3 & 5 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \begin{pmatrix} -5 & 3 & -5 \end{pmatrix} = \begin{pmatrix} 5 & -3 & 5 \\ -5 & 3 & -5 \end{pmatrix}

(37)

To test the classifier, the pattern $x^{ω} = {(6, 5, 2)}^{t} = x^{1}$ is taken. Note that this is an undistorted version of pattern x¹. Following the steps of the APC recovery phase:

The pattern x^ω is translated:

x_{t}^{ω} = \begin{pmatrix} 6 \\ 5 \\ 2 \end{pmatrix} - \begin{pmatrix} 1 \\ 8 \\ -3 \end{pmatrix} = \begin{pmatrix} 5 \\ -3 \\ 5 \end{pmatrix}

(38)

The matrix M is multiplied by the translated pattern $x_{t}^{ω}$ :

z^{ω} = \begin{pmatrix} 5 & -3 & 5 \\ -5 & 3 & -5 \end{pmatrix} \begin{pmatrix} 5 \\ -3 \\ 5 \end{pmatrix} = \begin{pmatrix} 59 \\ -59 \end{pmatrix}

(39)

The class vector y^ω is computed:

y^{ω} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}

(40)

Thus, the vector y^ω in Equation 40 equals the vector y¹ in Equation 34. Therefore, the pattern x^ω was correctly classified into class one.

In this simple example, it is worth noting that in the case of two classes:

– During construction, the translation of the two input patterns causes them to be the negatives of each other, that is, $x_{t}^{2} = - x_{t}^{1}$ . Since the two output patterns corresponding to the two classes are orthogonal, the matrix M is composed of $x_{t}^{1}$ and its negative as rows:

M = \begin{pmatrix} {(x_{t}^{1})}^{t} \\ - {(x_{t}^{1})}^{t} \end{pmatrix}

(41)

Note that there is a neutral position between the two vectors, represented by the vector $x = {(0, 0, \dots, 0)}^{t}$ .

– In the classification phase of the undistorted version of any input pattern, translation maps it exactly onto its translated original version. The multiplication by the matrix M then always yields the maximum value at the class index of the output vector.

– In the classification phase of a distorted version of any input pattern, translation maps it near one of the translated original versions. The shifted vector can appear on either side of its corresponding translated original version. As long as the noise added to the input pattern does not cause its translated version to cross the neutral position, the input pattern will always be correctly classified. Of course, if the translation of an input pattern results in $x_{t}^{ω} = {(0, 0, \dots, 0)}^{t}$ , then the class vector cannot be found because $z^{ω} = {(0, 0, \dots, 0)}^{t}$ (see Theorem 2).
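The observations above can be checked on the bi-class example of Equation 34. The snippet is a small sketch with hypothetical helper names (`z_of`); it verifies that the two rows of M are negatives of each other and that presenting the translation vector itself yields z^ω = 0, so no component can be singled out as the maximum.

```python
x1, x2 = [6, 5, 2], [-4, 11, -8]                 # Equation 34

# Construction phase for two patterns, one per class.
x_bar = [(a + b) / 2 for a, b in zip(x1, x2)]    # Equation 35
x1_t = [a - c for a, c in zip(x1, x_bar)]
x2_t = [b - c for b, c in zip(x2, x_bar)]
M = [x1_t, x2_t]                                 # rows of Equation 37


def z_of(x):
    """Product M * (x - x_bar) from Equations 6 and 7."""
    x_t = [a - c for a, c in zip(x, x_bar)]
    return [sum(m * v for m, v in zip(row, x_t)) for row in M]


print(M)              # the second row is the negative of the first
print(z_of(x1))       # undistorted pattern: maximum at the first class
print(z_of(x_bar))    # translation vector: both components are zero
```

Because the second row is the negative of the first, the two components of z^ω always have opposite signs and equal magnitude in this bi-class setting.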

5 The specific case of the AND function

The AND function can be represented as shown in Table 1, where x₁ and x₂ represent the components of the training patterns, of which the first three can be labelled with the class “0” and the last pattern with the class “1”. This results in a set of four input patterns that belong to either class “0” or class “1” as the case may be.

Table 1
AND function

x₁	x₂	Class
0	0	0
0	1	0
1	0	0
1	1	1

From Table 1, we can create the following set of associations:

x^{1} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \; y^{1} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}; \quad x^{2} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \; y^{2} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}; \quad x^{3} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \; y^{3} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}; \quad x^{4} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \; y^{4} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}

(42)

where p = 4, n = 2, and q = 2. It’s noteworthy that patterns x¹, x², and x³ belong to class “0”, while pattern x⁴ belongs to class “1”.

Considering the construction phase of the APC, a translation vector $\bar{x}$ is computed:

\bar{x} = \frac{1}{4} \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right] = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}

(43)

The input patterns x^k are translated:

x_{t}^{1} = \begin{pmatrix} -0.5 \\ -0.5 \end{pmatrix}; \quad x_{t}^{2} = \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix}; \quad x_{t}^{3} = \begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix}; \quad x_{t}^{4} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}

(44)

The matrix M is constructed:

M = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \begin{pmatrix} -0.5 & -0.5 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} \begin{pmatrix} -0.5 & 0.5 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} \begin{pmatrix} 0.5 & -0.5 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \begin{pmatrix} 0.5 & 0.5 \end{pmatrix} = \begin{pmatrix} -0.5 & -0.5 \\ 0.5 & 0.5 \end{pmatrix}

(45)

To test the classifier, we take the four training patterns: x¹, x², x³, and x⁴:

The patterns x^k are translated:

\begin{aligned} x_{t}^{1} &= \begin{pmatrix} -0.5 \\ -0.5 \end{pmatrix}; & x_{t}^{2} &= \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix}; \\ x_{t}^{3} &= \begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix}; & x_{t}^{4} &= \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} \end{aligned}

(46)

The matrix M is multiplied with each of the patterns $x_{t}^{k}$ :

\begin{aligned} z^{1} &= \begin{pmatrix} -0.5 & -0.5 \\ 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} -0.5 \\ -0.5 \end{pmatrix} = \begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix} \\ z^{2} &= \begin{pmatrix} -0.5 & -0.5 \\ 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ z^{3} &= \begin{pmatrix} -0.5 & -0.5 \\ 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ z^{4} &= \begin{pmatrix} -0.5 & -0.5 \\ 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} = \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix} \end{aligned}

(47)

The class vectors y^ω are computed:

\begin{aligned} y^{1} &= \begin{pmatrix} 1 \\ 0 \end{pmatrix}; & y^{2} &= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ y^{3} &= \begin{pmatrix} 0 \\ 0 \end{pmatrix}; & y^{4} &= \begin{pmatrix} 0 \\ 1 \end{pmatrix} \end{aligned}

(48)

Thus, the recovered vectors y² and y³ in Equation 48 differ from the target vectors y² and y³ in Equation 42: both equal (0, 0)^t instead of (1, 0)^t. Therefore, patterns x² and x³ cannot be classified, as they fall within the neutral region between the two classes. This demonstrates that the APC does not guarantee linear separation, even in linearly separable problems such as the AND function, as shown previously in [19].
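The worked example of Equations 42 to 48 can be reproduced with a short script. The following is a minimal sketch in plain Python, not the authors' implementation. The function names (`build_memory`, `classify`) are hypothetical, and the tie rule (a zero class vector whenever the maximum of z is not unique) is an assumption chosen to agree with Equation 48.

```python
def mean_vector(patterns):
    """Translation vector: component-wise average of all input patterns."""
    p = len(patterns)
    return [sum(col) / p for col in zip(*patterns)]


def translate(x, x_bar):
    """Shift a pattern so that the translation vector becomes the origin."""
    return [xi - bi for xi, bi in zip(x, x_bar)]


def build_memory(patterns, classes):
    """Accumulate M as the sum of outer products y^k (x_t^k)^t."""
    x_bar = mean_vector(patterns)
    q, n = len(classes[0]), len(patterns[0])
    M = [[0.0] * n for _ in range(q)]
    for x, y in zip(patterns, classes):
        xt = translate(x, x_bar)
        for i in range(q):
            for j in range(n):
                M[i][j] += y[i] * xt[j]
    return M, x_bar


def classify(M, x_bar, x):
    """Return (z, y); y is all zeros when the maximum of z is not unique (assumed rule)."""
    xt = translate(x, x_bar)
    z = [sum(m * v for m, v in zip(row, xt)) for row in M]
    top = max(z)
    if z.count(top) != 1:
        return z, [0] * len(z)
    return z, [1 if v == top else 0 for v in z]


# AND function, Equation 42
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1]]

M, x_bar = build_memory(X, Y)
results = [classify(M, x_bar, x) for x in X]
for x, (z, y) in zip(X, results):
    print(x, z, y)
```

Under these assumptions, the script recovers the memory of Equation 45 and returns the zero class vector for x² and x³, which is the neutral-region behaviour described above.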

The line formed between the vectors m¹ = (−0.5, −0.5)^t and m² = (0.5, 0.5)^t from matrix M in Equation 45 is given as:

u_{12} (x) = x_{2} - x_{1} = 0

(49)

The perpendicular line from Equation 49 is:

v_{12} (x) = x_{2} + x_{1} = 0

(50)

The hyperplane from Equation 50 is shifted with respect to the vector $\bar{x}$ , resulting in the decision line:

h_{12} (x) = x_{2} + x_{1} - 1 = 0

(51)

The decision line of Equation 51 is shown in Fig. 5. It is observed that the vectors x² = (0, 1) ^t and x³ = (1, 0) ^t that solve the equation give y² = (0, 0) ^t and y³ = (0, 0) ^t, respectively. This happens because the equation forms a neutral region, as mentioned in Subsection 3.4.
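As a quick check of Equation 51, the decision function can be evaluated at the four AND patterns. This is only an illustrative sketch; the helper name `h12` mirrors the symbol in Equation 51, and the dictionary keys are labels chosen here for readability.

```python
def h12(x):
    """Decision line of Equation 51: x2 + x1 - 1 = 0."""
    return x[1] + x[0] - 1


points = {"x1": (0, 0), "x2": (0, 1), "x3": (1, 0), "x4": (1, 1)}

# Sign of h12 tells on which side of the line each pattern lies.
sides = {name: h12(x) for name, x in points.items()}

# Patterns with h12 = 0 lie on the line itself, i.e. in the neutral region.
on_line = [name for name, value in sides.items() if value == 0]

print(sides)
print(on_line)
```

Patterns x¹ and x⁴ fall on opposite sides of the line, while x² and x³ lie exactly on it, which is why they receive the zero class vector.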

Fig. 5

Decision line generated by the APC to solve the AND function.

6 Formal demonstration of the APC as a static system

This section demonstrates theoretically that the APC forms a system in static equilibrium and that the translation vector cannot be classified by the APC.

Definition 1. The APC maps a set of input patterns x^k to a set of output patterns y^k in such a way that each pattern x^k is associated with a pattern y^k where k is an index representing a specific pair of associated patterns.

Definition 2. A fundamental set of associations is defined as S = {(x^k, y^k) | k = 1, 2, . . . , p}, where p is the cardinality of the fundamental set.

Definition 3. An arbitrary input pattern is defined as x^k ∈ Rⁿ where n is the dimension of x^k and k is an index representing a specific pair of associated patterns.

Definition 4. A vector class to which an input pattern x^k is associated is defined as y^k ∈ { 0, 1 } ^q where q is the dimension of y^k and k is an index representing a specific pair of associated patterns.

Theorem 1. Let M represent an APC. For i ∈ {1, 2, . . . , q}, consider the weight vectors $m^{i} = {(m_{i1}, m_{i2}, \dots, m_{in})}^{t}$ derived from the matrix M. Additionally, let S = {(x^k, y^k) | k = 1, 2, . . . , p} be the fundamental set of M. Define $\bar{x}$ as the translation vector, x^k as an arbitrary pattern, and $x_{t}^{k}$ as the translation of x^k. Then, it holds that $\sum_{i = 1}^{q} m^{i} = 0$.

Proof. By inserting Equation 3 into Equation 4, we have:

x_{t}^{k} = x^{k} - \frac{1}{p} \sum_{j = 1}^{p} x^{j}

(52)

When we apply Equation 52 to translate all input patterns x^k in the fundamental set, we obtain the following:

\begin{aligned} x_{t}^{1} &= x^{1} - \frac{1}{p} (x^{1} + x^{2} + x^{3} + \dots + x^{p - 1} + x^{p}) \\ x_{t}^{2} &= x^{2} - \frac{1}{p} (x^{1} + x^{2} + x^{3} + \dots + x^{p - 1} + x^{p}) \\ x_{t}^{3} &= x^{3} - \frac{1}{p} (x^{1} + x^{2} + x^{3} + \dots + x^{p - 1} + x^{p}) \\ &\;\;\vdots \\ x_{t}^{p - 1} &= x^{p - 1} - \frac{1}{p} (x^{1} + x^{2} + x^{3} + \dots + x^{p - 1} + x^{p}) \\ x_{t}^{p} &= x^{p} - \frac{1}{p} (x^{1} + x^{2} + x^{3} + \dots + x^{p - 1} + x^{p}) \end{aligned}

(53)

When we extend Equation 5, it becomes:

M = y^{1} {(x_{t}^{1})}^{t} + y^{2} {(x_{t}^{2})}^{t} + y^{3} {(x_{t}^{3})}^{t} + \dots + y^{p} {(x_{t}^{p})}^{t}

(54)

If we assume that each vector x^k is associated with a distinct class y^k, we can define q = p classes. Therefore, class c ∈ {1, 2, . . . , p} is defined as follows:

y_{j}^{k} = \begin{cases} 1 & \text{for } j = c \\ 0 & \text{for } j = 1, 2, \dots, c - 1, c + 1, \dots, p \end{cases}

(55)

Thus, Equation 54 becomes:

M = \begin{pmatrix} {(x_{t}^{1})}^{t} \\ {(x_{t}^{2})}^{t} \\ {(x_{t}^{3})}^{t} \\ \vdots \\ {(x_{t}^{p - 1})}^{t} \\ {(x_{t}^{p})}^{t} \end{pmatrix}

(56)

where, without loss of generality, each translated pattern ${(x_{t}^{k})}^{t}$ corresponds to a row vector mⁱ. Thus, the memory M in Equation 56 takes the following form:

M = \begin{pmatrix} m^{1} = {(x_{t}^{1})}^{t} \\ m^{2} = {(x_{t}^{2})}^{t} \\ m^{3} = {(x_{t}^{3})}^{t} \\ \vdots \\ m^{p - 1} = {(x_{t}^{p - 1})}^{t} \\ m^{p} = {(x_{t}^{p})}^{t} \end{pmatrix}

(57)

Hence, based on Equation 57, we can express the summation as follows:

\sum_{i = 1}^{q} m^{i} = \sum_{i = 1}^{q} {(x_{t}^{i})}^{t}; \quad q = p

(58)

If we sum the translated patterns in Equation 53 as $\sum_{i = 1}^{p} {(x_{t}^{i})}^{t} = {(x_{t}^{1})}^{t} + {(x_{t}^{2})}^{t} + {(x_{t}^{3})}^{t} + \dots + {(x_{t}^{p})}^{t}$, we have:

\begin{aligned} \sum_{i = 1}^{p} {(x_{t}^{i})}^{t} &= {(x^{1} + x^{2} + x^{3} + \dots + x^{p - 1} + x^{p})}^{t} \\ &\quad - \frac{p}{p} {(x^{1} + x^{2} + x^{3} + \dots + x^{p - 1} + x^{p})}^{t} = 0 \end{aligned}

(59)

When examining the matrix M as described in Equation 57, it becomes evident that, regardless of the class y^k associated with an input pattern x^k, the translated pattern $x_{t}^{k}$ is always added to the corresponding row of M. In other words, every translated pattern contributes to exactly one row, so the sum of all rows of M always equals the sum in Equation 59, irrespective of the row to which each pattern contributes. This proves the theorem □.
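Theorem 1 can also be checked numerically for the general case q < p. The sketch below uses exact rational arithmetic to avoid rounding effects; the random patterns, the class assignment, and the helper name `build_memory` are illustrative assumptions, and each class vector is taken to be one-hot, represented here by its class index.

```python
import random
from fractions import Fraction


def build_memory(patterns, labels, q):
    """Build M from integer patterns and class indices (one-hot class vectors)."""
    p, n = len(patterns), len(patterns[0])
    x_bar = [Fraction(sum(col), p) for col in zip(*patterns)]
    M = [[Fraction(0)] * n for _ in range(q)]
    for x, c in zip(patterns, labels):
        for j in range(n):
            # A one-hot y^k adds the translated pattern x_t^k to row c only.
            M[c][j] += x[j] - x_bar[j]
    return M, x_bar


rng = random.Random(0)          # fixed seed, arbitrary choice
p, n, q = 12, 3, 4              # hypothetical sizes
patterns = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(p)]
labels = [rng.randrange(q) for _ in range(p)]

M, x_bar = build_memory(patterns, labels, q)

# Sum of the row vectors m^1 + m^2 + ... + m^q, computed column by column.
row_sum = [sum(col) for col in zip(*M)]
print(row_sum)
```

Because every translated pattern lands in exactly one row, the row sum equals the sum of all translated patterns, which is the zero vector by Equation 59.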

Theorem 2. Let M represent an APC. Define $\bar{x}$ as the translation vector, x^ω as an arbitrary input pattern, $x_{t}^{ω}$ as the translation of x^ω, z^ω as the multiplication pattern of the APC, and y^ω as the class vector. Then, it holds that if $x^{ω} = \bar{x}$, then y^ω = 0.

Proof. Using Equation 6, we express $x_{t}^{ω} = x^{ω} - \bar{x}$ . If $x^{ω} = \bar{x}$ , then $x_{t}^{ω} = 0$ . Employing Equation 7, we derive $z^{ω} = M x_{t}^{ω} = 0$ for $x_{t}^{ω} = 0$ . Applying Equation 8, we find y^ω = 0 since it is impossible to compute $y_{j}^{ω} = 1$ for any component j of the vector y^ω, as there is no maximum component value in the vector z^ω = 0 □.

Consequently, the translation vector $\bar{x}$ cannot be classified by the APC. This is because no position j of the vector y^ω contains $y_{j}^{ω} = 1$ from which a class index could be assigned.
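Theorem 2 can be illustrated with the AND memory of Equation 45. The following sketch is not the authors' code; `classify` is a hypothetical helper, and the rule that a non-unique maximum yields the zero class vector is an assumption consistent with the proof above.

```python
def classify(M, x_bar, x):
    """Translate x, multiply by M, and mark the unique maximum of z (if any)."""
    xt = [a - b for a, b in zip(x, x_bar)]
    z = [sum(m * v for m, v in zip(row, xt)) for row in M]
    top = max(z)
    unique = z.count(top) == 1
    y = [1 if unique and v == top else 0 for v in z]
    return z, y


M = [[-0.5, -0.5], [0.5, 0.5]]   # memory from Equation 45
x_bar = [0.5, 0.5]               # translation vector from Equation 43

# Present the translation vector itself as the input pattern.
z, y = classify(M, x_bar, x_bar)
print(z, y)
```

The translated pattern is the zero vector, so z is zero for any memory M and no component of y can be set to 1.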

7 Experimental results

In this section, we outline the experiments conducted to test the performance of the APC. A comparison was conducted between the APC and four state-of-the-art pattern classifiers. These classifiers are described in Subsection 7.1. Twenty real-world and two artificial databases were used. Real-world databases are described in Subsection 7.2 and the artificial database construction is described in Subsection 7.3. Subsection 7.4 gives the experiment design and Subsection 7.5 shows the results.

7.1 Pattern classifiers

Pattern classifiers were obtained from the machine learning algorithm collection Weka [27]. Here is a general description of the pattern classifiers used in the experimentation.

– Minimal Distance Classifier (MDC): This algorithm determines the class to which a given pattern belongs based on its proximity to a representative pattern of each class [28–30].
– Naive Bayes (NB): This algorithm is based on Bayes’ theorem. It assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable [29, 31, 32].
– K-Nearest Neighbors (KNN): This algorithm assigns an unknown pattern to a class by comparing it with samples taken from each of the candidate classes [33]. It is called nearest neighbour because the stored feature vector at the shortest distance from the input vector determines the class to which the input vector is assigned.
– C4.5: This algorithm generates a decision tree for classification, building on its predecessor algorithm ID3 [29].

7.2 Real-world databases

A total of twenty databases with different distributions were utilized. All of them were extracted from the University of California Irvine (UCI) machine learning repository [34], except for the “Objects” database, which was created by H. Sossa from the Computer Research Center at the National Polytechnic Institute, Mexico, describing a set of five objects (a bolt, a washer, an eyebolt, a hook, and a dovetail) using the first two Hu invariant moments [35, 36]. Table 2 provides a summary of the number of instances, attributes, and classes for each of these databases.

Table 2
Summary of the databases used for experimentation

Database	Patterns	Attributes	Classes
1. Breast Cancer Wisconsin	699	10	2
2. Pima Indians Diabetes	768	9	2
3. Haberman’s Survival	306	4	2
4. Heart Disease	270	14	2
5. Hepatitis	155	20	2
6. Ionosphere	351	35	2
7. Chess	3196	37	2
8. Liver disorders	345	7	2
9. Connectionist Bench	208	61	2
10. Balance Scale	625	5	3
11. Iris	150	5	3
12. Waveform Database Generator	5000	41	3
13. Wine	178	14	3
14. Abalone	4177	9	29
15. Glass Identification	214	10	7
16. Thyroid disease	3772	30	4
17. Letter Recognition	20000	17	26
18. Objects	100	2	5
19. Image Segmentation	2310	20	7
20. Statlog	946	19	4

7.3 Artificial database construction

Two artificial databases were generated using MATLAB software. These databases were randomly created using a multivariate normal distribution with a mean vector and covariance matrix for each class. The first data set consists of 400 instances of two dimensions. These instances are non-overlapping and are distributed across four classes, with each class containing 100 instances. The class distribution is shown in Fig. 6, and the MATLAB code used is displayed in Listing 1. The second data set consists of 800 instances of dimension two. These instances are distributed across eight classes, each containing 100 instances. The class distribution, as shown in Fig. 7, reveals some overlapping between these classes. The MATLAB code used to generate this data set can be found in Listing 2.
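The general procedure can be sketched as follows. This is not Listing 1 or Listing 2: the class means, the standard deviation, and the seed are hypothetical values chosen only for illustration, and the sketch uses Python with an isotropic (diagonal) covariance rather than MATLAB with a full covariance matrix.

```python
import random


def make_dataset(means, sigma, per_class, seed=0):
    """Draw `per_class` two-dimensional points around each class mean.

    Each coordinate is sampled independently from a normal distribution,
    which corresponds to a covariance matrix of sigma**2 times the identity.
    """
    rng = random.Random(seed)
    data = []
    for label, (mx, my) in enumerate(means):
        for _ in range(per_class):
            point = (rng.gauss(mx, sigma), rng.gauss(my, sigma))
            data.append((point, label))
    return data


# Hypothetical class means, placed symmetrically around the origin.
means = [(-5.0, -5.0), (-5.0, 5.0), (5.0, -5.0), (5.0, 5.0)]
dataset = make_dataset(means, sigma=0.5, per_class=100)

counts = [sum(1 for _, c in dataset if c == k) for k in range(len(means))]
print(len(dataset), counts)
```

With means this far apart relative to the spread, the four classes are well separated; bringing the means closer or increasing `sigma` would produce overlapping classes, as in the second data set.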

Fig. 6

Balanced distribution of four classes of non-overlapping patterns.

Fig. 7

A balanced distribution of eight pattern classes with overlap.

7.4 Experiment design

Each classifier was assessed using the accuracy metric. This performance metric was computed from a confusion matrix, which records the correct and incorrect predictions made by the algorithm for each class. Accuracy is the ratio of the number of correct predictions to the total number of predictions. A stratified 10-fold cross-validation was employed for the training and testing experiments. Each test was repeated ten times, from which an average classification performance was obtained.
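The accuracy computation can be sketched as follows. The labels below are made-up toy values used only to show the calculation, and the function names are hypothetical; this is not the evaluation code used in the experiments.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """cm[t][p] counts patterns of true class t that were predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Toy labels for three classes (illustrative only).
true_labels = [0, 0, 1, 1, 2, 2, 2, 1]
predicted = [0, 1, 1, 1, 2, 0, 2, 1]

cm = confusion_matrix(true_labels, predicted, 3)
acc = accuracy(cm)
print(cm, acc)
```

In a 10-fold setting, this accuracy would be computed on each test fold and then averaged over the folds and repetitions.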

7.5 Results

Table 3 provides a summary of the experimental results. The best performances are indicated in bold. The APC has the lowest average accuracy (62.98), whereas KNN (K = 1) achieves the highest (81.54). Despite its low average performance, the APC achieves the best accuracy on the Breast Cancer Wisconsin database and on the two artificial databases. This is because the pattern distribution in these databases meets the conditions under which the APC operates; that is, the patterns are distributed in a balanced manner.

Table 3
Summary of the classification accuracy of the APC, MDC, NB, KNN (K = 1), and C4.5 algorithms, obtained through stratified 10-fold cross-validation and 10 repetitions

Database	APC	MDC	NB	KNN (K = 1)	C4.5	Average
1. Breast Cancer Wisconsin	97.08±0.04	96.12±0.01	96.07±0.26	95.28±0.45	95.01±0.53	95.91
2. Pima Indians Diabetes	63.26±0.17	63.42±0.12	75.75±1.06	70.62±1.03	74.49±0.87	69.51
3. Haberman’s Survival	58.36±1.11	61.97±0.81	75.36±1.25	67.50±1.60	72.16±1.00	67.07
4. Heart Disease	62.64±0.28	62.80±0.25	83.59±1.59	76.15±2.15	78.15±1.35	72.67
5. Hepatitis	52.59±0.37	62.34±0.25	83.81±1.53	81.40±1.47	79.22±1.79	71.87
6. Ionosphere	70.35±0.48	66.81±0.24	82.17±0.90	87.10±0.80	89.74±1.09	79.23
7. Chess	80.71±0.14	80.57±0.13	87.79±0.25	96.12±0.19	99.44±0.08	88.93
8. Liver disorders	54.86±0.19	55.66±0.19	54.89±1.90	62.22±1.71	65.84±1.66	58.69
9. Connectionist Bench	60.60±0.27	61.26±0.28	67.71±1.12	86.17±1.24	73.61±1.36	69.87
10. Balance Scale	84.23±0.27	69.78±0.31	90.53±0.27	86.72±0.44	77.82±0.74	81.82
11. Iris	69.24±0.03	91.53±0.06	95.53±0.79	95.40±1.07	94.73±0.97	89.29
12. Waveform Database Generator	73.98±0.01	79.38±0.01	80.01±0.31	73.41±0.42	75.25±0.30	76.41
13. Wine	69.63±0.10	68.74±0.29	97.46±0.64	95.12±0.48	93.20±0.88	84.83
14. Abalone	17.83±0.10	7.27±0.08	23.86±0.33	19.97±0.32	20.99±0.33	17.98
15. Glass Identification	51.66±0.28	43.55±0.58	49.45±2.29	69.95±1.17	67.63±2.11	56.45
16. Thyroid disease	55.75±0.04	74.32±0.10	95.30±0.15	91.52±0.24	99.54±0.10	83.29
17. Letter Recognition	43.03±0.02	53.67±0.02	64.07±0.19	96.01±0.05	88.03±0.20	68.96
18. Objects	39.63±0.04	55.99±0.16	100.00±0.00	100.00±0.00	95.90±0.98	78.30
19. Image Segmentation	58.88±0.06	71.47±0.06	80.17±0.49	97.15±0.18	96.79±0.19	80.89
20. Statlog	38.03±0.05	38.30±0.06	44.68±1.02	69.59±0.68	72.28±0.94	52.58
21. Artificial four classes	100.00±0.00	99.28±0.02	99.75±0.46	99.90±0.30	99.43±0.82	99.67
22. Artificial eight classes	83.20±0.00	81.58±0.06	83.08±2.29	76.60±2.70	80.18±2.84	80.93
Average	62.98	65.72	77.77	81.54	81.34	73.87

8 Conclusions

The APC is an associative memory (AM) that learns in a single iteration. This enables the APC to create a memory simply and quickly from a set of patterns associated with their respective class vectors. The memory is constructed once the patterns have been shifted with respect to a translation vector, giving rise to a translated coordinate plane.

We demonstrated theoretically that, in the translation plane, the rows of the memory are concurrent vectors that form a system in static equilibrium, allowing the APC to distribute the class regions in a balanced manner. While the APC properly separates a set of evenly distributed regions on the plane, it cannot achieve an absolute separation, because there is a neutral region that it cannot classify. The neutral region consists of all the points lying on the separation hyperplanes. Nevertheless, the APC is a multiclass, generalized non-linear classifier that separates a set of evenly distributed classes efficiently and quickly. We also found that the APC cannot classify the translation vector, as it falls within the neutral region. Notably, the classifier exhibits noise tolerance: the algorithm creates decision regions in which distorted versions of a given pattern can still be classified, as long as they do not fall into the neutral region generated by the APC. The APC cannot separate the AND function because the neutral region forms the separation line, which passes through two of the four points of the AND function. When the distribution of patterns aligns with the regions defined by the APC, it surpasses the MDC, NB, KNN, and C4.5 classifiers. With real-world databases, however, the APC shows reduced performance; its restricted inner workings make it inaccurate.

Acknowledgments

S. Valadez-Godínez expresses gratitude to the Universidad Politécnica de Pénjamo, CONAHCYT, and the Centro de Investigación en Computación of the Instituto Politécnico Nacional for their financial support in carrying out this research. H. Sossa thanks the Instituto Politécnico Nacional (grants SIP 20220226, 20231622, and 20240956) for its financial support during the research. R. Santiago-Montero wishes to convey thanks to the Instituto Tecnológico de León and CONAHCYT for their assistance and support.

References

1. Zurada J.M., Introduction to Artificial Neural Systems, West Publishing Company, 1992.
2. Palm, Neural associative memories and sparse coding, Neural Networks 37 (2013), 165–171. doi: 10.1016/j.neunet.2012.08.013.
3. Prasad B.D.C.N., Prasad P.E.S.N.K., Sagar, Murty P.S.R., A Study on Associative Neural Memories, International Journal of Advanced Computer Sciences and Applications 1(6) (2011), 124–133. doi: 10.14569/IJACSA.2010.010619.
4. Steinbuch, Die Lernmatrix, Kybernetik 1 (1961), 36–45. doi: 10.1007/BF00293853.
5. Steinbuch, Frank, Kybernetics 1(3) (1963), 117–124. doi: 10.1007/BF00290182.
6. Anderson J.A., A simple neural network generating an interactive memory, Mathematical Biosciences 14(3-4) (1972), 197–220. doi: 10.1016/0025-5564(72)90075-2.
7. Kohonen, Correlation Matrix Memories, IEEE Transactions on Computers C-21(4) (1972), 353–359. doi: 10.1109/TC.1972.5008975.
8. Nakano, Associatron-A model for associative memory, IEEE Transactions on Systems, Man, and Cybernetics SMC-2(3) (1972), 380–388. doi: 10.1109/TSMC.1972.4309133.
9. Sánchez-Garfias F.A., Díaz-de-León-S. J.L., Yañez-Márquez C., Lernmatrix: Condiciones necesarias y suficientes para recuperación perfecta, Memoria del CIARP (2002), 437–448.
10. Sánchez-Garfias F.A., Díaz-de-León-S. J.L., Yañez-Márquez, Lernmatrix de Steinbuch: Avances Teoricos, Computacion y Sistemas 7(3) (2004), 175–189.
11. Sánchez-Garfias F.A., Díaz-de-León-S. J.L., Yáñez-Márquez, A new theoretical framework for the Steinbuch’s Lernmatrix, in: SPIE Proceedings, J.T. Astola, I. Tabus and J. Barrera, eds, SPIE, 2005. doi: 10.1117/12.621551.
12. Hassoun M.H., Fundamentals of Artificial Neural Networks, 1st edn, MIT Press, Cambridge, MA, USA, 1995.
13. Rosenfeld, Anderson J.A., Neurocomputing: foundations of research, edited by James A. Anderson and Edward Rosenfeld, Cambridge, Mass: MIT Press, 1988.
14. Hassoun M.H., Associative Neural Memories: Theory and Implementation, New York: Oxford University Press, 1993.
15. Ritter G.X., Sussner, Díaz-De-León J.L., Morphological associative memories, IEEE Transactions on Neural Networks 9(2) (1998), 281–293. doi: 10.1109/72.661123.
16. Santiago-Montero, Yañez-Márquez, Díaz-de-León J.L., Clasificador Asociativo de Patrones: Avances Teoricos, Avances en: Ciencias de la Computación Special Edition, Research on Computing Science Series, Centro de Investigación en Computación, IPN, México 3 (2003), 257–267.
17. Soria-Alcaraz J.A., Santiago-Montero, Carpio, One criterion for the selection of the cardinality of learning set used by the Associative Pattern Classifier, in: 2010 IEEE Electronics, Robotics and Automotive Mechanics Conference, 2010, pp. 80–84. doi: 10.1109/CERMA.2010.20.
18. Santiago-Montero, Sossa, Gutiérrez-Hernández D.A., Zamudio, Hernández-Bautista, Valadez-Godínez, Novel Mathematical Model of Breast Cancer Diagnostics Using an Associative Pattern Classification, Diagnostics 10(3) (2020), 136. doi: 10.3390/diagnostics10030136.
19. Santiago-Montero, Sergio, Sossa, Gutiérrez-Hernández D.A., Ornerlas-Rodríguez, A study of the associative pattern classifier method for multi-class processes, Journal of Optoelectronics and Advanced Materials 17(5-6) (2015), 713–719.
20. Uriarte-Arcia A.V., López-Yáñez, Yáñez-Márquez, One-Hot Vector Hybrid Associative Classifier for Medical Data Classification, PLoS ONE 9(4) (2014), e95715. doi: 10.1371/journal.pone.0095715.
21. Cleofas-Sánchez, García, Martín-Félez, Valdovinos R.M., Sánchez J.S., Camacho-Nieto, Hybrid Associative Memories for Imbalanced Data Classification: An Experimental Study, in: Lecture Notes in Computer Science, 2013, pp. 325–334. doi: 10.1007/978-3-642-38989-4.
22. Aldape-Pérez, Yañez-Márquez, Camacho-Nieto, Arguelles-Cruz A.J., An associative memory approach to medical decision support systems, Computer Methods and Programs in Biomedicine 106(3) (2011), 287–307. doi: 10.1016/j.cmpb.2011.05.002.
23. Padierna-García L.C., Clasificador Asociativo de Patrones Difuso Aplicado al Diagnostico de Diabetes Mellitus, Master’s thesis, Instituto Tecnologico de León, 2011.
24. Padierna-García L.C., Santiago-Montero, Zamarrón-Ramirez, Pattern Associative Classifier Applied to Diabetes Mellitus Diagnosis, Memorias del 3er Congreso Internacional en Ciencias Computacionales CiComp2010 (2010), 53–57.
25. Palm, Schwenker, Sommer F.T., Strey, Neural associative memories, Biological Cybernetics 36 (1993), 36–19.
26. Bueche F.J., Principles of physics, 6th edn, McGraw-Hill, 1995.
27. Hall, Frank, Holmes, Pfahringer, Reutemann, Witten I.H., The WEKA data mining software, SIGKDD Explor. Newsl. 11(1) (2009), 10–18. doi: 10.1145/1656274.1656278.
28. Friedman, Kandel, Introduction to Pattern Recognition: Statistical, Structural, Neural and Fuzzy Logic Approaches, Series in Machine Perception and Artificial Intelligence, World Scientific Publishing Company, 1999. doi: 10.1142/3641.
29. Duda R.O., Hart P.E., Stork D.G., Pattern classification, 2nd Edition, Wiley Interscience, New York, 2001.
30. Marques de Sa J.P., Pattern Recognition, Concepts, Methods, and Applications, Springer-Verlag, 2002. doi: 10.1007/978-3-642-56651-6.
31. Rish, An empirical study of the naive Bayes classifier, in: IJCAI-01 workshop on Empirical Methods in AI, 2001.
32. Zhang, The Optimality of Naive Bayes, in: Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004), V. Barr and Z. Markov, eds, AAAI Press, 2004.
33. Patrick E.A., Fischer-III F.P., A generalized k-nearest neighbor rule, Information and Control 16(2) (1970), 128–152. doi: 10.1016/S0019-9958(70)90081-1.
34. Frank, Asuncion, UCI Machine Learning Repository, 2010. http://archive.ics.uci.edu/ml.
35. Sossa, Barrón, Cuevas, Aguilar, Cortés, Binary Associative Memories Applied to Gray Level Pattern Recalling, in: Lecture Notes in Computer Science. Advances in Artificial Intelligence IBERAMIA 2004, Springer Berlin Heidelberg, pp. 656–666. doi: 10.1007/978-3-540-30498-2_66.
36. Sossa, Barrón, Vázquez R.A., Transforming Fundamental Set of Patterns to a Canonical Form to Improve Pattern Recall, in: Lecture Notes in Computer Science. Advances in Artificial Intelligence IBERAMIA 2004, Springer Berlin Heidelberg, pp. 687–696. doi: 10.1007/978-3-540-30498-2_69.

Table 1
AND function

x₁	x₂	Class
0	0	0
0	1	0
1	0	0
1	1	1