On the learning machine with quaternionic domain neural network and its high-dimensional applications

Abstract

Many engineering and scientific applications in communication, control, robotics, computer vision, and biometrics involve high-dimensional data, and researchers face the difficulty of building an intelligent and robust neural system that can process such information efficiently. Conventional real-valued neural networks have been applied to problems with high-dimensional parameters, but the resulting network structures are highly complex, slow to train, and sensitive to noise. They are also unable to learn magnitude and phase simultaneously in space. A quaternion is a number that carries magnitude along four directions, with phase information embedded within it. This paper presents a learning machine based on a quaternionic domain neural network that can process the magnitude and phase information of high-dimensional data directly. Its learning and generalization capability is evaluated on chaotic time series prediction (Lorenz system and Chua’s circuit), 3D linear transformations, and 3D face recognition as benchmark problems, which demonstrate the significance of the work.

Keywords

Quaternion; quaternionic domain neural network; 3D motion; 3D imaging; chaotic time series prediction

1. Introduction

Machine learning concentrates on improving the intelligent behavior of a system through experience. It is currently one of the fastest-growing technical fields and bridges artificial intelligence and data science. Developing new learning algorithms and theory for online or high-dimensional data at low computational cost remains challenging [26]. High-dimensional information processing through neural networks is likewise emerging as a fascinating but challenging field for second-generation neurocomputing researchers. Recent research on high-dimensional neural networks has established their superiority over first-generation real-valued neural networks (RVNN), as addressed in [1, 8]. Although RVNN have been used to process high-dimensional data, the network needs too many real-valued neurons, so the resulting structure becomes huge and learns slowly. For a complex network structure, reliability is an important element, but reliability calculation is an NP-hard problem, so a simulation approach is a feasible way to assess network reliability [24]. Some recent advances in the application of RVNN are based on the modeling of active devices such as transistors [23]. To advance learning algorithms, instance selection based on cross-validation [25] and on MapReduce with a voting mechanism (MRVIS) [27] has been proposed, but only for large data sets, and compared with some classical algorithms in terms of learning speed and selection ratio. The RVNN also cannot process phase information during learning and generalization of a mapping on the plane [3, 7]. Complex-valued neural networks (CVNN) with nonparametric activation functions [28] can readily process two-dimensional information with phase as a single number, which drastically reduces the complexity of the network and improves performance.
However, neural networks for three-dimensional information still need exhaustive investigation. Applications with three-dimensional information are common in computer vision, robotics, biometrics, bioinformatics, etc. A few researchers have attempted machine learning with three-dimensional information by treating it as a vector [8, 9]. The corresponding learning algorithms impose restrictions on the weight matrix, and a vector does not provide the freedom of a number, as a complex number does in CVNN [9]. There is therefore strong demand for a neural network that can directly process high-dimensional parameters as single numbers and can be incorporated easily into various applications of intelligent machine design, like CVNN [2, 3–6]. In the development of higher-order number systems, mathematicians introduced the complex numbers (2D), quaternions (4D), octonions (8D), and sedenions (16D), but there is no such number system in three dimensions [10]. Studies [1–3, 7] also show that CVNN outperforms RVNN even for real-valued problems; we therefore propose to exploit quaternions in neural networks to process three- and four-dimensional problems.

Neurocomputing with high-dimensional number systems can overcome the learning and generalization difficulties of huge conventional neural networks and lead to lower complexity. The quaternion is a hyper-complex number introduced by the Irish mathematician Hamilton [11]. It has been employed extensively in quantum mathematics, physics, computer graphics, signal processing, and control [12, 17]. This number system has recently been introduced into neural networks through quaternionic neurons, analogous to complex- or real-valued neurons, to develop efficient machine learning in higher dimensions. A few attempts have been made in this direction: the orthogonal decision boundary of a single quaternionic neuron has been used to solve the 4-bit parity problem [18]; the quaternionic MLPs proposed in [15] suffer from the existence of singularities; quaternion-valued algorithms have been proposed for adaptive filtering [16, 17]; and basic work on quaternionic-valued neural networks with a sigmoid activation function is presented in [14, 19]. In this paper, we not only present a simple, straightforward, yet capable learning algorithm for a sufficiently general structure of the quaternionic domain neural network (QDNN), but also evaluate it over a wide spectrum of applications, such as function approximation, motion interpretation, and recognition in space. The parameters of the QDNN (synaptic weights, biases, input-output signals, and internal potentials) are quaternions, represented as quaternionic matrices in a multi-layered neural network. Although Hamilton proposed quaternions (q = q₀ + q₁i + q₂j + q₃k) as a 4D number system [11], they can also represent any 3D information in space by setting the real part to zero. The presented learning algorithm, based on error backpropagation for QDNN, is intended to solve typical classes of problems in 3D and 4D efficiently.
Analytic [1, 9] or split-type [1, 9] activation functions have been chosen for complex-valued neurons, and each has its own issues concerning boundedness and analyticity. The selection of a suitable activation function for a neuron dealing with quaternions is therefore an important concern. A split-type function is not appropriate when analyticity is required; similarly, an analytic function is not suitable when singularities arise. The presented QDNN prefers boundedness over analyticity and uses a split-type activation function. The QDNN performs better with fewer neurons and faster learning than the conventional RVNN. The QDNN can learn and generalize the 3D motion of objects and recognize point cloud objects, which RVNN cannot, because the QDNN captures and maintains the phase information of each point during learning and generalization.

This paper investigates the general structure of QDNN and its learning algorithm through simulation on benchmark problems from different domains. The rest of the paper is organized as follows: Section 2 presents a complete machine learning framework with pseudo code for learning in the quaternionic domain. Section 3 evaluates the learning and generalization capability through function approximations, linear transformations, and 3D face recognition. The conclusion and future scope of the work are presented in Section 4.

2. Machine learning in quaternionic domain

The quaternionic number system is a straightforward extension of the real and complex number systems in which four components are incorporated in a single number: the first component acts as the real part and the other three as imaginary parts with unit vectors (i, j, k). These imaginary components lie along the axes of three-dimensional space [11, 12]. A quaternionic variable (q = q₀ + q₁i + q₂j + q₃k) consists of a real component (q₀) and three imaginary components (q₁, q₂, q₃). Its bases (i, j, k) are orthogonal unit vectors that satisfy i² = j² = k² = -1, as the imaginary unit of the complex numbers does, and the cross-product-like rules ij = -ji = k, jk = -kj = i, ki = -ik = j. A quaternion (q) can also be expressed in the form of a matrix (quaternionic matrix): $q = [\begin{matrix} q_{0} & q_{1} & q_{2} & q_{3} \\ - q_{1} & q_{0} & - q_{3} & q_{2} \\ - q_{2} & q_{3} & q_{0} & - q_{1} \\ - q_{3} & - q_{2} & q_{1} & q_{0} \end{matrix}] .$ (1)
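The matrix form of Equation (1) can be checked numerically. The following is a minimal sketch, not part of the original method: the helper names `qmat`, `matmul` and `neg` are hypothetical, and plain nested lists stand in for matrices. It builds the basis matrices for 1, i, j, k and confirms that ordinary matrix products reproduce the rules i² = -1, ij = k and ji = -k.

```python
# Sketch: quaternion -> 4x4 real matrix of Equation (1), checked with plain matrix products.
# All function names here are illustrative assumptions, not names from the paper.

def qmat(q0, q1, q2, q3):
    """Return the quaternionic matrix of Equation (1) as nested lists."""
    return [
        [q0, q1, q2, q3],
        [-q1, q0, -q3, q2],
        [-q2, q3, q0, -q1],
        [-q3, -q2, q1, q0],
    ]

def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[r][t] * b[t][c] for t in range(4)) for c in range(4)]
            for r in range(4)]

def neg(m):
    """Element-wise negation of a matrix."""
    return [[-x for x in row] for row in m]

ONE = qmat(1, 0, 0, 0)
I = qmat(0, 1, 0, 0)
J = qmat(0, 0, 1, 0)
K = qmat(0, 0, 0, 1)

# Basis rules: i^2 = -1, ij = k, ji = -k (so the product is not commutative).
print(matmul(I, I) == neg(ONE))
print(matmul(I, J) == K)
print(matmul(J, I) == neg(K))
```

The first row of a product `matmul(qmat(...), qmat(...))` carries the four components of the quaternion product, so the matrix form and the component form can be used interchangeably.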

All bold-type letters denote either a quaternionic variable or a quaternionic matrix. The conjugate of a quaternionic variable (q* = q₀ - q₁i - q₂j - q₃k) is analogous to the complex conjugate, and the conjugate of a quaternionic matrix is its transpose, defined as: $q^{*} = q^{T} = [\begin{matrix} q_{0} & - q_{1} & - q_{2} & - q_{3} \\ q_{1} & q_{0} & q_{3} & - q_{2} \\ q_{2} & - q_{3} & q_{0} & q_{1} \\ q_{3} & q_{2} & - q_{1} & q_{0} \end{matrix}] .$ (2)

The learning algorithm incorporates the basic operations of quaternion algebra [11, 12]. The addition and subtraction of two quaternionic matrices q and r can be obtained simply as matrix operations. The multiplication of two quaternionic matrices q and r is not commutative (qr ≠ rq). The element-wise multiplication of two quaternionic matrices q and r, referred to here as the inner product, is denoted by the ⊙ symbol and expressed as follows: $\begin{matrix} q ⊙ r & = & [\begin{matrix} q_{0} & q_{1} & q_{2} & q_{3} \\ - q_{1} & q_{0} & - q_{3} & q_{2} \\ - q_{2} & q_{3} & q_{0} & - q_{1} \\ - q_{3} & - q_{2} & q_{1} & q_{0} \end{matrix}] ⊙ [\begin{matrix} r_{0} & r_{1} & r_{2} & r_{3} \\ - r_{1} & r_{0} & - r_{3} & r_{2} \\ - r_{2} & r_{3} & r_{0} & - r_{1} \\ - r_{3} & - r_{2} & r_{1} & r_{0} \end{matrix}] \\ & = & [\begin{matrix} q_{0} r_{0} & q_{1} r_{1} & q_{2} r_{2} & q_{3} r_{3} \\ q_{1} r_{1} & q_{0} r_{0} & q_{3} r_{3} & q_{2} r_{2} \\ q_{2} r_{2} & q_{3} r_{3} & q_{0} r_{0} & q_{1} r_{1} \\ q_{3} r_{3} & q_{2} r_{2} & q_{1} r_{1} & q_{0} r_{0} \end{matrix}] . \end{matrix}$ (3)

The norm of a quaternionic matrix q is expressed as: $\| q \| = \sqrt{q_{0}^{2} + q_{1}^{2} + q_{2}^{2} + q_{3}^{2}} .$ (4)
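The conjugate, element-wise product and norm of Equations (2)-(4) can be illustrated together. This is a small sketch under the assumption that a quaternionic matrix is stored as nested lists; the names `transpose`, `hadamard` and `qnorm` are hypothetical. It also shows a fact used later in the error function: a quaternionic matrix times its conjugate is the squared norm times the identity.

```python
import math

def qmat(q0, q1, q2, q3):
    """Quaternionic matrix of Equation (1)."""
    return [
        [q0, q1, q2, q3],
        [-q1, q0, -q3, q2],
        [-q2, q3, q0, -q1],
        [-q3, -q2, q1, q0],
    ]

def transpose(m):
    """Conjugate of a quaternionic matrix, which is its transpose (Equation (2))."""
    return [list(col) for col in zip(*m)]

def hadamard(a, b):
    """Element-wise product of two matrices (Equation (3))."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[r][t] * b[t][c] for t in range(4)) for c in range(4)]
            for r in range(4)]

def qnorm(m):
    """Norm of Equation (4), read from the first row of the matrix."""
    return math.sqrt(sum(x * x for x in m[0]))

q = qmat(1, 2, 3, 4)
r = qmat(5, 6, 7, 8)

print(transpose(q) == qmat(1, -2, -3, -4))   # conjugate flips the imaginary signs
print(hadamard(q, r)[0])                     # first row: q0*r0, q1*r1, q2*r2, q3*r3
print(matmul(q, transpose(q)) == qmat(30, 0, 0, 0))  # q q* = ||q||^2 * identity
print(qnorm(q) ** 2)                         # close to 30
```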

2.1. Learning in quaternionic domain neural networks

A three-layered (L-M-N) quaternionic domain neural network (QDNN) has L inputs, and M and N quaternionic neurons in the hidden and output layers, respectively. All input, output, weight, and bias signals are quaternionic matrices of the form of Equation (1). The derivation of the optimization technique uses the basic operations of quaternion algebra, which yields a compact and general derivation of the backpropagation algorithm (QDBP) for a three-layered network.

2.1.1 Forward pass

Let $I_{l}^{r}$, $I_{l}^{x}$, $I_{l}^{y}$, $I_{l}^{z}$ be the four components of the 4D quaternionic input of the lth (l = 1 … L) neuron in the input layer of the network. The quaternionic input can be expressed as a quaternionic matrix (I_l): $I_{l} = [\begin{matrix} I_{l}^{r} & I_{l}^{x} & I_{l}^{y} & I_{l}^{z} \\ - I_{l}^{x} & I_{l}^{r} & - I_{l}^{z} & I_{l}^{y} \\ - I_{l}^{y} & I_{l}^{z} & I_{l}^{r} & - I_{l}^{x} \\ - I_{l}^{z} & - I_{l}^{y} & I_{l}^{x} & I_{l}^{r} \end{matrix}] .$ (5)

The matrix of inputs (I) at the input layer of the network is defined by I = [I₁ I₂ I₃ … I_L]^T. The synaptic weight w_ml connects the lth input to the mth (m = 1 … M) hidden neuron, and s_nm connects the mth hidden neuron to the nth (n = 1 … N) output neuron. The weights (w_ml and s_nm) and biases (α_m and β_n) are quaternionic matrices containing one real and three imaginary components.

The internal potential matrix ( U ) for neurons (1 … M) at hidden layer of the network is defined as:

$U = WI + α , \quad [\begin{matrix} U_{1} \\ U_{2} \\ U_{3} \\ ⋮ \\ U_{M} \end{matrix}] = [\begin{matrix} w_{11} & w_{12} & w_{13} & \dots & w_{1 L} \\ w_{21} & w_{22} & w_{23} & \dots & w_{2 L} \\ w_{31} & w_{32} & w_{33} & \dots & w_{3 L} \\ ⋮ & ⋮ & ⋮ & \dots & ⋮ \\ w_{M 1} & w_{M 2} & w_{M 3} & \dots & w_{ML} \end{matrix}] [\begin{matrix} I_{1} \\ I_{2} \\ I_{3} \\ ⋮ \\ I_{L} \end{matrix}] + [\begin{matrix} α_{1} \\ α_{2} \\ α_{3} \\ ⋮ \\ α_{M} \end{matrix}] .$ (6)

where the elements of the weight matrix W are the weights between the input and hidden neurons, and the elements of the bias matrix α are the biases of the hidden neurons. Let f be an activation function and f′ be its derivative. The output matrix (O) is obtained by applying the split-type activation function to the internal potential matrix (U) at the hidden layer (O = f(U)): $[O_{1} O_{2} \dots O_{M}]^{T} = [f (U_{1}) f (U_{2}) \dots f (U_{M})]^{T} .$ (7)

where $f (U_{m}) = [\begin{matrix} f (U_{m}^{r}) & f (U_{m}^{x}) & f (U_{m}^{y}) & f (U_{m}^{z}) \\ f (- U_{m}^{x}) & f (U_{m}^{r}) & f (- U_{m}^{z}) & f (U_{m}^{y}) \\ f (- U_{m}^{y}) & f (U_{m}^{z}) & f (U_{m}^{r}) & f (- U_{m}^{x}) \\ f (- U_{m}^{z}) & f (- U_{m}^{y}) & f (U_{m}^{x}) & f (U_{m}^{r}) \end{matrix}] .$

The internal potential matrix (V) at the output layer of the network is computed in the same way as in Equation (6): $V = SO + β .$ (8) where the elements of the weight matrix S are the strengths of the synaptic connections between hidden and output neurons, and the column vector β contains the quaternionic biases of the respective output neurons. The output matrix (Y) at the output layer is obtained in the same way as in Equation (7) (Y = f(V)).
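The forward pass of Equations (6)-(8) can be sketched in a few lines. The sketch below makes two assumptions that are not stated in the text: each quaternion is stored as a 4-tuple of components rather than as the 4×4 matrix of Equation (1) (the two carry the same information), and the split-type activation is the hyperbolic tangent applied to each component. The function names are hypothetical.

```python
import math

def qmul(q, r):
    """Hamilton product q*r of two quaternions stored as 4-tuples."""
    q0, q1, q2, q3 = q
    r0, r1, r2, r3 = r
    return (q0 * r0 - q1 * r1 - q2 * r2 - q3 * r3,
            q0 * r1 + q1 * r0 + q2 * r3 - q3 * r2,
            q0 * r2 - q1 * r3 + q2 * r0 + q3 * r1,
            q0 * r3 + q1 * r2 - q2 * r1 + q3 * r0)

def qadd(q, r):
    """Component-wise sum of two quaternions."""
    return tuple(a + b for a, b in zip(q, r))

def split_tanh(q):
    """Split-type activation: tanh applied to each component separately."""
    return tuple(math.tanh(c) for c in q)

def layer(weights, biases, inputs):
    """One layer: potentials[m] = sum_l weights[m][l] * inputs[l] + biases[m]."""
    potentials = []
    for w_row, bias in zip(weights, biases):
        u = bias
        for w, x in zip(w_row, inputs):
            u = qadd(u, qmul(w, x))  # weight multiplies the input from the left
        potentials.append(u)
    return potentials, [split_tanh(u) for u in potentials]

def forward(W, alpha, S, beta, I):
    """Forward pass of an L-M-N network: returns U, O, V, Y."""
    U, O = layer(W, alpha, I)
    V, Y = layer(S, beta, O)
    return U, O, V, Y

# A 1-1-1 network with unit weights and zero biases, fed one pure quaternion.
one, zero = (1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)
U, O, V, Y = forward([[one]], [zero], [[one]], [zero], [(0.0, 0.1, 0.2, 0.3)])
print(Y[0])
```

Because the weights multiply from the left and the quaternion product is not commutative, swapping the arguments of `qmul` would define a different network.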

2.1.2 Backward pass

In order to develop a QDNN based learning machine, we present the derivation of the error-backpropagation learning algorithm in quaternion domain (QDBP) through minimization of average mean square error (E) of the network:

$E = \frac{1}{8 N} \sum_{n = 1}^{4 N} diag ([\begin{matrix} e_{1} & 0 & 0 & 0 & 0 \\ 0 & e_{2} & 0 & 0 & 0 \\ 0 & 0 & e_{3} & 0 & 0 \\ 0 & 0 & 0 & ⋱ & 0 \\ 0 & 0 & 0 & 0 & e_{N} \end{matrix}] [\begin{matrix} e_{1}^{*} & 0 & 0 & 0 & 0 \\ 0 & e_{2}^{*} & 0 & 0 & 0 \\ 0 & 0 & e_{3}^{*} & 0 & 0 \\ 0 & 0 & 0 & ⋱ & 0 \\ 0 & 0 & 0 & 0 & e_{N}^{*} \end{matrix}]) .$ (9)

where the * symbol denotes the quaternionic conjugate defined in Equation (2), and the output error matrix (e) is the difference between the actual (Y) and desired (Y^D) outputs at the output layer, defined as: $e = Y - Y^{D} = [\begin{matrix} Y_{1} \\ Y_{2} \\ Y_{3} \\ ⋮ \\ Y_{N} \end{matrix}] - [\begin{matrix} Y_{1}^{D} \\ Y_{2}^{D} \\ Y_{3}^{D} \\ ⋮ \\ Y_{N}^{D} \end{matrix}] = [\begin{matrix} Y_{1} - Y_{1}^{D} \\ Y_{2} - Y_{2}^{D} \\ Y_{3} - Y_{3}^{D} \\ ⋮ \\ Y_{N} - Y_{N}^{D} \end{matrix}] .$ (10)
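Equation (9) can be reduced to a scalar form. Each error block $e_{n}$ has the structure of Equation (1) and its conjugate is its transpose (Equation (2)), so the product $e_{n} e_{n}^{*}$ is a multiple of the 4×4 identity. Assuming the sum in Equation (9) runs over the 4N diagonal entries of the block-diagonal product, a short worked step under that reading gives:

$e_{n} e_{n}^{*} = \| e_{n} \|^{2} I_{4} \quad \Rightarrow \quad E = \frac{1}{8 N} \sum_{n = 1}^{N} 4 \| e_{n} \|^{2} = \frac{1}{2 N} \sum_{n = 1}^{N} \| e_{n} \|^{2} .$

For example, with N = 1 and e₁ = 0.1 + 0.2i - 0.1j + 0k, the squared norm is 0.01 + 0.04 + 0.01 = 0.06 and E = 0.03.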

The update equations for the weight and bias matrices are obtained by applying gradient descent optimization to the mean square error (E). The weight update matrix (ΔS) between the hidden and output layers and the bias update matrix (Δβ) at the output layer of the network are as follows: $Δ β = [\begin{array}{l} Δ β_{1} \\ Δ β_{2} \\ Δ β_{3} \\ ⋮ \\ Δ β_{N} \end{array}] = \frac{η}{N} [\begin{array}{l} e_{1} ⊙ f^{'} (V_{1}) \\ e_{2} ⊙ f^{'} (V_{2}) \\ e_{3} ⊙ f^{'} (V_{3}) \\ ⋮ \\ e_{N} ⊙ f^{'} (V_{N}) \end{array}] .$ (11)

$Δ S = [\begin{matrix} Δ s_{11} & Δ s_{12} & Δ s_{13} & \dots & Δ s_{1 M} \\ Δ s_{21} & Δ s_{22} & Δ s_{23} & \dots & Δ s_{2 M} \\ Δ s_{31} & Δ s_{32} & Δ s_{33} & \dots & Δ s_{3 M} \\ ⋮ & ⋮ & ⋮ & \dots & ⋮ \\ Δ s_{N 1} & Δ s_{N 2} & Δ s_{N 3} & \dots & Δ s_{N M} \end{matrix}] = \frac{η}{N} [\begin{array}{l} e_{1} ⊙ f^{'} (V_{1}) \\ e_{2} ⊙ f^{'} (V_{2}) \\ e_{3} ⊙ f^{'} (V_{3}) \\ ⋮ \\ e_{N} ⊙ f^{'} (V_{N}) \end{array}] {[\begin{array}{l} O_{1}^{*} \\ O_{2}^{*} \\ O_{3}^{*} \\ ⋮ \\ O_{M}^{*} \end{array}]}^{T} .$ (12)

where $η \in ℝ^{+}$ denotes the learning rate and ⊙ denotes the element-wise multiplication of two quaternionic matrices defined in Equation (3). Similarly, the weight update matrix (ΔW) between the input and hidden layers and the bias update matrix (Δα) at the hidden layer of the network are as follows:

$Δ α = \frac{η}{N} ({[\begin{matrix} s_{11} & s_{12} & s_{13} & \dots & s_{1 M} \\ s_{21} & s_{22} & s_{23} & \dots & s_{2 M} \\ s_{31} & s_{32} & s_{33} & \dots & s_{3 M} \\ ⋮ & ⋮ & ⋮ & \dots & ⋮ \\ s_{N 1} & s_{N 2} & s_{N 3} & \dots & s_{N M} \end{matrix}]}^{T} [\begin{array}{l} e_{1} ⊙ f^{'} (V_{1}) \\ e_{2} ⊙ f^{'} (V_{2}) \\ e_{3} ⊙ f^{'} (V_{3}) \\ ⋮ \\ e_{N} ⊙ f^{'} (V_{N}) \end{array}]) ⊙ [\begin{array}{l} f^{'} (U_{1}) \\ f^{'} (U_{2}) \\ f^{'} (U_{3}) \\ ⋮ \\ f^{'} (U_{M}) \end{array}] .$ (13)

$Δ W = \frac{η}{N} (({[\begin{matrix} s_{11} & s_{12} & s_{13} & \dots & s_{1 M} \\ s_{21} & s_{22} & s_{23} & \dots & s_{2 M} \\ s_{31} & s_{32} & s_{33} & \dots & s_{3 M} \\ ⋮ & ⋮ & ⋮ & \dots & ⋮ \\ s_{N 1} & s_{N 2} & s_{N 3} & \dots & s_{N M} \end{matrix}]}^{T} [\begin{matrix} e_{1} ⊙ f^{'} (V_{1}) \\ e_{2} ⊙ f^{'} (V_{2}) \\ e_{3} ⊙ f^{'} (V_{3}) \\ ⋮ \\ e_{N} ⊙ f^{'} (V_{N}) \end{matrix}]) ⊙ [\begin{matrix} f^{'} (U_{1}) \\ f^{'} (U_{2}) \\ f^{'} (U_{3}) \\ ⋮ \\ f^{'} (U_{M}) \end{matrix}]) {[\begin{matrix} I_{1}^{*} \\ I_{2}^{*} \\ I_{3}^{*} \\ ⋮ \\ I_{L}^{*} \end{matrix}]}^{T} .$ (14)
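One update step of the backward pass can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: quaternions are stored as 4-tuples, the split activation is tanh, the hidden deltas use the conjugated hidden-to-output weights (the component-form counterpart of transposing the weight matrix, as in the QDNN_BACKWARD pseudo code), and the error is taken as desired minus actual output so that adding the updates lowers E. With the sign convention of Equation (10), e = Y - Y^D, the same updates would be subtracted instead. All function names, the seed and the toy data are hypothetical.

```python
import math
import random

def qmul(q, r):
    """Hamilton product q*r of two quaternions stored as 4-tuples."""
    q0, q1, q2, q3 = q
    r0, r1, r2, r3 = r
    return (q0 * r0 - q1 * r1 - q2 * r2 - q3 * r3,
            q0 * r1 + q1 * r0 + q2 * r3 - q3 * r2,
            q0 * r2 - q1 * r3 + q2 * r0 + q3 * r1,
            q0 * r3 + q1 * r2 - q2 * r1 + q3 * r0)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def qadd(q, r):
    return tuple(a + b for a, b in zip(q, r))

def had(q, r):
    """Component-wise product, the tuple analogue of Equation (3)."""
    return tuple(a * b for a, b in zip(q, r))

def f(q):
    return tuple(math.tanh(c) for c in q)

def df(q):
    return tuple(1.0 - math.tanh(c) ** 2 for c in q)

def layer(weights, biases, inputs):
    pots = []
    for row, bias in zip(weights, biases):
        u = bias
        for w, x in zip(row, inputs):
            u = qadd(u, qmul(w, x))
        pots.append(u)
    return pots, [f(u) for u in pots]

def mse(Y, YD):
    """E = (1/2N) * sum ||Y_n - YD_n||^2, the scalar reading of Equation (9)."""
    return sum((a - b) ** 2 for y, d in zip(Y, YD) for a, b in zip(y, d)) / (2 * len(Y))

def train_step(W, alpha, S, beta, I, YD, eta):
    """One in-place update; returns the error measured before the update."""
    U, O = layer(W, alpha, I)
    V, Y = layer(S, beta, O)
    N, M = len(Y), len(O)
    # Assumption: error = desired - actual, so adding the deltas descends on E.
    d_out = [had(tuple(d - y for d, y in zip(YD[n], Y[n])), df(V[n])) for n in range(N)]
    # Hidden deltas use the conjugated weights, computed before S is changed.
    d_hid = []
    for m in range(M):
        acc = (0.0, 0.0, 0.0, 0.0)
        for n in range(N):
            acc = qadd(acc, qmul(qconj(S[n][m]), d_out[n]))
        d_hid.append(had(acc, df(U[m])))
    step = (eta / N,) * 4
    for n in range(N):
        beta[n] = qadd(beta[n], had(step, d_out[n]))
        for m in range(M):
            S[n][m] = qadd(S[n][m], had(step, qmul(d_out[n], qconj(O[m]))))
    for m in range(M):
        alpha[m] = qadd(alpha[m], had(step, d_hid[m]))
        for l in range(len(I)):
            W[m][l] = qadd(W[m][l], had(step, qmul(d_hid[m], qconj(I[l]))))
    return mse(Y, YD)

def demo(steps=500, eta=0.1, seed=0):
    """Train a toy 1-2-1 network on a single made-up pattern."""
    rng = random.Random(seed)
    rq = lambda: tuple(rng.uniform(-1.0, 1.0) for _ in range(4))
    W, alpha = [[rq()] for _ in range(2)], [rq() for _ in range(2)]
    S, beta = [[rq(), rq()]], [rq()]
    I, YD = [(0.0, 0.3, -0.2, 0.5)], [(0.0, 0.15, -0.1, 0.25)]
    errors = [train_step(W, alpha, S, beta, I, YD, eta) for _ in range(steps)]
    return errors[0], errors[-1]
```

Calling `demo()` returns the error before the first and before the last update, which gives a quick check that the chosen sign convention actually descends on E.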

2.1.3 Pseudo code of proposed learning algorithm

For simplicity and clarity, we present an algorithm QDNN_TRAIN(.) for training the quaternionic domain neural network (QDNN), built from the procedures QDNN_INIT(.), QDNN_FORWARD(.) and QDNN_BACKWARD(.). The learning and generalization ability of a three-layered neural structure is obtained through optimization of the mean square error. The procedure QDNN_INIT(.) randomly initializes the weight and bias matrices of the network. It calls the RANDOM_QM(a, b) procedure, which randomly generates the quaternionic matrix of each interconnection weight and neuron bias in the range from a to b. The QDNN_FORWARD(.) procedure implements the forward pass of the QDNN and generates the internal potential matrices (U, V) and the output matrices (O, Y) at the respective layers.

The ACTIVATION_FUNCTION(.) procedure limits the output of the corresponding neuron of the network. To update the weight and bias matrices, QDNN_BACKWARD(.) implements the backward pass of the QDNN. All required procedures are presented in pseudo code as follows:

$\begin{matrix} procedure QDNN_TRAIN (I, Y^{D}, η, ε) \\ begin \\ QDNN_INIT (L, M, N); \\ while E_{T} > ε do \\ for i \leftarrow 1 until s = length (I) do \\ U, O, V, Y \leftarrow QDNN_FORWARD (W, α, S, β, I); \\ e \leftarrow Y - Y^{D}; \\ E_{i} \leftarrow \frac{1}{8 N} \sum_{n = 1}^{4 N} diag ({ee}^{*}); \\ QDNN_BACKWARD (U, O, S, V, η, e); \\ E_{T} \leftarrow \frac{1}{s} \sum_{i = 1}^{s} E_{i}; \\ end \end{matrix}$

$\begin{matrix} procedure QDNN_INIT (L, M, N) \\ begin \\ for m \leftarrow 1 until M do \\ for l \leftarrow 1 until L do \\ w_{ml} \leftarrow RANDOM_QM (a, b); \\ α_{m} \leftarrow RANDOM_QM (a, b); \\ for n \leftarrow 1 until N do \\ for m \leftarrow 1 until M do \\ s_{nm} \leftarrow RANDOM_QM (a, b); \\ β_{n} \leftarrow RANDOM_QM (a, b); \\ end \end{matrix}$

$\begin{matrix} procedure QDNN_FORWARD (W, α, S, β, I) \\ begin \\ U \leftarrow WI + α; \\ O \leftarrow ACTIVATION_FUNCTION (U); \\ V \leftarrow SO + β; \\ Y \leftarrow ACTIVATION_FUNCTION (V); \\ end \end{matrix}$

$\begin{matrix} procedure QDNN_BACKWARD (U, O, S, V, η, e) \\ begin \\ Δ β \leftarrow (η / N) e ⊙ DER_ACTIVATION (V); \\ Δ S \leftarrow (η / N) (e ⊙ DER_ACTIVATION (V)) O^{*^{T}}; \\ Δ α \leftarrow (η / N) (S^{T} (e ⊙ DER_ACTIVATION (V))) ⊙ \\ DER_ACTIVATION (U); \\ Δ W \leftarrow (η / N) ((S^{T} (e ⊙ DER_ACTIVATION (V))) ⊙ \\ DER_ACTIVATION (U)) I^{*^{T}}; \\ β \leftarrow β + Δ β; \\ S \leftarrow S + Δ S; \\ α \leftarrow α + Δ α; \\ W \leftarrow W + Δ W; \\ end \end{matrix}$

$\begin{matrix} procedure RANDOM_QM (a, b) \\ begin \\ q_{0} \leftarrow a + (b - a) RAND (1); \\ q_{1} \leftarrow a + (b - a) RAND (1); \\ q_{2} \leftarrow a + (b - a) RAND (1); \\ q_{3} \leftarrow a + (b - a) RAND (1); \\ q \leftarrow [\begin{matrix} q_{0} & q_{1} & q_{2} & q_{3} \\ - q_{1} & q_{0} & - q_{3} & q_{2} \\ - q_{2} & q_{3} & q_{0} & - q_{1} \\ - q_{3} & - q_{2} & q_{1} & q_{0} \end{matrix}]; \\ end \end{matrix}$

$\begin{matrix} procedure ACTIVATION_FUNCTION (q) \\ begin \\ Q \leftarrow f (q); \\ end \end{matrix}$
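The initialization procedures translate almost line for line into Python. The sketch below is one possible rendering, assuming that RAND(1) draws a uniform number in [0, 1) and that each component is scaled as a + (b - a)·RAND(1) so that it falls in the range from a to b; the names `random_qm` and `qdnn_init` are hypothetical lower-case counterparts of RANDOM_QM and QDNN_INIT.

```python
import random

def random_qm(a, b, rng=random):
    """Random quaternionic matrix (Equation (1)) with components drawn from [a, b)."""
    q0, q1, q2, q3 = (a + (b - a) * rng.random() for _ in range(4))
    return [
        [q0, q1, q2, q3],
        [-q1, q0, -q3, q2],
        [-q2, q3, q0, -q1],
        [-q3, -q2, q1, q0],
    ]

def qdnn_init(L, M, N, a=-1.0, b=1.0, rng=random):
    """Random weights and biases for an L-M-N network, as in QDNN_INIT."""
    W = [[random_qm(a, b, rng) for _ in range(L)] for _ in range(M)]   # input -> hidden
    alpha = [random_qm(a, b, rng) for _ in range(M)]                   # hidden biases
    S = [[random_qm(a, b, rng) for _ in range(M)] for _ in range(N)]   # hidden -> output
    beta = [random_qm(a, b, rng) for _ in range(N)]                    # output biases
    return W, alpha, S, beta

# Example: the 1-3-1 topology with components in the range -1 to 1.
W, alpha, S, beta = qdnn_init(1, 3, 1, rng=random.Random(42))
print(len(W), len(W[0]), len(S), len(S[0]))
```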

3. Performance evaluation of learning machine through benchmark problems

In this section, we evaluate the effectiveness of the learning machine on a wide spectrum of benchmark problems: function approximations, linear transformations, and 3D face recognition. The components of all quaternionic weights and biases are randomly initialized in the range -1 to 1. The quaternionic variable q₀ = 1 + i + j + k is used as the bias input, and the hyperbolic tangent function is used as the activation function. For each benchmark problem, the three-layered network that performed best among the topologies tried during simulation is considered. The real-valued, complex-valued, and quaternionic-valued neural networks, trained with real-valued backpropagation (RVBP), complex-valued backpropagation (CVBP), and quaternionic-domain backpropagation (QDBP) respectively, are compared for function approximation using statistical parameters such as error variance, correlation, and the Akaike information criterion (AIC) [20]. Another class of benchmark problems, the learning of linear transformations (rotation, scaling, translation, and their combinations), is promising because training uses only a few points lying on a line, yet the trained network is able to generalize over complicated 3D geometric structures. In the last subsection, two preliminary experiments are presented for 3D face recognition, which can serve as a stepping stone for researchers who wish to extend this technique to a large data set. In the last two classes of experiments, each point is represented by a quaternion whose components carry the phase information embedded within a single number; RVNN and CVNN are therefore not able to perform such experiments.

3.1. Function approximations

3.1.1 The Lorenz system

The dynamics of the Lorenz system [21] are governed by a system of three differential equations that shows chaotic behavior depending on its parameter values.

$\begin{matrix} \frac{dx}{dt} = σ (y - x) \\ \frac{dy}{dt} = x (ρ - z) - y \\ \frac{dz}{dt} = xy - β z \end{matrix}$ (15)

where σ, ρ and β are the parameters of the Lorenz system. With σ = 15, ρ = 28 and β = 8/3, the system (Equation 15) is integrated with the fourth-order Runge-Kutta method from the initial condition (x = 0.7, y = 0.1, z = 0.1) to generate 6537 terms of the time series. Each term is treated as a quaternionic input of the form 0 + xi + yj + zk, and the series is normalized to the range from -0.8 to 0.8. The first 500 terms of the time series are used for training and the rest for testing of the three-layered RVNN (3-11-3), CVNN (3-8-3) and QDNN (1-3-1) structures separately. The experiments show that the QDNN requires fewer training cycles on average to achieve the desired MSE than RVNN and CVNN, as reported in Table 1. Figure 1 shows the testing results of the network trained by QDBP for prediction of the 3D time series of the Lorenz system. Table 1 also reports that QDNN outperforms RVNN and CVNN in terms of network topology, training cycles, testing MSE, error variance, correlation and AIC.
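The data preparation described above can be sketched as follows. The parameter values, initial condition and number of terms are taken from the text; the integration step `dt = 0.01` and the per-component min-max normalization are assumptions, since neither is specified, and all function names are hypothetical.

```python
def lorenz(state, sigma=15.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Equation (15)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def lorenz_series(n_terms=6537, dt=0.01, start=(0.7, 0.1, 0.1)):
    """Time series of n_terms states; dt is an assumed step size."""
    series = [start]
    while len(series) < n_terms:
        series.append(rk4_step(lorenz, series[-1], dt))
    return series

def normalize(series, lo=-0.8, hi=0.8):
    """Min-max scaling of each coordinate to [lo, hi] (assumed scheme)."""
    cols = list(zip(*series))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [tuple(lo + (hi - lo) * (v - mn) / (mx - mn)
                  for v, mn, mx in zip(p, mins, maxs)) for p in series]

def to_quaternions(series):
    """Embed each (x, y, z) as the pure quaternion 0 + x i + y j + z k."""
    return [(0.0, x, y, z) for x, y, z in series]

data = to_quaternions(normalize(lorenz_series()))
train, test = data[:500], data[500:]
print(len(train), len(test))
```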

Table 1

Comparison of training and testing performance for Lorenz system

Neuron Type	Real-valued	Complex-valued	Quaternionic-valued
Algorithm	RVBP	CVBP	QDBP
Network Topology	3-11-3	3-8-3	1-3-1
MSE Training	0.0015	0.0010	0.0006
Average Epoch	15000	11000	9000
MSE Testing	0.0042	0.0029	0.0012
Error Variance	0.0026	0.0019	0.0009
Correlation	0.87327	0.8921	0.9323
AIC	-6.3329	-6.6160	-7.4503

Fig.1

3D plot of the Lorenz system tested by the QDNN network trained through QDBP.

3.1.2 The Chua’s circuit

Chua’s circuit is the simplest autonomous electronic circuit containing resistors, capacitors and inductors that exhibits chaotic behavior under specific parametric conditions [22]. The circuit satisfies the criteria for chaos: it contains one or more non-linear elements, one or more locally active resistors, and three or more energy storage devices. It uses one Chua’s diode as the non-linear element, one locally active resistor, and two capacitors and one inductor as energy storage devices. The dynamics of Chua’s circuit are governed by three state equations: $\begin{matrix} \frac{dx}{dt} = α [y - x - h (x)] \\ \frac{dy}{dt} = x - y + z \\ \frac{dz}{dt} = - β y - γ z \end{matrix}$ (16) where h(x) is the electrical response of the non-linear resistor, defined as $h (x) = m_{1} x + \frac{1}{2} (m_{0} - m_{1}) (| x + 1 | - | x - 1 |)$, and α, β, γ, m₀ and m₁ are constant parameters. The variables x and y are the voltages across the two capacitors and z is the current through the inductor; together they trace the chaotic attractor in three dimensions. The double-scroll chaotic attractor [22] is obtained with the parameters α = 15.6, β = 28, γ = 0, m₀ = -1.143 and m₁ = -0.714. The chaotic time series is obtained by simulating the system (Equation 16) with a time step of 0.1 s and initial state x = 0.1, y = 0.1 and z = 0.1, using the fourth-order Runge-Kutta method. The input-output imaginary quaternions are normalized to the range -0.8 to 0.8 (the real part is zero and the imaginary parts (x, y, z) carry the corresponding state values). A time series of 500 terms obtained from the simulated system is used to train RVNN, CVNN and QDNN. The training results of these networks, reported in Table 2, show that QDNN trained by the QDBP algorithm requires significantly fewer epochs on average to reach the threshold training error than RVBP and CVBP.
The next 500 terms of the time series are used to test the networks trained by these algorithms. Figure 2 shows the 3D patterns of desired and actual data for the chaotic behavior of Chua’s circuit. The testing results in terms of error variance, correlation, and AIC, as reported in Table 2, again indicate the superiority of QDNN over the real-valued and complex-valued neural networks.
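The Chua time series can be generated in the same way as the Lorenz series. The sketch below uses the parameter values and initial state quoted above; treating the stated time step of 0.1 as the Runge-Kutta integration step is an assumption, and the function names are hypothetical. Inside the region |x| ≤ 1 the piecewise-linear response reduces to h(x) = m₀x, which gives a simple check of the implementation.

```python
def chua_h(x, m0=-1.143, m1=-0.714):
    """Piecewise-linear response h(x) of the non-linear element in Equation (16)."""
    return m1 * x + 0.5 * (m0 - m1) * (abs(x + 1.0) - abs(x - 1.0))

def chua(state, alpha=15.6, beta=28.0, gamma=0.0):
    """Right-hand side of Equation (16)."""
    x, y, z = state
    return (alpha * (y - x - chua_h(x)), x - y + z, -beta * y - gamma * z)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def chua_series(n_terms=1000, dt=0.1, start=(0.1, 0.1, 0.1)):
    """Time series of n_terms states: 500 for training plus 500 for testing."""
    series = [start]
    while len(series) < n_terms:
        series.append(rk4_step(chua, series[-1], dt))
    return series

series = chua_series()
train, test = series[:500], series[500:1000]
print(chua_h(0.5), chua((0.1, 0.1, 0.1)))
```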

Table 2

Comparison of training and testing performance for Chua’s circuit

Neuron Type	Real-valued	Complex-valued	Quaternionic-valued
Algorithm	RVBP	CVBP	QDBP
Network Topology	3-12-3	3-8-3	1-3-1
MSE Training	0.0012	0.0009	0.0008
Average Epoch	10000	8500	7000
MSE Testing	0.0025	0.0021	0.0017
Error Variance	0.0020	0.0011	0.0008
Correlation	0.9734	0.9801	0.9874
AIC	-6.5332	-6.8222	-7.0101

Fig.2

Testing through the QDNN network trained by QDBP for Chua’s circuit.

3.2. Linear transformations

To evaluate the performance of QDNN, we consider a three-layered neural structure (2-M-2). This section presents the learning of linear transformations (rotation, scaling, translation, and their combinations) by QDNN from a few points on a line, and the generalization over complicated 3D objects. Each quaternionic variable q_i = 0 + x_i i + y_i j + z_i k undergoes a transformation function (T) and yields a transformed quaternionic variable $q_{i}^{'} = 0 + x_{i}^{'} i + y_{i}^{'} j + z_{i}^{'} k$, represented in quaternionic matrix form as follows: $q_{i}^{'} = T (q_{i}) = a q_{i} + b; (i = 1, 2, 3, \dots n_{p})$ (17) where n_p denotes the number of points that lie on the surface of the 3D object, and a and b are quaternions. The norm of a, $\| a \| = \sqrt{0^{2} + a_{1}^{2} + a_{2}^{2} + a_{3}^{2}}$, denotes the scaling factor, and the argument of a yields the rotation of q, while b translates the 3D object by the distance ||b||. Combinations of these transformations make it possible to view 3D objects from different orientations, interpret their motion, etc.
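Equation (17) can be tried out directly in quaternion arithmetic. The sketch below stores quaternions as 4-tuples and uses hypothetical helper names. As an illustrative assumption, pure scaling is modeled with a real-valued a = 0.5 and translation with b = 0.3j; the second example multiplies by the unit quaternion a = i, which maps j to k. Note that left multiplication by a pure quaternion leaves a zero real part only when the point is orthogonal to a, so the general rotation case needs more care than this sketch shows.

```python
import math

def qmul(q, r):
    """Hamilton product q*r of two quaternions stored as 4-tuples."""
    q0, q1, q2, q3 = q
    r0, r1, r2, r3 = r
    return (q0 * r0 - q1 * r1 - q2 * r2 - q3 * r3,
            q0 * r1 + q1 * r0 + q2 * r3 - q3 * r2,
            q0 * r2 - q1 * r3 + q2 * r0 + q3 * r1,
            q0 * r3 + q1 * r2 - q2 * r1 + q3 * r0)

def qadd(q, r):
    return tuple(a + b for a, b in zip(q, r))

def qnorm(q):
    """Norm of a quaternion (Equation (4))."""
    return math.sqrt(sum(c * c for c in q))

def transform(q, a, b):
    """T(q) = a q + b, as in Equation (17)."""
    return qadd(qmul(a, q), b)

# Scaling by 1/2 followed by a shift of 0.3 along Y (assumed real-valued a).
point = (0.0, 1.0, 2.0, 3.0)
scaled = transform(point, (0.5, 0.0, 0.0, 0.0), (0.0, 0.0, 0.3, 0.0))
print(scaled)

# Multiplying the unit vector j by a = i gives k.
turned = transform((0.0, 0.0, 1.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
print(turned, qnorm((0.5, 0.0, 0.0, 0.0)))
```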

For training a three-layered 2-6-2 QDNN, all experiments use a straight line in space containing a small number of input data points (21 points) and a reference point (the midpoint). Each point (x, y, z) lying on the line goes to the first input, and the second input receives the reference point (x_r, y_r, z_r). Incorporating the reference point provides more information to the learning system, which yields better accuracy. Similarly, the first and second neurons of the output layer produce the transformed point (x′, y′, z′) on the line and the transformed reference point $(x_{r}^{'}, y_{r}^{'}, z_{r}^{'})$, respectively. The transformation is learned with the learning algorithm presented in Section 2.1 with a suitable learning rate. The trained QDNN is able to generalize over large point clouds of complicated geometrical structures such as a sphere, cylinder and torus, and this ability of the network provides the 3D motion interpretation of objects. It is worth mentioning that RVNN and CVNN cannot learn the phase information, so such transformations are not possible through RVNN and CVNN; this section therefore presents only the results obtained by QDNN.

3.2.1 Scaling transformation

The 2-6-2 QDNN structure is trained for the scaling transformation through an input-output mapping with scaling factor 1/2 over a 3D line containing 21 points, where the point (0, 0, 0) is the reference point, as shown in Fig. 3(a). The convergence of the mean square error (Fig. 3(b)) shows the learning capability of the proposed network. Training of the QDNN with a learning rate of 0.00005 converges to MSE = 1.005567e-05 after 20000 iterations. The trained network is able to generalize over complicated standard geometric structures such as a sphere (4141 data points), cylinder (2929 data points), and torus (10201 data points), as presented in Fig. 4(a-c) respectively.

Fig.3

(a) Training input-output mapping for scaling with scaling factor 1/2; (b) Convergence of mean square error.

Fig.4

Testing results from similarity transformation over (a) sphere, (b) cylinder, and (c) torus.

3.2.2 Scaling and translation transformation

The 2-6-2 QDNN is trained on a combination of scaling (scaling factor 1/2) and translation (0.3 unit in the positive Y-direction) through an input-output mapping over a 3D line containing 21 data points referenced at (0, 0, 0), as shown in Fig. 5(a). The convergence curve of the QDNN in Fig. 5(b), with learning rate 0.00005, reaches a mean square error of 2.58514e-05 after 20000 iterations, which shows the learning capability of the proposed learning machine. The trained network is able to generalize well over complicated standard geometric structures such as a sphere (4141 data points), cylinder (2929 data points), and torus (10201 data points), as shown in Fig. 6(a-c) respectively.
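The training set for this experiment can be assembled as pairs of two-quaternion inputs and two-quaternion targets. In the sketch below, the scaling factor, the 0.3 shift along Y, the 21 points and the reference point at the origin come from the text; the end points of the line are not given, so the values used here are hypothetical, as are the function names.

```python
def make_training_pairs(n_points=21, scale=0.5, shift=(0.0, 0.3, 0.0),
                        start=(-0.5, -0.5, -0.5), end=(0.5, 0.5, 0.5)):
    """Build (inputs, targets) pairs for a 2-M-2 network.

    start and end are assumed end points of the line; its midpoint is the
    reference point, which is (0, 0, 0) for the default values.
    """
    ref = tuple((s + e) / 2.0 for s, e in zip(start, end))

    def as_q(p):
        # Embed (x, y, z) as the pure quaternion 0 + x i + y j + z k.
        return (0.0,) + tuple(p)

    def move(p):
        # Scale, then translate, each coordinate.
        return tuple(scale * c + d for c, d in zip(p, shift))

    pairs = []
    for i in range(n_points):
        t = i / (n_points - 1)
        p = tuple(s + t * (e - s) for s, e in zip(start, end))
        inputs = [as_q(p), as_q(ref)]                 # point and reference point
        targets = [as_q(move(p)), as_q(move(ref))]    # their transformed images
        pairs.append((inputs, targets))
    return pairs

pairs = make_training_pairs()
print(len(pairs), pairs[0])
```

Every coordinate produced with these default end points already lies inside the range -0.8 to 0.8 used elsewhere for normalization, so no further scaling is applied in this sketch.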

Fig.5

(a) Training patterns: input-output mapping shows transformation with scaling factor 1/2, followed by translation with 0.3 units in positive Y-direction (b) Convergence of mean square error.

Fig.6

Testing results from similarity transformation through (a) sphere, (b) cylinder, and (c) torus.

3.2.3 Scaling, translation and rotation transformation

The QDNN is trained for a general linear transformation (scaling factor 1/2, counterclockwise rotation about the X-axis by π/2 radian, and translation by (0, 0, 0.3)) through an input-output mapping over a straight line with reference point (0, 0, 0), as shown in Fig. 7(a). The 2-6-2 QDNN model is trained on these transformations with 21 data points on a straight line. A mean square error of 1.0e-04 is reached after 20000 iterations with a learning rate of 0.00005, as shown in Fig. 7(b). The trained network is also able to generalize over complicated standard geometric structures such as a sphere (4141 data points), cylinder (4141 data points), and torus (10201 data points), as shown in Fig. 8(a-c) respectively.

Fig.7

(a) Training mapping patterns through a straight line (scaling factor 1/2, counterclockwise rotation about the X-axis by π/2 radian, and translation by (0, 0, 0.3)); (b) Square error during training on the straight line pattern.

Fig.8

Generalization of a linear transformation (scaling factor 1/2, counterclockwise rotation about the X-axis by π/2 radian, and translation by (0, 0, 0.3)) over (a) sphere, (b) cylinder, and (c) torus.

All the transformation experiments demonstrate the capability of the QDNN to interpret the motion of 3D objects. Further, these experiments provide a direction for generalizing motion in intelligent system design for a variety of operations.

3.3. 3D face recognition

This section presents a basic experiment on a small data set; nevertheless, it has wide implications for the applicability of the proposed learning machine to 3D recognition. The proposed method shows good potential for successful recognition under variable head position, orientation, and facial expression. Two experiments are conducted to learn and classify point cloud data of 3D faces using the proposed quaternionic domain backpropagation algorithm. A simple 1-2-1 QDNN, with a single input, a single output, and only two quaternionic neurons in the hidden layer, is used in both experiments.

The first experiment is performed on a dataset containing five faces of the same person (point cloud data of 4654 points) with different orientations and poses; the QDNN is trained on one face (Fig. 9(a)) and tested on all five faces. Table 3 presents the testing MSE (mean square error) of the five faces. The errors are comparable (all between 2.4842e-04 and 5.1153e-03), which indicates that they are faces of the same person irrespective of variations in orientation and pose. This shows the straightforward learning and generalization ability of a simple QDNN, which is not possible with RVNN.

Fig.9

Five 3D faces of same person with different orientation and poses.

Table 3

Comparison of testing MSE of faces of the same person with different orientations (training MSE = 0.0001)

S. No.	Face (Figure)	Test error
1.	9(a)	2.4842e-04
2.	9(b)	3.5431e-03
3.	9(c)	5.1153e-03
4.	9(d)	4.5212e-04
5.	9(e)	3.9148e-04

Similarly, the second experiment is performed on a dataset containing five faces of different persons (point cloud data of 6397 points); the QDNN is trained on one face (Fig. 10(a)) and tested on all five faces. Table 4 presents the testing MSE of each face obtained from the trained network. The MSE of the other four faces (6.2814e-02 or higher) is much larger than that of the face used in training (1.8214e-04, Fig. 10(a)). This demonstrates that the simple QDNN correctly distinguishes faces of the same person from faces of different persons. It again reveals the learning and generalization capability of the proposed learning machine, which a real-valued neural network lacks.

Fig.10

Five 3D faces of different persons.

Table 4

Comparison of testing MSE of faces of different persons (training MSE = 0.0001)

S. No.	Face (Figure)	Test error
1.	10(a)	1.8214e-04
2.	10(b)	8.1344e-01
3.	10(c)	3.5709e+00
4.	10(d)	6.2814e-02
5.	10(e)	3.1738e-01
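The decision implied by Tables 3 and 4 can be sketched as a simple threshold on the test MSE. The paper does not state a decision rule, so the threshold of 1e-2 below is purely an assumption, chosen only because it separates the reported values; the dictionary and function names are likewise hypothetical.

```python
# Test errors as reported in Table 3 (same person) and Table 4 (different persons).
table3 = {"9(a)": 2.4842e-04, "9(b)": 3.5431e-03, "9(c)": 5.1153e-03,
          "9(d)": 4.5212e-04, "9(e)": 3.9148e-04}
table4 = {"10(a)": 1.8214e-04, "10(b)": 8.1344e-01, "10(c)": 3.5709e+00,
          "10(d)": 6.2814e-02, "10(e)": 3.1738e-01}

# Hypothetical decision threshold; not taken from the paper.
THRESHOLD = 1e-2

def same_person(test_mse, threshold=THRESHOLD):
    """Accept a face as the trained identity if its test MSE is below the threshold."""
    return test_mse < threshold

same_person_results = {face: same_person(mse) for face, mse in table3.items()}
different_person_results = {face: same_person(mse) for face, mse in table4.items()}
```

With this threshold, all five faces in Table 3 are accepted, while in Table 4 only the training face 10(a) is accepted. A practical system would need the threshold to be validated on far more data than these ten faces.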

4. Conclusion

In this paper, we present an efficient and generalized learning machine for high-dimensional problems and evaluate it on a variety of problems from different areas. The proposed neural network, with its learning algorithm in the quaternionic domain, directly processes three- or four-dimensional data without handling the individual components, and the phase information among them, separately. A quaternion is a number that carries the magnitude of each of its components, with the phase information of each component embedded in it. Thus, the quaternionic domain neural network (QDNN) leads to a simple network structure, efficient learning, and better performance, whereas a conventional real-valued neural network (RVNN) deals with individual components and hence needs a large topology, learns slowly, and performs poorly. Moreover, RVNN and complex-valued neural networks (CVNN) are not suited to problems that require learning and generalizing phase information in 3D space, such as 3D object recognition and the motion or transformation of objects. It is worth mentioning again that the proposed machine learns a composition of transformations through an input-output mapping over a line containing a small set of points and generalizes this motion over complex geometric structures such as a sphere, a cylinder, and a torus. Although the recognition problem presented for 3D imaging is small and basic, it is encouraging for prospective researchers because of the network's simplicity, faster convergence, and the results obtained.
