Abstract
Vibration signals generated from gears often exhibit nonlinearity. Characterizing such signals with nonlinear time series analysis can be a good alternative for identifying gear faults. This paper presents a recurrence network-based approach to extract features from vibration signals for gear fault diagnosis. Quantitative parameters related to the dynamical complexity of the vibration signals (such as mean degree centrality, global clustering coefficient, assortativity and network entropy) are calculated from the generated recurrence network and used to classify different gear faults with two kinds of classifiers, i.e., support vector machine and extreme learning machine. Experimental studies performed on two different gear test systems have verified the effectiveness of the presented recurrence network approach for gear fault severity evaluation as well as gear fault classification.
Introduction
Gears are among the most common and critical components for power transmission in rotary machines. Once a gear fault occurs, it degrades the performance of the whole machine. In severe circumstances, a gear fault may lead to machine shutdown and economic losses. Therefore, detecting gear faults at an early stage while the machine is still in operation is necessary to avoid abnormal events and reduce productivity loss.
Various methods that analyze vibration signals in the time, frequency and/or time–frequency domains have been proposed to diagnose gear faults [1, 25]. However, a gear fault is often accompanied by a change in the dynamics of the rotary machine from linear or weakly nonlinear to strongly nonlinear, so the corresponding vibration signals exhibit nonlinearity. Therefore, nonlinear time series analysis is a good alternative for characterizing vibration signals for gear fault diagnosis.
Among various nonlinear time series analysis techniques, the recently developed recurrence network provides a new way to study a system's dynamical complexity from measured time series data. Local and global measures of recurrence networks have been applied in various disciplines. For example, the electric power grid was represented as a recurrence network to diagnose the vulnerability of the system under cascading failures [6]. Recurrence networks were also used to analyze cardiovascular time series with the aim of early prediction of preeclampsia [23]. In another study, they were applied to the structural properties of electroencephalographic signals using the global clustering coefficient and the average path length of the associated ε-networks [28]. In addition, recurrence networks provided insights into the relationship between network topology and functional organization of complex brain networks [33]. In the area of fault diagnosis, recurrence networks have been used to characterize both rolling bearing and rotor faults [29, 30].
Inspired by this prior research, this paper presents the recurrence network as an alternative tool for extracting representative features that characterize gear working status. With the help of two classifiers, i.e., support vector machine (SVM) and extreme learning machine (ELM), both gear fault types and their severity can be evaluated. The paper is organized as follows. Section 2 introduces the theoretical background of the recurrence network and discusses how the network threshold is determined. Section 3 briefly introduces the classifiers used in this paper. Section 4 presents experimental studies that verify the effectiveness of the recurrence network for gear fault diagnosis. Finally, conclusions are drawn in Section 5.
Theoretical background
Transforming time series into phase space
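According to dynamical systems theory, the phase space can reveal hidden characteristics inherent in a nonlinear system. Given a time series from a single observable, embedding is a common way to recover the properties of a phase space of unknown dimension. With a suitable embedding dimension m and time delay τ [31], a scalar time series x(t) (t = 1, …, N) can be transformed into the phase space vectors $\mathbf{x}^{(m)}(t) = (x(t), x(t+\tau), \dots, x(t+(m-1)\tau))$. Then the binary recurrence matrix is defined as
$$R_{ij}(\varepsilon) = \Theta\left(\varepsilon - \left\|\mathbf{x}^{(m)}(i) - \mathbf{x}^{(m)}(j)\right\|\right), \tag{1}$$
where $\Theta(\cdot)$ is the Heaviside step function, $\varepsilon$ is the threshold and $\|\cdot\|$ is a norm in phase space (the maximum norm is used in this paper).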
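Vertices of the recurrence network are represented by the phase space vectors [7, 27], and recurrences are represented by links between vertices. The binary adjacency matrix of the recurrence network is obtained by removing self-loops from the recurrence matrix:
$$A_{ij} = R_{ij}(\varepsilon) - \delta_{ij}, \tag{2}$$
where $\delta_{ij}$ is the Kronecker delta.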
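The construction steps above (delay embedding, thresholding pairwise distances, then removing self-loops) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, distances use the maximum norm as stated for the proposed method, and $\Theta(0)$ is taken as 1 so that pairs at exactly distance $\varepsilon$ count as recurrences.

```python
import numpy as np

def embed(x, m, tau):
    """Delay embedding: row t is (x(t), x(t + tau), ..., x(t + (m - 1) tau))."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for the chosen m and tau")
    return np.column_stack([x[i * tau : i * tau + n_vectors] for i in range(m)])

def recurrence_matrix(X, eps):
    """R_ij = Theta(eps - ||x_i - x_j||) with the maximum (Chebyshev) norm; Theta(0) taken as 1."""
    dist = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    return (dist <= eps).astype(int)

def adjacency_matrix(R):
    """A_ij = R_ij - delta_ij: recurrences become links and self-loops are removed."""
    return R - np.eye(R.shape[0], dtype=int)

# Hypothetical usage on a synthetic signal.
X = embed(np.sin(0.1 * np.arange(500)), m=6, tau=2)
A = adjacency_matrix(recurrence_matrix(X, eps=0.5))
```

The full pairwise distance array grows with the square of the number of vectors, so for long vibration records a chunked computation or a spatial index would be a more practical choice.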
Recurrence network analysis provides important complementary features for characterizing the dynamical system. These features are based on the spatial dependencies between individual states of the time series in phase space [10].
The relationship between the recurrence network and phase space is summarized in Table 1.
Relationship between recurrence network and phase space
Because of the natural definitions of vertices, edges and paths, the topological characteristics of the recurrence network can reveal intrinsic properties of the dynamical system in phase space, and quantitative measures of these topological features serve as complementary measures of the dynamical system. To evaluate the importance of a vertex in the recurrence network, the degree centrality (local recurrence rate) $k_v$ of a vertex $v$ is defined as the number of neighbors directly connected with $v$:
$$k_v = \sum_{i=1}^{N} A_{vi}, \tag{3}$$
where $A_{vi}$ are the entries of the adjacency matrix and N is the number of vertices.
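In some cases, it is more meaningful to characterize all vertices together, and the mean degree centrality is used as a characteristic quantity:
$$\langle k \rangle = \frac{1}{N}\sum_{v=1}^{N} k_v. \tag{4}$$
Because every link is counted once at each of its two end vertices, this can be rewritten as
$$\langle k \rangle = \rho\,(N-1), \qquad \rho = \frac{1}{N(N-1)}\sum_{i,j=1}^{N} A_{ij}, \tag{5}$$
where $\rho$ is the global edge density, i.e., the fraction of all possible links that are present.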
From Equation (5), it can be seen that the mean degree centrality is directly proportional to the network's global edge density $\rho$. Furthermore, normalizing the degree centrality by the maximum possible number of links of a vertex, N − 1, gives the local edge density
$$\rho_v = \frac{k_v}{N-1}. \tag{6}$$
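The degree-based measures can be computed directly from the adjacency matrix. The following is a minimal NumPy sketch, assuming a symmetric 0/1 adjacency matrix without self-loops; the function name is hypothetical. It also makes it easy to check numerically that the mean degree equals the global edge density multiplied by N − 1.

```python
import numpy as np

def degree_measures(A):
    """Degree centrality k_v, mean degree <k>, global edge density rho and local edge density rho_v."""
    A = np.asarray(A)
    N = A.shape[0]
    k = A.sum(axis=1)                        # k_v: number of direct neighbours of v
    mean_k = float(k.mean())                 # <k>: average over all N vertices
    rho = float(A.sum() / (N * (N - 1)))     # fraction of possible links present (A counts each link twice)
    rho_v = k / (N - 1)                      # local edge density of each vertex
    return k, mean_k, rho, rho_v

# Path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
k, mean_k, rho, rho_v = degree_measures(A)
print(mean_k, rho * (A.shape[0] - 1))  # 1.5 1.5
```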
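The clustering coefficient $C_v$ of a vertex $v$ quantifies the interconnectivity of the direct neighbors of that vertex, i.e., the fraction of pairs of neighbors of $v$ that are themselves linked. It is defined as [17]
$$C_v = \frac{\sum_{i,j=1}^{N} A_{vi} A_{ij} A_{jv}}{k_v (k_v - 1)}. \tag{7}$$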
The average of the clustering coefficients over all vertices is a global topological characteristic of the recurrence network, called the global clustering coefficient:
$$C = \frac{1}{N}\sum_{v=1}^{N} C_v. \tag{8}$$
The global clustering coefficient quantifies the mean fraction of closed triangles around the vertices of the recurrence network. In a recurrence network, C also represents the average local dimensionality of the dynamical system in phase space.
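A minimal NumPy sketch of the local and global clustering coefficients follows. The diagonal of $A^3$ gives $\sum_{i,j} A_{vi}A_{ij}A_{jv}$ for every vertex at once. Vertices with fewer than two neighbors have an undefined ratio; the sketch assumes $C_v = 0$ for them, which is a common convention but not one stated in the text. The function name is hypothetical.

```python
import numpy as np

def clustering_coefficients(A):
    """Local C_v = sum_ij A_vi A_ij A_jv / (k_v (k_v - 1)) and their mean, the global C."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    closed = np.diag(A @ A @ A)          # twice the number of links among v's neighbours
    pairs = k * (k - 1)                  # twice the number of neighbour pairs of v
    # Assumed convention: C_v = 0 when v has fewer than two neighbours.
    C_v = np.divide(closed, pairs, out=np.zeros_like(closed), where=pairs > 0)
    return C_v, float(C_v.mean())

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
C_v, C = clustering_coefficients(A)
```

Here vertex 0 has three neighbor pairs of which one is linked, so $C_0 = 1/3$, vertices 1 and 2 have $C_v = 1$, and the pendant vertex contributes 0, giving $C = 7/12$.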
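If vertices tend to link to other vertices with a similar degree, the recurrence network is assortative; if high-degree vertices preferentially connect to low-degree vertices, and vice versa, the network is disassortative. Hence, the Pearson correlation coefficient of the vertex degrees at both ends of all edges can be used to quantify assortativity [17, 21]:
$$A_s = \frac{\frac{1}{L}\sum_{e} k_{e,1}k_{e,2} - \left[\frac{1}{2L}\sum_{e}\left(k_{e,1}+k_{e,2}\right)\right]^2}{\frac{1}{2L}\sum_{e}\left(k_{e,1}^2+k_{e,2}^2\right) - \left[\frac{1}{2L}\sum_{e}\left(k_{e,1}+k_{e,2}\right)\right]^2}, \tag{9}$$
where the sums run over all L edges e of the network and $k_{e,1}$, $k_{e,2}$ are the degrees of the two vertices joined by edge e.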
If the density of states in phase space remains approximately constant within an ε-ball, neighboring vertices have similar degrees and the assortativity $A_s$ is positive. Therefore, $A_s$ can be used as a parameter to evaluate the continuity of the state density.
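Assortativity can be sketched as the Pearson correlation of endpoint degrees, listing each edge in both directions so that the result does not depend on which endpoint is called "first". This symmetric Pearson form is algebraically the same as Newman's edge-based assortativity formula. The function name is hypothetical, and the value is undefined (NaN) when every vertex has the same degree.

```python
import numpy as np

def assortativity(A):
    """Degree assortativity: Pearson correlation of the degrees at both ends of every edge."""
    A = np.asarray(A)
    k = A.sum(axis=1)
    i, j = np.nonzero(np.triu(A, 1))                # each undirected edge listed once
    # Count every edge in both directions so the correlation is symmetric in its endpoints.
    x = np.concatenate([k[i], k[j]]).astype(float)
    y = np.concatenate([k[j], k[i]]).astype(float)
    return float(np.corrcoef(x, y)[0, 1])

# A star (one hub, three leaves) is perfectly disassortative.
star = np.zeros((4, 4), dtype=int)
star[0, 1:] = 1
star[1:, 0] = 1
print(round(assortativity(star), 6))  # -1.0
```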
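In a scale-free network, vertices with small degree may dominate the global clustering coefficient, which can lead to an underestimate of the real fraction of triangles in the recurrence network. To eliminate this effect, the network transitivity is defined as [2, 4]
$$T(\varepsilon) = \frac{\sum_{v,i,j=1}^{N} A_{vi} A_{ij} A_{jv}}{\sum_{v,i,j=1}^{N} A_{vi} A_{vj} \left(1 - \delta_{ij}\right)}, \tag{10}$$
where $\delta_{ij}$ is the Kronecker delta.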
C and $T(\varepsilon)$ measure the system's characteristics from different points of view: C represents the average local dimensionality of the system, whereas $T(\varepsilon)$ represents its global dimensionality.
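The difference between the two measures is easy to see on a tiny graph. This minimal NumPy sketch computes transitivity as the ratio of closed 3-walks to ordered pairs of distinct neighbors; the function name is hypothetical and T is set to 0 for a graph with no neighbor pairs, which is an assumption.

```python
import numpy as np

def transitivity(A):
    """T = sum_{v,i,j} A_vi A_ij A_jv / sum_{v,i,j} A_vi A_vj (1 - delta_ij)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    closed = np.trace(A @ A @ A)       # numerator: 6 x (number of triangles)
    triples = np.sum(k * (k - 1))      # denominator: ordered pairs of distinct neighbours
    return closed / triples if triples > 0 else 0.0

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
print(transitivity(A))  # 0.6
```

For the same graph, the mean of the local clustering coefficients is (1/3 + 1 + 1 + 0)/4 = 7/12 ≈ 0.583 (taking $C_v = 0$ for the degree-one vertex), so C and T differ even on four vertices: C gives every vertex equal weight, while T effectively weights each vertex by its number of neighbor pairs.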
In addition, Shannon entropy is introduced as a measure to characterize heterogeneity of the recurrence network. In a recurrence network with N vertices, Shannon entropy is defined as [32]:
Obviously, this network entropy does not account for the influence of isolated vertices on the network structure. However, a recurrence network may contain isolated vertices. Isolated vertices indicate that the process is nonstationary: some states are rare or far from the typical behavior, or transitions may have occurred. To measure the effect of isolated vertices, the isolation rate of the recurrence network is defined as
The threshold ε in Equation (1) is a key parameter that determines the characteristics of the recurrence network. To select an appropriate threshold, the relationship between the threshold and the quantitative characteristics of the recurrence network is studied.
The threshold is varied over [0, 2] in increments of 0.1. The effect of the threshold on the assortativity is studied first. As shown in Fig. 1, the assortativity first increases and then decreases as the threshold increases, reaching its maximum at a threshold of 0.5.

Relationship between threshold and assortativity.
Next, the relationship between the threshold and the network entropy is studied. As shown in Fig. 2, the network entropy first increases rapidly and then more slowly as the threshold increases; its growth slows down once the threshold reaches 0.5.

Relationship between threshold and network entropy.
The isolation rate behaves differently: as shown in Fig. 3, it decreases rapidly to 0 as the threshold increases. This indicates that the threshold cannot be too large; otherwise, the recurrence network cannot reveal the nonlinear nature of the time series.

Relationship between threshold and isolation rate.
On the other hand, the global recurrence rate increases monotonically with the threshold, as shown in Fig. 4, and grows faster when the threshold is greater than 0.5. Other characteristics of the recurrence network (mean degree centrality, global clustering coefficient and transitivity) show similar trends.

Relation between threshold and global recurrence rate.
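Based on the above study, a threshold of 0.5 is chosen: it maximizes the assortativity, which guarantees the continuity of the density of states, while the growth of the network entropy has already leveled off and the global recurrence rate has not yet entered its faster-growth region.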
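A threshold study of this kind can be repeated on one's own data with a sweep like the sketch below. Several assumptions are made and should be checked against the original paper: the signal is z-normalized before thresholding (the text does not say how signals are scaled), the Shannon-entropy expression from [32] is not reproduced in this text, so the sketch uses the degree-based form $H = -\sum_v p_v \ln p_v$ with $p_v = k_v / \sum_j k_j$ (which ignores isolated vertices, as described above), and the isolation rate is taken to be the fraction of zero-degree vertices. The function name is hypothetical.

```python
import numpy as np

def sweep_threshold(x, m, tau, thresholds):
    """Global recurrence rate, (assumed) network entropy and (assumed) isolation rate per threshold."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()                    # assumption: z-normalise before thresholding
    n = len(x) - (m - 1) * tau
    X = np.column_stack([x[i * tau : i * tau + n] for i in range(m)])
    dist = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)   # maximum norm
    rho, entropy, iso = [], [], []
    for eps in thresholds:
        A = (dist <= eps).astype(int) - np.eye(n, dtype=int)        # A = R - I
        k = A.sum(axis=1)
        rho.append(A.sum() / (n * (n - 1)))
        # Assumed entropy form: -sum p_v ln p_v, p_v = k_v / sum(k); isolated vertices add nothing.
        p = k[k > 0] / k.sum() if k.sum() > 0 else np.array([])
        entropy.append(float(-np.sum(p * np.log(p))))
        # Assumed isolation rate: fraction of vertices without any link.
        iso.append(float(np.mean(k == 0)))
    return np.array(rho), np.array(entropy), np.array(iso)

thresholds = np.round(np.arange(0.0, 2.01, 0.1), 1)   # [0, 2] in steps of 0.1, as in the text
```

Plotting each returned array against `thresholds` gives curves of the same kind as Figs. 1 to 4 for the signal at hand; assortativity can be added to the loop in the same way.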
In addition, the false nearest neighbors (FNN) algorithm is used to determine the embedding dimension of the vibration time series [16], and mutual information is used to determine the time delay [11].
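The delay-selection step can be illustrated with a histogram estimate of the mutual information between x(t) and x(t + τ), taking the first local minimum of the mutual-information curve as τ, which is the usual criterion for this method. This is a sketch under assumptions: the bin count, maximum lag, fallback rule and function names are choices made here, not taken from the paper, and the FNN step is not shown.

```python
import numpy as np

def mutual_information(x, lag, bins=16):
    """Histogram (plug-in) estimate of the mutual information between x(t) and x(t + lag), in nats."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal distribution of x(t)
    py = pxy.sum(axis=0, keepdims=True)      # marginal distribution of x(t + lag)
    nz = pxy > 0                             # empty cells contribute 0 * log 0 = 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def delay_by_first_minimum(x, max_lag=50, bins=16):
    """Smallest lag at which the mutual-information curve has its first local minimum."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1                     # mi[i] belongs to lag i + 1
    return int(np.argmin(mi)) + 1            # no local minimum found: fall back to the global minimum
```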
Support vector machines
The basic idea of a binary SVM is to map the training data into a high-dimensional space and find an optimal hyperplane that separates the data with the maximal margin [5].
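Given a training data set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$ with $\mathbf{x}_i \in \mathbb{R}^n$ and class labels $y_i \in \{-1, +1\}$, the optimal hyperplane is obtained by solving a quadratic programming problem whose solution yields the Lagrange multipliers $\alpha_i$ and the bias $b$.
Given a test vector $\mathbf{x}_t \in \mathbb{R}^n$, the classification output can be obtained as
$$f(\mathbf{x}_t) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}_t) + b\right),$$
where $K(\cdot,\cdot)$ is the kernel function; only the training samples with $\alpha_i > 0$ (the support vectors) contribute to the sum.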
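SVM becomes a nonlinear classifier through the use of a kernel function [8]. Typically, the radial basis function (RBF) is used as the kernel function, which is expressed as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right),$$
where $\sigma$ is the kernel width parameter.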
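Once training has produced the multipliers and the bias, evaluating the binary decision function with an RBF kernel is short. The sketch below assumes the $\sigma$-parameterized kernel form and takes the support vectors, $\alpha_i$, labels and $b$ as given (training, i.e. solving the quadratic program, is not shown); the function names and the choice to return +1 when the score is exactly 0 are assumptions.

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def svm_decision(x_t, support_vectors, alphas, labels, b, sigma=1.0):
    """Binary SVM output sgn(sum_i alpha_i y_i K(x_i, x_t) + b); alphas and b come from training."""
    s = sum(a * y * rbf_kernel(sv, x_t, sigma)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1           # assumption: a score of exactly 0 maps to +1

# Toy example: one support vector per class, equal multipliers, zero bias.
svs = [[-1.0, 0.0], [1.0, 0.0]]
print(svm_decision([0.9, 0.0], svs, alphas=[1.0, 1.0], labels=[-1, 1], b=0.0))  # 1
```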
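To extend the binary SVM to multi-class classification, the one-against-one (OAO) strategy is adopted [13]: a binary SVM is trained for each pair of classes, and a test sample is assigned to the class that receives the most votes.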
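The OAO voting step can be sketched independently of how the pairwise classifiers are trained. Here each entry of the hypothetical `pairwise` dictionary is any binary decision function that returns +1 for the first class of its pair and −1 for the second; with K classes there are K(K − 1)/2 such classifiers. How ties are broken is an assumption: this sketch picks the class listed first.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(x, classes, pairwise):
    """Max-wins voting over the K(K-1)/2 pairwise classifiers.

    pairwise[(a, b)](x) returns +1 to vote for class a and -1 to vote for class b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[a if pairwise[(a, b)](x) > 0 else b] += 1
    # Ties go to the class listed first (an assumption; implementations differ).
    return max(classes, key=lambda c: votes[c])

# Toy usage: three 1-D classes centred at 0, 1 and 2; each pairwise rule picks the nearer centre.
classes = [0, 1, 2]
pairwise = {
    (a, b): (lambda x, a=a, b=b: 1 if abs(x - a) < abs(x - b) else -1)
    for a, b in combinations(classes, 2)
}
print(one_against_one_predict(1.9, classes, pairwise))  # 2
```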
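ELM is a learning algorithm for single-hidden-layer feedforward networks (SLFNs) [18]. In ELM, the hidden-node parameters are not tuned iteratively; instead, the output weights are computed analytically, which gives good generalization performance at a higher learning speed than traditional learning algorithms for feedforward neural networks [15]. For an SLFN with M hidden nodes, the output for an input $\mathbf{x}_j$ can be represented by
$$f_M(\mathbf{x}_j) = \sum_{i=1}^{M} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \quad j = 1, 2, \dots, N, \tag{24}$$
where $\mathbf{w}_i$ and $b_i$ are the input weight vector and bias of the ith hidden node, $\boldsymbol{\beta}_i$ is the output weight vector connecting the ith hidden node to the output nodes, and $g(\cdot)$ is the activation function.
If an SLFN with M hidden nodes can approximate N samples $(\mathbf{x}_j, \mathbf{t}_j) \in \mathbb{R}^n \times \mathbb{R}^m$, j = 1, 2, …, N, with zero error, where $\mathbf{x}_j$ is an n-dimensional input vector and $\mathbf{t}_j$ is an m-dimensional target vector, then Equation (24) can be rewritten as
$$\sum_{i=1}^{M} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \quad j = 1, 2, \dots, N. \tag{25}$$
Equation (25) can be written compactly as
$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{26}$$
with
$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_M \cdot \mathbf{x}_1 + b_M) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_M \cdot \mathbf{x}_N + b_M) \end{bmatrix}_{N \times M}, \quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^{\mathsf{T}} \\ \vdots \\ \boldsymbol{\beta}_M^{\mathsf{T}} \end{bmatrix}_{M \times m}, \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^{\mathsf{T}} \\ \vdots \\ \mathbf{t}_N^{\mathsf{T}} \end{bmatrix}_{N \times m}.$$
H is the hidden-layer output matrix of the network [15]: its ith column is the output vector of the ith hidden node for the inputs $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N$, and its jth row is the output vector of the hidden layer for the input $\mathbf{x}_j$.
The hidden-node parameters $\mathbf{w}_i$ and $b_i$ of the SLFN do not need to be adjusted during training; they may simply be assigned random values drawn from any continuous probability distribution. This makes Equation (26) a linear system, and the output weights are estimated as
$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}, \tag{27}$$
where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of H.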
Huang et al. [14] have proved the universal approximation ability of ELM via an incremental method.
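The ELM training procedure described above reduces to a few lines of NumPy: draw random hidden-node parameters, build the hidden-layer output matrix, and solve for the output weights with the Moore–Penrose pseudoinverse. This is a minimal sketch, not the authors' code: the sigmoid activation, the uniform [−1, 1] sampling range and the function names are assumptions, and class labels are assumed to be one-hot encoded targets with the prediction taken as the arg-max output.

```python
import numpy as np

def _hidden_output(X, W, b):
    """H[j, i] = g(w_i . x_j + b_i) with a sigmoid g (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden, seed=0):
    """Random hidden parameters (w_i, b_i); output weights beta = pinv(H) @ T."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))   # column i is w_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # hidden biases b_i
    H = _hidden_output(X, W, b)                               # N x M hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                              # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network outputs H @ beta; for classification, take the arg-max over output columns."""
    return _hidden_output(X, W, b) @ beta
```

Usage follows the same pattern as any classifier: `W, b, beta = elm_train(X_train, np.eye(n_classes)[y_train], n_hidden)` and then `elm_predict(X_test, W, b, beta).argmax(axis=1)`. Because no iterative optimization is involved, training cost is dominated by a single pseudoinverse.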
To use the quantitative measures obtained from the recurrence network for characterizing gear states, the recurrence network-based fault diagnosis approach is organized as shown in the flowchart of Fig. 5. The vibration time series measured by sensors are transformed into phase space using embedding theory. The phase space vectors are then transformed into a recurrence network using the maximum norm with a suitable threshold. After that, quantitative measures are calculated from the recurrence network and used as inputs to the classifiers for gear fault diagnosis.
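The whole feature-extraction chain in the flowchart can be condensed into one function that maps a signal segment to a feature vector. The sketch below is illustrative only: it computes a subset of the paper's features (mean degree, global edge density, global clustering coefficient and transitivity), uses m = 6, τ = 2 and ε = 0.5 as defaults taken from the text, and assumes z-normalization before thresholding. The function name and the segmenting of signals into samples are hypothetical.

```python
import numpy as np

def recurrence_network_features(x, m=6, tau=2, eps=0.5):
    """Signal -> phase space -> recurrence network -> [<k>, rho, C, T]."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()                          # assumption: z-normalisation
    n = len(x) - (m - 1) * tau                            # number of vertices
    X = np.column_stack([x[i * tau : i * tau + n] for i in range(m)])
    dist = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)   # maximum norm
    A = (dist <= eps).astype(float) - np.eye(n)           # adjacency matrix without self-loops
    k = A.sum(axis=1)                                     # degree centrality k_v
    closed = np.diag(A @ A @ A)                           # sum_ij A_vi A_ij A_jv for each vertex
    pairs = k * (k - 1)
    C_v = np.divide(closed, pairs, out=np.zeros(n), where=pairs > 0)
    mean_k = k.mean()
    rho = mean_k / (n - 1)                                # from <k> = rho (N - 1)
    T = closed.sum() / pairs.sum() if pairs.sum() > 0 else 0.0
    return np.array([mean_k, rho, C_v.mean(), T])

# Hypothetical usage: one feature row per signal segment, stacked into the classifier input.
# features = np.vstack([recurrence_network_features(seg) for seg in segments])
```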

The flowchart of the proposed method.
The first experimental study was conducted on a four-speed motorcycle gearbox test system [22]. As shown in Fig. 6, the electric motor was run at a constant speed of 1420 RPM. To reduce unwanted vibration, four shock absorbers were installed under the base of the test system. Four gear conditions were tested in this study: a normal condition and three fault conditions, namely a slightly worn gear, a moderately worn gear, and a gear with broken teeth.

Experimental setup of motorcycle gearbox test system.
Signals were sampled at 16384 Hz, and the raw vibration signals of the four conditions are shown in Fig. 7. The embedding dimension and time delay are set to 6 and 2, respectively.

Vibration signals measured on motorcycle gearbox.
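The conditions of the gear cannot be recognized directly from the waveforms of the vibration signals. Through recurrence network analysis, different features of the vibration signals are obtained. In this paper, the mean degree of all vertices $\langle k \rangle$, the global recurrence rate $\rho$, the global clustering coefficient $C$, the assortativity $A_s$, the network transitivity $T$, the network entropy $H_{SE}$ and the isolation rate $IR$ are extracted as features, and Table 2 lists sample features for each condition.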
Features extracted from different motorcycle gearbox conditions
To classify the gear conditions, ELM and SVM are used as classifiers. Eighty data samples (20 per condition) are used to train the classifiers, and another 80 samples are used for testing. The features $\langle k \rangle$, $\rho$, $C$, $A_s$, $T$, $H_{SE}$ and $IR$ are used as inputs to the classifiers. As shown in Tables 3 and 4, 13 samples are misclassified by the SVM, giving a classification accuracy of 83.75%, whereas only four samples are misclassified by the ELM, giving an accuracy of 95%. The comparison shows that the ELM performs better than the SVM. The results also indicate that quantitative measures extracted from the recurrence network can effectively characterize gear fault severity.
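The reported accuracies follow directly from the misclassification counts over the 80 test samples; a trivial check (the helper name is hypothetical):

```python
def accuracy_percent(n_test, n_misclassified):
    """Share of correctly classified test samples, in percent."""
    return 100.0 * (n_test - n_misclassified) / n_test

# First experimental study: 80 test samples.
print(accuracy_percent(80, 13))  # SVM: 83.75
print(accuracy_percent(80, 4))   # ELM: 95.0
```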
Results for four different gear conditions using SVM
Results for four different gear conditions using ELM
To further illustrate the effectiveness of the presented approach, several statistical parameters, namely root mean square, peak, kurtosis, deviation coefficient, pulse factor and margin factor, were also extracted as inputs to the ELM classifier for gear fault diagnosis. The mathematical expressions of these parameters are listed in Table 5, where x(t) is the original gear vibration signal, μ is the mean value of x(t) and $x_p$ is the peak value of the signal. The diagnostic results are shown in Table 6: 7 samples are misclassified, giving a classification accuracy of 91.25%, which is lower than that of the recurrence network-based approach.
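For readers who want to reproduce this baseline, the sketch below computes five of the time-domain parameters using common textbook definitions. Because the expressions in Table 5 are not reproduced in this text, these definitions are assumptions (kurtosis is the non-excess form, the pulse factor is peak over mean absolute value, and the margin factor is peak over the squared mean of $\sqrt{|x|}$), and the deviation coefficient is omitted because its exact definition is not given here. The function name is hypothetical.

```python
import numpy as np

def time_domain_features(x):
    """RMS, peak, kurtosis, pulse factor and margin factor under common textbook definitions."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    x_p = np.max(np.abs(x))                               # peak value
    rms = np.sqrt(np.mean(x ** 2))
    kurtosis = np.mean((x - mu) ** 4) / sd ** 4           # non-excess kurtosis (3 for a Gaussian)
    pulse = x_p / np.mean(np.abs(x))                      # peak / mean absolute value
    margin = x_p / np.mean(np.sqrt(np.abs(x))) ** 2       # peak / squared mean of sqrt|x|
    return {"rms": rms, "peak": x_p, "kurtosis": kurtosis, "pulse": pulse, "margin": margin}

# A single spike in an otherwise flat record gives large kurtosis, pulse and margin values.
f = time_domain_features([0.0, 0.0, 0.0, 4.0])
```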
Time-domain statistical features used in this study
Features from time domain using ELM
The second experimental study was conducted on a Drivetrain Dynamics Simulator (DDS) platform to characterize different types of gear faults, as shown in Fig. 8. Table 7 lists the gear faults tested in this study. Vibration signals were acquired at a sampling rate of 1024 Hz with a sampling window of 512 s while the simulator was running at a rotating speed of 30 Hz.

Experimental setup of a DDS system.
Different types of gear faults in the DDS system
Figure 9 illustrates the waveforms of the gearbox vibration signals under four different working conditions. Using the FNN algorithm and mutual information, the embedding dimension and time delay are selected as 6 and 2, respectively. The threshold for constructing the recurrence matrix is set to 0.4. The features $\langle k \rangle$, $\rho$, $C$, $A_s$, $T$, $H_{SE}$ and $IR$ are again extracted from the vibration signals, and Table 8 lists sample features for each condition.

Raw vibration signals of four gearbox conditions.
Features extracted for different types of gear faults
To classify the different gearbox conditions, 160 data samples, 40 per condition, are used to train the classifiers. The same number of samples from each condition is used for testing. As shown in Tables 9 and 10, fifteen samples are misclassified when the SVM is used as the classifier, giving a classification accuracy of 87.5%, while nine samples are misclassified when the ELM is used, giving a classification accuracy of 92.5%. Again, the comparison shows that the ELM performs better than the SVM, and that features extracted from the recurrence network can effectively characterize the gear working conditions.
Results for different types of gear faults using SVM
Results for different types of gear faults using ELM
The statistical parameters used in the first experimental study are also calculated here and used as inputs to the ELM classifier. As listed in Table 11, 15 samples are misclassified, giving a classification accuracy of 87.5%, which is 5 percentage points lower than that obtained with the recurrence network features. This again demonstrates the effectiveness of the presented approach for gear fault diagnosis.
Features from time domain using ELM
A recurrence network-based approach is introduced in this paper for gear fault diagnosis, in which various quantitative measures, such as the mean degree of all vertices, the global recurrence rate, the global clustering coefficient, assortativity, network transitivity, network entropy and isolation rate, are extracted as features to characterize the vibration signals. Two experimental case studies are investigated to verify the effectiveness of the recurrence network for characterizing gear working conditions. The results show that the recurrence network can not only evaluate gear fault severity but also classify different gear faults. In summary, the recurrence network is a powerful mathematical tool for nonlinear time series analysis with great potential for machine fault diagnosis.
Acknowledgments
This work has been supported by the National Natural Science Foundation of China (51575102), the Six Talent Peaks Project in Jiangsu Province (JXQC-003), and the Fundamental Research Funds for the Central Universities of China (2242017K40112).
