Abstract
Fault diagnosis is of great significance for industrial equipment maintenance, and feature extraction is a key step in any diagnosis scheme. Symbolic aggregate approximation (SAX) is a feature extraction approach that has recently attracted considerable attention. Despite its achievements, information aliasing arises in its calculation procedure, so the SAX may fail to guarantee information correctness. This work analyzes the information aliasing phenomenon of the SAX and then develops a novel alternative method, i.e., parallel symbolic aggregate approximation (PSAX). In the proposed PSAX, information aliasing is suppressed by an anti-aliasing procedure, and the average of the symbolic results of several intermediate sequences replaces the final symbolic result. The Case Western Reserve University (CWRU) rolling bearing data and the gas valve data of an actual reciprocating compressor are used to verify the superiority of the proposed method. The experimental results show that, compared with other methods, the accuracy advantage of the PSAX on the two datasets reaches 1%–5%, indicating that it can provide high-quality feature vectors for intelligent fault diagnosis.
List of abbreviations and symbols
Nomenclature
SAX: Symbolic aggregate approximation
PSAX: Parallel symbolic aggregate approximation
CWRU: Case Western Reserve University
WT: Wavelet transform
EMD: Empirical mode decomposition
LMD: Local mean decomposition
VMD: Variational mode decomposition
CS: Compressed sensing
PAA: Piecewise aggregate approximation
NOR: Normal
SD: Spring damage
VPG: Valve plate gap
VPF: Valve plate fracture
BP: Back propagation
LZ: Lempel-Ziv
MSEn: Multi-scale entropy
CMSEn: Composite multi-scale entropy
DBN: Deep belief network
CNN: Convolutional neural network
fs: Sampling frequency
u_x: Mean value
δ_x: Standard deviation
τ: Symbolic scale
filter(·): Anti-aliasing link
Introduction
Mechanical equipment of many kinds is widely used in all fields of production and daily life [1, 2]. Nevertheless, for large and complex equipment, long-term fault-free operation is not easy to ensure [3–5]. Once a failure occurs, it not only causes unplanned production downtime but can also cause casualties in serious cases [6–9]. Effective and advanced fault diagnosis technology should therefore be explored in the course of Industry 4.0 to enhance the reliability and safety of equipment operation [10–14].
With the rapid rise of industrial sensor technology, the information generated by equipment in enterprises is becoming more and more accessible [15]. Among the various signal types, the vibration acceleration signal is generally considered the most reliable and is the most widely selected, because the sensors are convenient to install and the signal can be acquired without interrupting production [16]. Extracting fault-related features from the collected vibration signals is the key to precise fault diagnosis, and previous studies have introduced various advanced approaches into fault diagnosis, including the wavelet transform (WT) [17–19], empirical mode decomposition (EMD) [20, 21], local mean decomposition (LMD) [22], variational mode decomposition (VMD) [23, 24], compressed sensing (CS) [25] and entropy-based technologies [26–28]. These technologies are widely used in gearbox fault diagnosis (such as cracked teeth, distributed wear and axis misalignment) and rolling bearing fault diagnosis (such as inner/outer ring pitting and rolling element failure). Although these methods have achieved a measure of success, problems remain. For the time-frequency domain methods (WT, EMD, LMD and VMD), the non-stationary characteristics [29] of mechanical vibration signals are often the most important factor hindering their wider adoption. In addition, these methods suffer from inherent limitations, such as algorithm adaptability, the endpoint effect and modal aliasing. For the entropy-based methods, although the influence of non-stationary characteristics is slightly reduced, the computational efficiency is often unsatisfactory. On the other hand, with the rise of artificial intelligence, deep learning has been applied to many fields [30]. Although it has achieved some success in fault diagnosis, its performance is easily degraded when only small samples are available.
Recently, with the continuous progress of feature extraction technology in mechanical fault diagnosis, the concept of time series symbolization from the data mining field has gradually received attention from scholars [31]. Compared with other common feature extraction methods, time series symbolization can comprehensively reflect the overall characteristics of the data and is less restricted by intractable non-stationary characteristics. The SAX is a typical representative of the time series symbolization approaches. Georgoulas et al. [32] first introduced the SAX into the fault diagnosis of rolling bearings, and subsequent studies have analyzed and applied the SAX from different perspectives. For example, Ref. [33] combines the SAX with a deep learning classifier; by using the SAX for dimension reduction, the calculation efficiency of the diagnosis scheme is improved while accuracy is maintained. Ref. [34] extends the SAX from the basic time domain to the frequency domain and the time-frequency domain, which improves the comprehensiveness of the information and is successfully applied to the fault diagnosis of bearings and reciprocating compressors. Ref. [35] employs the SAX to improve the coding efficiency of the Lempel-Ziv indicator, which achieves considerable effects in the fault diagnosis of rolling bearings.
Although these efforts have promoted the development of SAX-based approaches, some problems have not received attention. Specifically, in the calculation process of the SAX, the piecewise aggregate approximation (PAA) is employed to process the standardized data for dimension reduction. However, the averaging in the PAA does not take the frequency characteristics of the data into account, which can easily distort the original information, and the resulting information aliasing greatly reduces the reliability and performance of the entire SAX algorithm.
Motivated by the above defects, this paper first analyzes the information aliasing in the SAX from the perspective of signal processing, and then develops a new alternative version of the SAX, i.e., parallel symbolic aggregate approximation (PSAX), for time series symbolization. In the proposed PSAX, an anti-aliasing link is designed according to the symbolic scale to ensure information correctness. Furthermore, the final output symbolic result of the original SAX is replaced by the average of the symbolic results of several intermediate series, which takes the impact characteristics of the mechanical vibration signal into consideration. In the experimental part, the public rolling bearing data of CWRU and the gas valve data of an actual reciprocating compressor are used to verify the validity of the proposed method. The experiments show that, relative to the original SAX and other common fault indicators, the proposed PSAX provides better feature vectors for intelligent fault diagnosis.
The main contributions of this paper are summarized as follows. The information aliasing in the SAX is analyzed from the perspective of signal processing, and a novel symbolic feature extraction approach named PSAX is then developed. In the proposed PSAX, information aliasing is suppressed by an anti-aliasing procedure, and the average of the symbolic results of several intermediate sequences replaces the final symbolic result, with the aim of guaranteeing information correctness. The CWRU rolling bearing data and a set of reciprocating compressor gas valve data are employed to verify the effectiveness of the proposed method, and the results show that it can provide high-quality feature vectors for intelligent fault diagnosis.
The rest of this paper is organized into five sections. Section 2 gives the theoretical background of the SAX. Section 3 provides the frequency characteristic analysis, including the information aliasing phenomenon existing in the SAX. Section 4 presents the proposed PSAX method. Section 5 reports the experimental verification of the proposed method. The final section gives the conclusion and future work.
Brief review of the SAX
The SAX is an effective sequence dimension reduction method and can help to process the vibration signals of industrial equipment. Specifically, with the SAX, a numerical sequence of length n is converted into a symbol sequence of length m (m < n). The SAX is calculated in the following steps [31–35]:
(1) For a given time series X, the first step is to standardize it into a new series Y = {y1, y2, ⋯, yn} following Equation (1):
$$y_i = \frac{x_i - u_x}{\delta_x}, \quad i = 1, 2, \dots, n \tag{1}$$
where $x_i$ represents the i-th value in the raw series X, $u_x$ denotes its mean value and $\delta_x$ denotes its standard deviation.
(2) After standardization, the series Y is divided by the PAA into [n/τ] subseries of length τ, and the average of each subseries is calculated to compose a new series Z = {z1, z2, ⋯, z[n/τ]}.
(3) Each element of the sequence Z is then represented by a symbol according to its value on the y-axis. Because the standardized sequence is assumed to follow a Gaussian distribution, the y-axis can be split into several regions of equal area under the Gaussian curve, and all elements falling in the same region are given the same symbol. Table 1 lists the breakpoints that divide these regions.
Breakpoints of the SAX
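The equal-area breakpoints described in step (3) can be computed directly from the standard normal distribution. The following is a minimal sketch, assuming the breakpoints are the k/a quantiles of N(0, 1) for an alphabet of a symbols; the function name `sax_breakpoints` is illustrative and does not come from the paper.

```python
from statistics import NormalDist


def sax_breakpoints(num_symbols: int) -> list[float]:
    """Return the num_symbols - 1 cut points that split N(0, 1) into equal-area regions."""
    if num_symbols < 2:
        raise ValueError("at least two symbols are needed")
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # The k-th breakpoint is the k/num_symbols quantile of the standard normal.
    return [std_normal.inv_cdf(k / num_symbols) for k in range(1, num_symbols)]


# Five symbols (a to e) need four breakpoints.
breakpoints_5 = [round(b, 2) for b in sax_breakpoints(5)]
print(breakpoints_5)  # [-0.84, -0.25, 0.25, 0.84]
```

For five symbols this reproduces the values B1 to B4 quoted for Fig. 1; other alphabet sizes follow from the same quantile rule.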
To explain the SAX more intuitively, Fig. 1 illustrates the method. In Fig. 1, the length of the raw data is 100 and the symbolic scale is 5. The raw series is therefore first divided into 20 subsequences, and the average of each subsequence is taken to represent the information in that subsequence. In the symbolic stage, the vertical axis is divided into 5 equal-area regions (coded a, b, c, d and e, respectively) according to the breakpoints (B1 = –0.84, B2 = –0.25, B3 = 0.25, B4 = 0.84). After symbolization, the raw sequence is represented as “debadebadebadebadeba”.

Sketch map of the SAX.
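The three SAX steps can be sketched in a few lines of code. This is an illustrative implementation under stated assumptions, not the authors' code: the population standard deviation is used for standardization (the text does not say which estimator is meant), and the test signal is a hypothetical sine wave rather than the series drawn in Fig. 1.

```python
import math
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def standardize(x):
    """Step (1): zero mean and unit standard deviation (population estimator assumed)."""
    mean, std = fmean(x), pstdev(x)  # std == 0 for a constant series is not handled
    return [(v - mean) / std for v in x]


def paa(y, tau):
    """Step (2): average consecutive, non-overlapping subseries of length tau."""
    n_segments = len(y) // tau
    return [fmean(y[j * tau:(j + 1) * tau]) for j in range(n_segments)]


def symbolize(z, num_symbols):
    """Step (3): map each value to a letter using equal-area Gaussian breakpoints."""
    std_normal = NormalDist()
    cuts = [std_normal.inv_cdf(k / num_symbols) for k in range(1, num_symbols)]
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return "".join(alphabet[bisect_right(cuts, v)] for v in z)


def sax(x, tau, num_symbols):
    return symbolize(paa(standardize(x), tau), num_symbols)


# Hypothetical input: 100 samples covering four periods of a sine wave.
signal = [math.sin(2 * math.pi * 4 * i / 100) for i in range(100)]
word = sax(signal, tau=5, num_symbols=5)
print(len(word), word)
```

With a length of 100 and a symbolic scale of 5, the output word has 20 letters, matching the dimension reduction described for Fig. 1.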
As can be seen from the above calculation steps and example, the SAX can effectively convert time series into strings. However, in the calculation process (especially step 2), the averaging in the PAA does not take the frequency characteristics of the data into account and can easily distort the original information, a problem that has not received attention in existing research. Motivated by this defect, the following sections analyze the information aliasing phenomenon in detail and put forward a modified version.
Down-sampling equivalence of the PAA transform
As described in Section 2, the calculation process of the SAX mainly includes three steps, among which the PAA plays an important role. In the PAA procedure, the original sequence is first divided into several subsequences of the same length, and the average of each subsequence is then taken to compose a new sequence. In this way, the dimension of the data is effectively reduced, but the change in the frequency characteristics of the signal is ignored, which easily leads to information aliasing. In fact, for a sequence X = {x1, x2, ⋯, xn}, the above transformation can be shown to be equivalent to a down-sampling procedure by the following steps [9, 36]:
(1) When the symbolic scale (the length of the subsequence) of the SAX is set to τ, the raw data can first be split into τ intermediate sequences as in formula (3).
where N0 is the largest integer that is not greater than the data length N and is divisible by τ.
(2) Because the intermediate sequences in formula (3) all have the same length (N0/τ elements each, with the first sequence ending at index N0 – τ + 1), the elements at corresponding positions of the sequences can be added to generate a new sequence G. After each element of G is divided by τ, the resulting sequence is identical to the one obtained directly by the PAA.
Assume that the raw data has length N0 and sampling frequency fs, so that the time interval between two adjacent elements of the sequence is 1/fs. According to formula (3), the time interval between adjacent elements of an intermediate sequence is τ/fs; that is, each intermediate sequence can be regarded as a signal with sampling frequency fs/τ. In addition, when the corresponding elements of the intermediate sequences are added and divided by τ, the whole process is somewhat similar to time-domain synchronous averaging [37], because the physical process represented by each intermediate sequence is basically the same. Although random noise can be suppressed to a certain degree, the sampling frequency of the signal is not changed. The sampling frequency of the series obtained by the above steps (and hence of the PAA series) is therefore also fs/τ, and the PAA procedure in the SAX algorithm can be regarded as a signal down-sampling process. With the symbolic scale set to τ = 2, Fig. 2 presents a sketch map of the down-sampling equivalence.

Sketch map of the down-sampling equivalence. (symbolic scale τ = 2).
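The equivalence between the PAA and the averaged intermediate sequences can be checked numerically. The sketch below assumes that the k-th intermediate sequence takes every τ-th sample starting at position k (an interleaved split consistent with the τ/fs spacing described above); the helper names and the test data are hypothetical.

```python
def paa(y, tau):
    """Direct PAA: mean of each consecutive block of tau samples."""
    n0 = len(y) - len(y) % tau  # largest multiple of tau not exceeding len(y)
    return [sum(y[j:j + tau]) / tau for j in range(0, n0, tau)]


def intermediate_sequences(y, tau):
    """Assumed split: sequence k holds samples k, k + tau, k + 2*tau, ... (0-based)."""
    n0 = len(y) - len(y) % tau
    return [y[k:n0:tau] for k in range(tau)]


def averaged_intermediates(y, tau):
    """Add the intermediate sequences position by position, then divide by tau."""
    return [sum(column) / tau for column in zip(*intermediate_sequences(y, tau))]


# Hypothetical data whose length (23) is not a multiple of the tested scales.
data = [float((7 * i) % 11) for i in range(23)]
for tau in (2, 3, 5):
    direct = paa(data, tau)
    via_split = averaged_intermediates(data, tau)
    assert direct == via_split
    print(tau, len(direct), len(intermediate_sequences(data, tau)[0]))
```

Each intermediate sequence has N0/τ elements, the same length as the PAA output, which is why the two routes can be compared element by element.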
In mechanical fault diagnosis, it is important to analyze the frequency characteristics of the signal. The occurrence of a fault often changes specific frequency components in the signal, so the intensity of those components can be used to judge whether a fault has occurred. Even for methods that do not take specific frequencies as the diagnostic basis, information correctness must be ensured during the calculation, as it is the basic premise of all analysis and transformation. However, in the PAA procedure (a down-sampling process), information aliasing makes the SAX fail to ensure information correctness.
As shown in Fig. 3(a) and (b), for band-limited signals whose highest frequency is smaller than or equal to fs/(2τ), the PAA sequence and the intermediate sequence can accommodate all frequency components in the signal (the bandwidth of both sequences is reduced to fs/(2τ)). Besides, compared with the PAA series, the frequency domain characteristics of the intermediate sequence match the raw data more closely; the slight distortion of the PAA sequence may arise because the averaging operation in the PAA procedure weakens the impact characteristics of the raw signal.

Frequency domain characteristics regarding the sequences before and after transformation (τ = 2).
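The band-limit condition above can be restated as a bound on the symbolic scale. The short derivation below is an illustrative restatement, with $f_{\max}$ introduced here (it is not a symbol of the paper) for the highest frequency present in the signal; the numerical values are those of the simulated signal used later in this section.

$$ f_{\max} \le \frac{f_s}{2\tau} \;\Longleftrightarrow\; \tau \le \frac{f_s}{2 f_{\max}} $$

$$ f_s = 500\ \text{Hz},\quad f_{\max} = 150\ \text{Hz} \;\Rightarrow\; \tau \le \frac{500}{2 \times 150} \approx 1.67 $$

Under this reading, even the smallest non-trivial scale τ = 2 exceeds the bound for that signal, so aliasing is to be expected there.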
As shown in Fig. 4(a) and (b), when the highest frequency is larger than fs/(2τ), neither the PAA sequence nor the intermediate sequence can accommodate all frequency components of the raw signal because of the reduced bandwidth. Specifically, the frequency components from fs/(2τ) to fs/2 are removed from the current bandwidth. More seriously, the removed frequency components may be mixed into the current frequency band in other forms, resulting in adverse information aliasing.

The time domain waveform and the spectrum of different series (τ = 2).
To illustrate the aliasing phenomenon more clearly, a simulation signal is defined as y(t) = sin(2πf1t) + sin(2πf2t). In the simulation, f1 = 10 Hz, f2 = 150 Hz, and the sampling frequency is fs = 500 Hz. Figure 5 gives the time domain waveforms and the spectra of the different series.

Frequency domain characteristics of the sequences before and after transformation (τ = 2).
As demonstrated in Fig. 5, the 10 Hz and 150 Hz components are recovered correctly when the sampling frequency is 500 Hz, with no other interfering components. However, for the intermediate sequence and the PAA sequence, the reduced sampling frequency narrows the spectrum bandwidth to 125 Hz. The spectrum can then accommodate only the 10 Hz component, and the 150 Hz component disappears from the spectra of both the intermediate series and the PAA series. More seriously, a new component at 100 Hz appears in these spectra, although it is not real information contained in the original signal. The above analysis shows that the PAA calculation can discard real information and generate false information. Consequently, the PAA cannot ensure the correctness of the original information.
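The simulation can be reproduced with a few lines of code. The sketch below is an independent illustration, not the authors' script: it uses one second of data, skips the standardization step, and evaluates the spectrum with a single-frequency DFT helper (`amplitude_at`, a name introduced here) instead of a full FFT.

```python
import cmath
import math


def amplitude_at(x, freq_hz, fs_hz):
    """Amplitude of the DFT component at freq_hz (exact for a sinusoid on a DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz) for i, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)


fs, f1, f2, tau = 500.0, 10.0, 150.0, 2
raw = [math.sin(2 * math.pi * f1 * i / fs) + math.sin(2 * math.pi * f2 * i / fs)
       for i in range(500)]  # one second of data

intermediate = raw[0::tau]  # every tau-th sample, no filtering
paa_seq = [sum(raw[j:j + tau]) / tau for j in range(0, len(raw), tau)]
fs_low = fs / tau  # 250 Hz, so the usable band ends at 125 Hz

for name, seq, rate in (("raw", raw, fs),
                        ("intermediate", intermediate, fs_low),
                        ("PAA", paa_seq, fs_low)):
    amps = {f: round(amplitude_at(seq, f, rate), 3) for f in (10.0, 100.0)}
    print(name, amps)
```

The raw signal has no energy at 100 Hz, whereas both reduced sequences show a 100 Hz component: the 150 Hz tone folds to |150 – 250| = 100 Hz. In the PAA sequence the folded component is attenuated by the two-point averaging (by a factor of cos(0.3π), about 0.59) but not removed.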
In mechanical fault diagnosis, the frequency components of a vibration signal collected from the surface of the equipment are usually distributed over the whole frequency band. Besides, during SAX processing the bandwidth is compressed to 1/τ of the original according to the parameter τ. When τ is relatively large, components distributed over a relatively wide frequency band are therefore compressed into a relatively narrow band, which leads to more serious information aliasing. Consequently, when dealing with mechanical vibration signals, the PAA procedure, and hence the SAX, may be incapable of guaranteeing information correctness in most cases.
To overcome the limitations of the SAX described above, this section proposes the parallel symbolic aggregate approximation (PSAX). In the developed PSAX, information aliasing is suppressed by an anti-aliasing link, and the average of the symbolic results of several intermediate series replaces the final symbolic result, which takes the impact characteristics of the signal into full consideration. Figure 6 shows the calculation flow chart of the PSAX, and the detailed steps are given below:

Flow chart of the developed PSAX.
Step1: For a given sequence X = {x1, x2, ⋯, xn}, the first step is to standardize it into a new series Y = {y1, y2, ⋯, yn} according to Eq. (1), exactly as in the original SAX algorithm. This step converts the raw sequence into a sequence with mean value 0 and standard deviation 1 to meet the requirements of the subsequent calculations.
Step2: After standardization, the new series Y is split into τ parallel intermediate sequences according to Eq. (4). The function of this step is to remove the aliasing information and, to a certain extent, to avoid the weakening of the signal impact characteristics caused by the mean operation.
where filter(·) is the anti-aliasing link and τ = 1, 2, ⋯, N is the symbolic scale.
Step3: The elements in each intermediate sequence are replaced by symbols according to Table 1, so that τ symbol sequences are generated. It is worth noting that, for each intermediate sequence, the symbolization process is the same as that of the original SAX; the purpose of this step is to achieve the symbolic representation of the data.
Step4: After symbolization, the number of each symbol in every intermediate sequence is counted, and the statistical results of all the intermediate sequences are averaged according to Eq. (5) to compute the final output of the developed PSAX. This step combines the multiple symbol sequences into one result and transforms it into a feature vector that can be used by the classifier.
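Steps 1 to 4 can be sketched as follows. This is a hedged illustration only: the form of the anti-aliasing link is not specified in this text, so a Hamming-windowed sinc low-pass filter with cutoff fs/(2τ) is assumed, together with an interleaved split into τ sequences and a plain average of the per-sequence symbol counts. All function names and the test signal are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def standardize(x):
    mean, std = fmean(x), pstdev(x)  # population estimator assumed
    return [(v - mean) / std for v in x]


def lowpass_fir(tau):
    """Assumed anti-aliasing link: windowed-sinc kernel, cutoff 1/(2*tau) cycles/sample."""
    half_width = 4 * tau
    cutoff = 1.0 / (2.0 * tau)
    taps = []
    for k in range(-half_width, half_width + 1):
        ideal = 2.0 * cutoff if k == 0 else math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        hamming = 0.54 + 0.46 * math.cos(math.pi * k / half_width)
        taps.append(ideal * hamming)
    gain = sum(taps)
    return [t / gain for t in taps]  # unit gain at zero frequency


def filter_same(y, taps):
    """Zero-padded convolution with a symmetric kernel; output has the input length."""
    half, n = len(taps) // 2, len(y)
    out = []
    for i in range(n):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + j - half
            if 0 <= idx < n:
                acc += t * y[idx]
        out.append(acc)
    return out


def psax_features(x, tau, num_symbols):
    """Average symbol counts over the tau filtered intermediate sequences."""
    filtered = filter_same(standardize(x), lowpass_fir(tau))
    std_normal = NormalDist()
    cuts = [std_normal.inv_cdf(k / num_symbols) for k in range(1, num_symbols)]
    n0 = len(filtered) - len(filtered) % tau
    totals = [0.0] * num_symbols
    for k in range(tau):                      # one pass per intermediate sequence
        for v in filtered[k:n0:tau]:
            totals[bisect_right(cuts, v)] += 1
    return [count / tau for count in totals]


# Hypothetical 2048-point signal sampled at 12 kHz with a 50 Hz and a 3 kHz component.
signal = [math.sin(2 * math.pi * 50 * i / 12000) + 0.5 * math.sin(2 * math.pi * 3000 * i / 12000)
          for i in range(2048)]
features = psax_features(signal, tau=3, num_symbols=8)
print([round(f, 2) for f in features])
```

The resulting vector has one entry per symbol, and its entries sum to the length of one intermediate sequence (682 for a sample length of 2048 and τ = 3), so vectors from different samples are directly comparable as classifier inputs.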
In this section, to verify the effectiveness of the proposed method and to avoid conclusions that depend on a single data set, two data sets are employed: the CWRU rolling bearing data set and an actual reciprocating compressor gas valve data set.
Case 1
Experimental setup
This section employs the CWRU rolling bearing experimental data to verify the effectiveness of the developed PSAX [38]. As shown in Fig. 7, the experimental platform mainly consists of a motor, a torque transducer (encoder), a dynamometer and electronic control equipment. The sampling frequency of the vibration signal is 12 kHz, and the data used in this section include 7 states (3 fault states under the same working condition and 4 normal states under different working conditions). Table 2 lists the details of the experimental data, and the detailed steps of the experimental process are as follows:
(1) The experimental signal is segmented into samples of length 2048, and the obtained samples are randomly divided into a training set and a test set.
(2) An appropriate symbolic scale and character number are set, and the proposed PSAX is used to convert the training set and test set data into symbol sequences for feature extraction.
(3) The BP classification model is established.
(4) The model is trained with the training set, and its trainable parameters are frozen when the model converges.
(5) The test set data are fed into the model to obtain the diagnostic results and evaluate the model performance.

The rolling bearing test stand of CWRU.
The details of the CWRU experimental data
To further verify the aforementioned information aliasing phenomenon and the difference in frequency characteristics of different sequences, this section takes symbolic scale τ = 3 as an example to compare the frequency characteristics of the original signal and that of the PAA sequence. It is worth noting that in the process of analysis, only one normal state (NOR4) and three fault states are taken into consideration.
Figure 8 presents the spectra of the original signal (0–2000 Hz) and of the PAA sequence. Since the symbolic scale of the SAX is 3, which is equivalent to down-sampling the raw signal by a factor of 3, the actual sampling frequency of the PAA sequence is 12 kHz/3 = 4 kHz and the spectral width is 2 kHz. Within this frequency band, the spectra of the original signal and of the corresponding PAA sequence show obvious differences in every state. For the NOR4 state, the spectrum of the PAA sequence has significant spectral peaks near 1900 Hz, whereas the spectrum of the original signal has almost no energy near 1900 Hz. This indicates that the reduction of the sampling frequency leads to information aliasing in the PAA transformation, which makes the SAX method fail to ensure information correctness. Besides, for the IRF, BF and ORF states, the spectra of the original signal and of the PAA sequence also show obvious differences in the ranges of 1000 to 1700 Hz, 500 to 800 Hz, and 1200 to 1600 Hz, respectively. This again shows that the PAA procedure in the SAX approach may cause information aliasing and generate false information, so the reliability of the subsequent calculations in the fault diagnosis scheme cannot be guaranteed.

Comparison of the frequency characteristics of the original sequence and the PAA sequence.
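The frequency arithmetic behind this comparison can be made explicit. The sketch below uses the standard folding rule for sampled signals to list which frequencies of the original 0–6 kHz band would land at 1900 Hz after down-sampling to 4 kHz; the helper name `folded_frequency` and the 100 Hz search grid are illustrative choices, and the listing says nothing about which of these components is actually present in the CWRU data.

```python
def folded_frequency(f_hz, fs_new_hz):
    """Apparent frequency of a tone at f_hz once it is sampled at fs_new_hz."""
    remainder = f_hz % fs_new_hz
    return min(remainder, fs_new_hz - remainder)


fs, tau = 12_000.0, 3
fs_new = fs / tau          # 4000 Hz after a symbolic scale of 3
band_edge = fs_new / 2     # 2000 Hz usable bandwidth

# Frequencies on a 100 Hz grid within the original band that appear at 1900 Hz.
sources = [f for f in range(0, int(fs / 2) + 1, 100)
           if folded_frequency(f, fs_new) == 1900.0]
print(fs_new, band_edge, sources)  # 4000.0 2000.0 [1900, 2100, 5900]
```

Energy near 2100 Hz or 5900 Hz in the original signal would therefore be a candidate source for a spurious peak near 1900 Hz in the PAA spectrum, subject to the attenuation introduced by the averaging itself.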
Figure 9 gives the spectra of the intermediate sequences of the proposed PSAX. Since the symbolic scale is τ = 3, the raw signal is split into 3 parallel intermediate sequences after the anti-aliasing processing, and the actual sampling frequency of these intermediate sequences is 4 kHz (the corresponding spectrum width is 2 kHz). On the whole, the spectra of the intermediate sequences are very similar in shape. More importantly, unlike the spectrum of the original PAA sequence, the spectra of the intermediate sequences closely resemble that of the original signal in the range of 0–2000 Hz, indicating that the proposed PSAX can suppress information aliasing and ensure information correctness during the calculation process.

Frequency characteristics of parallel intermediate sequences obtained by the PSAX.
This section compares the feature vectors extracted by the original SAX and the developed PSAX. During the calculation, the length of a single sample is set to 2048 and the symbolic scale is τ = 3. In addition, to enhance the accuracy of the algorithm, the y-axis is divided into 8 equal-area regions according to Table 1 during symbolization, corresponding to the symbols a, b, c, d, e, f, g and h, respectively. The statistical results of the symbol numbers in each bearing state are shown in Fig. 10.

The statistical results of the number of the symbols.
As presented in Fig. 10(a), for the SAX feature vector, the normal states and the fault states exhibit pronounced differences. However, among the normal states, the numbers of the different symbols are nearly evenly distributed, and the difference between states is not prominent. In addition, for the 3 fault states, the output symbols are mainly concentrated in d and e, which may seriously reduce the role of the other symbols in the final feature vector. Meanwhile, as observed in Fig. 10(b), for the PSAX feature vector, the difference between normal and fault states remains prominent. Although the distribution of the symbols in the normal states is still relatively balanced, the difference between states is significantly increased, making it easier to distinguish the different bearing states. In addition, for the fault states, the difference in symbol numbers is reduced to a certain extent, which contributes to a high-quality feature vector. The above analysis suggests that the PSAX feature vector has more potential than the SAX feature vector, because the distribution of the values in the PSAX feature vector is more reasonable and the observable difference between states is more obvious.
To further verify the effectiveness of the proposed method, this section takes the enhanced SAX [31] and some other common features, such as time domain statistical parameters [39], entropy-based methods [40, 41] and the Lempel-Ziv (LZ) indicator [35], as contrast methods. Besides, some classical deep learning approaches, such as the deep belief network (DBN) [42] and the convolutional neural network (CNN) [43], are also considered. For the time domain statistical parameters, this work mainly considers kurtosis, margin, waveform factor and pulse factor. For the entropy-based approach, this work mainly considers the multi-scale entropy (MSEn) and the composite multi-scale entropy (CMSEn) under scale factors 0 to 8. The DBN is mainly composed of 3 restricted Boltzmann machines (RBMs), with 50, 100 and 50 hidden units, respectively. The CNN consists of 3 Conv-ReLU-MaxPooling basic units; the number, size and step size of the convolution kernels in each convolution layer are 16, 2 and 2, respectively, and the size and step size of the pooling region in each MaxPooling layer are both 2. In addition, the DBN and the CNN use the raw data directly as model inputs.
For the intelligent diagnosis classifier, this paper uses the neural network toolbox in MATLAB R2019a to construct a BP network with only one hidden layer (50 neurons). It is worth noting that the purpose of this BP classifier is not to obtain the highest diagnostic accuracy but to evaluate the performance of the feature vectors. The toolbox automatically stops the training process when the loss on the validation set starts to rise. The length of a single sample is set to 2048, and the resulting data set contains a total of 280 samples (7 categories with 40 samples each), of which 50% are used as the training set, 15% as the validation set, and the remaining 35% as the test set. The BP classifier training process is presented in Fig. 11, and the average results (10 trials) of the different methods are given in Table 3.

Training process of the BP classifier.
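The data preparation described above (segmenting into samples of length 2048 and a 50%/15%/35% split of 280 samples) can be sketched as follows. This is an assumption-laden illustration rather than the MATLAB toolbox procedure: non-overlapping segments and a seeded random shuffle are assumed, and the function names are hypothetical.

```python
import random


def segment_signal(signal, sample_length=2048):
    """Cut a long recording into non-overlapping samples (overlap is not stated in the text)."""
    n_full = len(signal) // sample_length
    return [signal[i * sample_length:(i + 1) * sample_length] for i in range(n_full)]


def split_indices(n_samples, fractions=(0.50, 0.15, 0.35), seed=0):
    """Shuffle sample indices and split them into training, validation and test sets."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(280)
print(len(train_idx), len(val_idx), len(test_idx))  # 140 42 98
```

With 280 samples the split yields 140 training, 42 validation and 98 test samples; repeating the split with different seeds corresponds to the repeated trials over which the reported results are averaged.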
Comparison of different methods
As shown in Table 3, among the comparison methods, the time-domain statistical parameters, the LZ indicator, the original SAX and its enhanced version perform relatively modestly overall, and show no obvious advantage in either diagnosis accuracy or computation time. Compared with these methods, the entropy-based methods and the proposed PSAX achieve higher diagnostic accuracy, but the entropy-based methods are at an obvious disadvantage in calculation time. Besides, because of the small amount of data, the performance of the deep learning methods is not prominent either. On the whole, the proposed method enjoys obvious advantages in both accuracy and computational efficiency. In addition, the confusion matrix in Fig. 12 shows that, for both the SAX and the PSAX, the misdiagnosed samples mainly appear between category 3 (NOR3) and category 6 (BF), which is the main factor limiting further improvement of the diagnostic accuracy. Figure 13 shows the influence of the PSAX hyperparameters on diagnostic accuracy. As the character number and the symbolic scale increase, the diagnostic accuracy tends to decline on the whole. A possible reason is that, as the character number increases (with the symbolic scale fixed at 3), the resolution of symbolization declines, blurring the differences between states. On the other hand, increasing the symbolic scale (with the character number fixed at 8) shortens the symbol sequence, which also affects the diagnostic accuracy to a certain extent. Compared with the symbolic scale, the character number has a more obvious influence on diagnostic performance.

The test confusion matrix.

The influence of hyperparameters.
Case 2
Experimental setup
The data used in Case 2 are collected from an actual industrial reciprocating compressor. The reciprocating compressor is composed of the crankcase, the cylinder, the crosshead and the motor (Figs. 14 and 15). The second stage cylinder inlet pressure is 310 kPa (the temperature is about 32°C), while the outlet pressure is 1040 kPa (the temperature is about 104°C). During data acquisition, the system sampling frequency is 20 kHz. In addition, the tested gas valve is installed at the 4 g position of the secondary cylinder. Vibration signals in different states are collected, including the normal state (NOR), the spring damage (SD), the valve plate gap (VPG) and the valve plate fracture (VPF).

The reciprocating compressor and the acquisition instrument.

Structural diagram of the reciprocating compressor system.
The experiment in this section sets the single sample length to 2048 and obtains 152 samples in total, with 38 samples in each state. Figure 16 and Table 4 present the time-domain waveforms of the vibration signals and the details of the data set, respectively.

Time-domain waveform of the reciprocating compressor vibration signal.
The details of the reciprocating compressor data
During the symbolization process, the symbolic scale is set to τ = 3 and the y-axis is divided into 8 equal-area regions according to Table 1. The classifier is again a BP neural network with a single hidden layer (50 neurons), the same as in Case 1. 50%, 15% and 35% of the data are used as the training set, the validation set and the test set, respectively. Figure 17 presents the statistical results of the symbol numbers in each gas valve state, and Table 5 lists the average results (10 trials) of the various methods.

The statistical results of the number of the symbols.
Comparison of different methods
As presented in Fig. 17 and Table 5, the SAX output symbols are mainly concentrated in d and e, while the numbers of symbols a and h are relatively small. On the whole, the symbol distributions of the data under the different states are roughly the same, and the difference between the states is not obvious. Compared with the SAX results, although the symbols output by the developed PSAX present a similar overall distribution, the differences between the 4 states become more prominent. In terms of diagnostic accuracy and computational efficiency, the proposed PSAX still maintains certain comprehensive advantages, which once again verifies the effectiveness of the proposed method.
In addition, as shown in Fig. 18, in the experiment of Case 2, the overall influence trend of the hyperparameters on diagnostic performance is similar to that of Case 1, and the influence of the character number is still more significant.

The influence of hyperparameters.
This paper analyzes in detail the information aliasing in the SAX procedure from the perspective of signal processing, and develops a novel alternative version, i.e., parallel symbolic aggregate approximation (PSAX). In the proposed PSAX, an anti-aliasing link is designed according to the symbolic scale to ensure information correctness, and the average of the symbolic results of several intermediate series replaces the final output symbolic result according to the frequency characteristics of the data, which takes the impact characteristics of the vibration signal into consideration while ensuring information correctness. The rolling bearing data set of CWRU and an actual reciprocating compressor data set are used to verify the superiority of the proposed PSAX. In the experiments, the diagnostic accuracy of the proposed method on the two data sets is 94.21% and 93.54%, respectively, and its advantage over the contrast methods is about 1%–5%. In addition, as a new time series dimension reduction algorithm, the PSAX is expected to be further applied in fields such as character recognition, dangerous driving behavior detection and astronomical research. Future research should pay attention to the design of the anti-aliasing link and to more refined y-axis partition modes in the symbolization process.
Declaration of competing interest
The authors declare that they have no conflicts of interest regarding this work.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant 52075310.
