A novel image classification model based on adversarial training for pulsar candidate identification

Abstract

Pulsars are highly magnetized, rotating neutron stars with small volume and high density. The discovery of pulsars is of great significance in the fields of physics and astronomy. With the development of artificial intelligence, image recognition models based on deep learning are increasingly utilized for pulsar candidate identification. However, pulsar candidate datasets are characterized by imbalance and a lack of positive samples, which causes traditional methods to suffer from poor performance and model bias. To this end, a general image recognition model based on adversarial training is proposed. The model comprises a generator, a classifier, and two discriminators. Theoretical analysis demonstrates that the model has a unique optimal solution, and that the classifier is exactly the inference network of the generator. Therefore, the samples produced by the generator significantly augment the diversity of the training data. When the model reaches equilibrium, it can not only predict labels for unseen data, but also generate controllable samples. In the experiments, we first train the model on subsets of MNIST. The results reveal that the model not only achieves better classification performance than CNN, but also has better controllability than CGAN and ACGAN. Then, the model is applied to the pulsar candidate datasets HTRU and FAST. The results show that, compared with the CNN model, the F-score increases by 1.99% and 3.67%, and the Recall increases by 6.28% and 8.59%, respectively.

Keywords

Generative adversarial nets, convolutional neural network, unbalanced dataset, pulsar candidate identification

1 Introduction

Pulsars are highly magnetized, rotating neutron stars that emit beams of electromagnetic radiation. The discovery of pulsars is of great significance in the fields of physics and astronomy. The Five-hundred-meter Aperture Spherical radio Telescope (FAST), known as the “Chinese Sky Eye”, is located in Pingtang County, Guizhou Province, China and was completed on September 25, 2016. Discovering pulsars is one of FAST’s most important scientific goals. Millions of pulsar candidates are produced when the raw data are processed by a pipeline system such as the Pulsar Exploration and Search Toolkit (PRESTO). Pulsar candidate datasets are characterized by extreme imbalance and a lack of positive samples, because the number of discovered pulsars is limited while interference signals are widespread. For example, the High Time Resolution Universe (HTRU) dataset contains 1,196 positive samples and 89,996 negative samples, an imbalance ratio of approximately 75:1. In recent years, with the development of artificial intelligence (AI), deep convolutional neural networks (CNN) [1–3] have been increasingly employed to identify pulsar candidates. However, on such imbalanced data these methods perform poorly on the minority class and inevitably suffer from model bias. Rebalancing the dataset is the most common strategy for handling imbalanced data. The synthetic minority over-sampling technique (SMOTE) [4] is one of the classic methods and works well on low-dimensional imbalanced datasets. However, it is not suitable for image datasets, partly because it is based on k-Nearest Neighbors (KNN), and partly because the synthesized samples are convex combinations of existing samples. Recently, with the emergence of generative adversarial nets (GANs) [5], researchers have been exploring the application of this model to over-sampling [6–8].

Generative adversarial nets, which excel at generating visually sharp images and at learning feature representations, consist of a generator (G) and a discriminator (D). The generator takes random noise z as input and outputs samples G(z). The discriminator is a binary classifier that tries to distinguish between the generated samples and the real samples. Both G and D are parameterized neural networks and play the following two-player minimax game:

$\min_{G} \max_{D} \; E_{x \sim p (x)} [\log D (x)] + E_{z \sim p_{z} (z)} [\log (1 - D (G (z)))]$ (1)

where p(x) and p_z(z) are the data distribution and the prior distribution of the latent variable, respectively. When the model reaches equilibrium, the generated distribution is equal to the real distribution, and the generator produces samples that the discriminator cannot distinguish from real ones. GANs have been extended to supervised learning [9, 10], where the labels y are fed into the generator to produce controllable samples. They have also been applied to semi-supervised learning [11–13] to alleviate the problem of insufficient label diversity. Similarly, GANs can be applied to small-sample datasets and unbalanced datasets.

Two strategies can be considered when applying GANs to the classification of unbalanced datasets. One is to train a conditional GAN (CGAN) to rebalance the dataset before adopting a CNN for classification; the other is to train the CGAN and the CNN simultaneously under a unified framework. The former is widely adopted for various unbalanced image datasets [14, 15]. However, it suffers from two thorny issues. Firstly, it is not an end-to-end learning model. Researchers must judge the convergence of the CGAN model from experience, so the results are difficult to reproduce. Secondly, the samples generated by CGAN concentrate on the center of the real distribution, which contributes little to the diversity of the training samples. To this end, we adopt the second strategy and present a general image classification model based on adversarial training, abbreviated as ICAT. It is an end-to-end learning model that contains a generator, a classifier, and two discriminators. During training, the generator and the classifier cooperate with each other to reach equilibrium. Meanwhile, the labels used for generating images are recovered when the images are fed into the classifier. Theoretical analysis demonstrates that the model has a unique optimal solution and that the classifier is exactly the inference network of the generator, which means that, for conditionally generated samples, the label predicted by the classifier is identical to the input label of the generator. When the model reaches equilibrium, the generator can produce completely controllable samples, and the classifier can predict labels for both generated samples and unseen samples. The main contributions of this paper are the following three aspects.

(i) A general image classification model with adversarial training is proposed, and the convergence of the model is demonstrated theoretically.

(ii) Subsets of 10,000, 20,000, 30,000 and 40,000 samples, as well as the full training set, are taken from MNIST for model training, and classification and generation experiments are conducted. The results show that, compared with the CNN model, the error rate of ICAT is reduced by 0.16%, 0.12%, 0.09%, 0.08% and 0.07%, respectively. In addition, classifying the generated samples with a "perfect" classifier indicates that ICAT has better controllability than CGAN and ACGAN.

(iii) The model is applied to the unbalanced pulsar candidate datasets HTRU and FAST. The results show that, compared with the CNN model, the F-score increases by 1.99% and 3.67%, and the Recall increases by 6.28% and 8.59%, respectively. Meanwhile, the recognition performance of ICAT is also better than that of the CGAN+CNN model.

The remainder of this paper is organized as follows: Section 2 introduces the development of pulsar candidate identification algorithms and the application of GANs on unbalanced datasets. Section 3 outlines the proposed model and the theoretical analysis. Section 4 presents the results and corresponding discussions. Section 5 ends this paper with concluding remarks.

2 Related works

Methods of pulsar candidate recognition can be divided into two categories, namely manual recognition algorithms and machine learning recognition algorithms. Manual recognition mainly includes classification algorithms based on statistical information such as the signal-to-noise ratio [16], scoring and sorting algorithms based on statistical features [17], and image software-assisted classification algorithms [18]. As such methods are unsupervised, no labels are required. However, the features of the samples need to be designed and a weight must be assigned to each feature, which depends heavily on the professional knowledge and experience of the researchers. The machine learning methods that have emerged in recent years mainly include artificial neural network (ANN) algorithms based on empirical features and data-driven image recognition algorithms. In 2010, Eatough et al. [19] designed 12 sample features and adopted a three-layer ANN model for training and testing. In 2012, Bates et al. [20] added another 10 statistical features for ANN training. In 2014, Morello et al. [21] optimized the ANN model and presented the SPINN method (Straightforward Pulsar Identification using Neural Networks). The application of ANN models greatly improves the accuracy and processing speed of pulsar candidate classification. However, it relies on experience and certain assumptions, and the selection of features is strongly dependent on the dataset. In 2014, Zhu et al. [22] designed a novel artificial intelligence program that identifies pulsars by image pattern recognition with deep convolutional neural networks, namely the PICS (Pulsar Image-based Classification System). Unlike the recognition methods based on ANN, the system directly takes candidates as input, so no sample features need to be designed. In 2018, Wang et al. [23] proposed to replace the CNN in the PICS with ResNet [24] and retrained the system on the FAST dataset.

The GANs model has also been considered for the recognition of pulsar candidates [25], where the discriminator of GANs is employed as a feature extractor. A similar design appeared in [26], where the ANN model was applied for feature extraction. These methods have improved the accuracy of pulsar candidate identification, but they do not solve the problem of model bias caused by the imbalance of the dataset and the lack of diversity of positive samples.

The classification of imbalanced data has always been a hot topic in machine learning and artificial intelligence, and arises widely in biomedicine [27, 28], finance [29] and information security [30]. Research on imbalanced data classification mainly focuses on data preprocessing and classifier construction. Data preprocessing rebalances the dataset by resampling the raw data, which mainly includes under-sampling, over-sampling and mixed sampling. In classifier construction, the structure of the classifier is changed to obtain higher classification accuracy for both the majority and the minority samples. These methods have been successful on low-dimensional imbalanced data, but are limited on high-dimensional data. With the development of artificial intelligence and deep learning, generative adversarial nets (GANs) have been increasingly adopted for the classification of unbalanced data. At present, the general workflow is to employ the CGAN model for data rebalancing before adopting the CNN model for classification. CGAN has a clear advantage in the classification of high-dimensional unbalanced datasets due to its ability to generate realistic samples. However, it also suffers from two limitations: (1) the samples generated by CGAN are mainly concentrated in the center of the data distribution, which limits how much data diversity they add; (2) the method is not an end-to-end learning model, so the results are hard to reproduce and rely heavily on the researcher’s experience.

3 Method

3.1 Model presentation

The proposed model consists of a generator, a classifier and two discriminators. The architecture of the model is shown in Fig. 1, where p(x) is the distribution of real samples, and p_z(z) and p_y(y) are the prior distributions of the latent variable and of the labels, respectively. It is worth noting that p_z(z) can be assumed to be a Gaussian or uniform distribution, while p_y(y) must be the label distribution of the training samples. All the nets are parameterized neural networks, and the two discriminators share all parameters except those of the last layer, which greatly reduces the number of parameters. Three joint distributions between samples and labels, p(x, y), p_g(x, y) and p_gc(x, y), are involved in the adversarial training: p(x, y) is the joint distribution of real samples and their labels; p_g(x, y) is the joint distribution of labels and the correspondingly generated samples; p_gc(x, y) is the joint distribution of generated samples and their predicted labels.

Fig. 1

Structure of the proposed ICAT model.

The generator G is similar to the generative model in CGAN: it takes the latent variable z and the label y as inputs and outputs the generated samples G(z, y). Let p_g(x | y) represent the distribution of conditionally generated samples, then

$p_{g} (x | y) = \int p_{z} (z) p_{g} (x | y, z) \, dz .$ (2)

Therefore, the joint distribution p_g ( x , y ) is written as:

$p_{g} (x, y) = p_{g} (x | y) p_{y} (y) .$ (3)

The classifier C is similar to a CNN model, which takes the generated samples as input and outputs the corresponding prediction labels. Suppose p_c ( y | x ) is the distribution of prediction labels for the generated samples, then the joint distribution p_gc ( x , y ) is expressed as:

$p_{gc} (x, y) = p_{g} (x) p_{c} (y | x) .$ (4)

where p_g(x) = ∫ p_g(x | y) p_y(y) dy.

Discriminators D_i are binary classifiers with the same function as the discriminant model in the original CGAN. D₁ separates the joint distribution p ( x , y ) from p_g ( x , y ) and p_gc ( x , y ); D₂ distinguishes p_g ( x , y ) from p_gc ( x , y ). Let

$V (G, C, D_{1}, D_{2}) = \sum_{i = 1}^{3} V_{i},$ (5)

where

$\begin{matrix} V_{1} & = & E_{(x, y) \sim p (x, y)} [\log D_{1}], \\ V_{2} & = & E_{(x, y) \sim p_{gc} (x, y)} [\log ((1 - D_{1}) \cdot D_{2})], \\ V_{3} & = & E_{(x, y) \sim p_{g} (x, y)} [\log ((1 - D_{1}) \cdot (1 - D_{2}))] . \end{matrix}$

Therefore, the objective function of the ICAT model is written as $\min_{G, C} \max_{D_{1}, D_{2}} V (G, C, D_{1}, D_{2})$ . Theoretical analysis shows that, when the model reaches equilibrium, the three joint distributions are equal, i.e. p(x, y) = p_gc(x, y) = p_g(x, y). On the one hand, because p(x, y) = p_g(x, y) and p_y(y) is the label distribution of the training samples, the conditional distributions satisfy p_g(x | y) = p(x | y), which indicates that the model can produce controllable samples. On the other hand, factoring the joint distribution p_g(x, y), we get

$p_{g} (x, y) = p_{g} (x) \, q_{g} (y | x),$ (6)

where q_g(y | x) is the inference network of the generator p_g(x | y). Since p_g(x, y) = p_gc(x, y), we deduce that q_g(y | x) = p_c(y | x) for all x ∼ p_g(x), which shows that the classifier C is exactly the inference network of the generator G. During training, the controllable samples produced by the generator are fed into the classifier to supplement the diversity of the samples. Meanwhile, the errors of the classifier are propagated back to the generator to improve its controllability. Therefore, the generation ability of the generator and the classification ability of the classifier improve synchronously. When the model reaches equilibrium, the distribution of the generated samples is equal to the real distribution, i.e. p_g(x) = p(x). Therefore, the model can predict labels for test samples.

3.2 Model analysis

Theoretically, it can be proved that the model has a unique global optimal solution. As for GANs, the proof is divided into two steps. First, fix the generator and the classifier and maximize the objective function with respect to the discriminators. Second, substitute the optimal discriminators and minimize the objective function with respect to the generator and the classifier.

Proposition 1. For given G and C, the optimal solution of the objective function V(G, C, D₁, D₂) with respect to the discriminators D₁, D₂ is: $\begin{matrix} D_{1}^{*} = \frac{p (x, y)}{p (x, y) + p_{g} (x, y) + p_{gc} (x, y)}, \\ D_{2}^{*} = \frac{p_{gc} (x, y)}{p_{g} (x, y) + p_{gc} (x, y)} . \end{matrix}$

Proof: For fixed G and C, the objective function can be abbreviated as V′ (D₁, D₂). Then, the training criterion is to maximize V′ (D₁, D₂) with respect to D₁, D₂. Rewrite the objective function as:

$V^{'} (D_{1}, D_{2}) = \sum_{i = 1}^{3} V_{i}',$ (7)

where

$\begin{matrix} V_{1}' & = & \iint p (x, y) \log D_{1} \, dx \, dy, \\ V_{2}' & = & \iint p_{gc} (x, y) \log ((1 - D_{1}) \cdot D_{2}) \, dx \, dy, \\ V_{3}' & = & \iint p_{g} (x, y) \log ((1 - D_{1}) \cdot (1 - D_{2})) \, dx \, dy . \end{matrix}$

The softmax function is used as the activation function for the last layer of the discriminators, so D_i(x, y) ∈ [0, 1]. Taking the integrand of V′(D₁, D₂) and setting its first derivatives with respect to D₁ and D₂ to zero, we obtain the unique critical point $M (D_{1}^{*}, D_{2}^{*})$ . Moreover, it can be proved that $M (D_{1}^{*}, D_{2}^{*})$ is the maximum point of the integrand and hence also the maximum point of V′(D₁, D₂).
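As an informal numerical check of Proposition 1, the following sketch evaluates the integrand of V′ at a single point (x, y) on a grid over (D₁, D₂) and compares the grid maximizer with the closed-form solution. The density values a, b, c and the grid resolution are arbitrary assumptions chosen for illustration; the function names are hypothetical.

```python
import numpy as np

def integrand(d1, d2, a, b, c):
    """Pointwise integrand of V'(D1, D2) at a fixed (x, y).

    a, b, c stand for the values of p(x, y), p_gc(x, y) and p_g(x, y).
    """
    return (a * np.log(d1)
            + b * np.log((1.0 - d1) * d2)
            + c * np.log((1.0 - d1) * (1.0 - d2)))

def closed_form_optimum(a, b, c):
    """Optimal discriminator outputs stated in Proposition 1."""
    return a / (a + b + c), b / (b + c)

def grid_optimum(a, b, c, steps=999):
    """Brute-force maximizer of the integrand on a grid inside (0, 1)^2."""
    grid = np.linspace(0.001, 0.999, steps)
    d1, d2 = np.meshgrid(grid, grid, indexing="ij")
    values = integrand(d1, d2, a, b, c)
    i, j = np.unravel_index(np.argmax(values), values.shape)
    return grid[i], grid[j]

# Hypothetical density values at one point (x, y); any positive numbers work.
a, b, c = 0.5, 0.2, 0.3
d1_star, d2_star = closed_form_optimum(a, b, c)
d1_grid, d2_grid = grid_optimum(a, b, c)
```

With these values the closed form gives D₁* = 0.5 and D₂* = 0.4, and the grid search should land on the same point up to the grid spacing.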

Proposition 2. The global optimal solution of V (G, C, D₁, D₂) is achieved if and only if p ( x , y ) = p_g ( x , y ) = p_gc ( x , y ). At that point, $D_{1}^{*} (x, y) = \frac{1}{3}, D_{2}^{*} (x, y) = \frac{1}{2}$ and V achieves the value -3 log 3.

Proof: Let Δ = p(x, y) + p_g(x, y) + p_gc(x, y) and substitute D₁, D₂ with $D_{1}^{*}, D_{2}^{*}$ . The objective function then becomes

$U (G, C) = \max_{D_{1}, D_{2}} V (G, C, D_{1}, D_{2}) = \sum_{φ \in S} E_{(x, y) \sim φ} [\log \frac{φ}{Δ}],$ (8)

where the set S is {p(x, y), p_g(x, y), p_gc(x, y)}. Converting to KL divergences gives

$U (G, C) = - 3 \log 3 + \sum_{φ \in S} KL [φ ∥ \frac{Δ}{3}] ≜ - 3 \log 3 + JS (p (x, y), p_{g} (x, y), p_{gc} (x, y)),$ (9)

where JS(p(x, y), p_g(x, y), p_gc(x, y)) can be considered an extended Jensen-Shannon divergence and is non-negative. It attains its global minimum of zero if and only if the three distributions are equal. Therefore, U(G, C) has the global minimum -3 log 3 if and only if p(x, y) = p_g(x, y) = p_gc(x, y), and correspondingly $D_{1}^{*} (x, y) = \frac{1}{3}, D_{2}^{*} (x, y) = \frac{1}{2}$ .
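For readers who want the intermediate step of this proof spelled out, the following is a sketch that uses only the definitions above. With the optimal discriminators, $D_{1}^{*} = p/Δ$, $(1 - D_{1}^{*}) D_{2}^{*} = p_{gc}/Δ$ and $(1 - D_{1}^{*})(1 - D_{2}^{*}) = p_{g}/Δ$, so every term of V has the form $E_{φ}[\log (φ/Δ)]$. Because each φ integrates to one, Δ/3 is itself a probability distribution, and the constant log 3 can be pulled out once per term:

```latex
\begin{aligned}
U(G, C)
  &= \sum_{\varphi \in S} E_{(x,y) \sim \varphi}
     \left[ \log \frac{\varphi}{\Delta} \right] \\
  &= \sum_{\varphi \in S} E_{(x,y) \sim \varphi}
     \left[ \log \frac{\varphi}{\Delta / 3} - \log 3 \right] \\
  &= -3 \log 3 + \sum_{\varphi \in S}
     KL\!\left[ \varphi \,\middle\|\, \frac{\Delta}{3} \right].
\end{aligned}
```

Each KL term is non-negative and vanishes only when φ = Δ/3, which for all three φ simultaneously is the case exactly when the three joint distributions coincide.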

3.3 Model implementation

During training, in order to speed up the convergence of the model, the label posterior error ψ_l of the real samples is introduced into the classifier C, where

$ψ_{l} = E_{(x, y) \sim p (x, y)} [- \log p_{c} (y | x)] .$ (10)

Minimizing ψ_l is equivalent to minimizing KL(p(x, y) || p_c(x, y)), where p_c(x, y) is the joint distribution of real data and the corresponding predicted labels. Factoring p_c(x, y), we get p_c(x, y) = p(x) p_c(y | x). During training, p_g(x) gets closer to the real distribution p(x). Then, the joint distribution p_c(x, y) gradually approaches p_gc(x, y), and KL(p(x, y) || p_c(x, y)) becomes equivalent to KL(p(x, y) || p_gc(x, y)). We therefore conclude that minimizing ψ_l is consistent with optimizing the objective function of the ICAT model, so ψ_l promotes the convergence of the model. The label posterior error of the generated samples, ψ_g, can also be introduced to aid convergence if needed. The training procedure is summarized in Algorithm 1.
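The label posterior error in (10) is the familiar cross-entropy of the classifier on real labeled data. A minimal NumPy sketch of its mini-batch estimate is given below; the function name, the toy probabilities and the small constant `eps` are assumptions for illustration and are not taken from the released code.

```python
import numpy as np

def label_posterior_error(probs, labels, eps=1e-12):
    """Mini-batch estimate of psi_l = E[-log p_c(y | x)].

    probs  : (N, K) array; row i holds the classifier's class probabilities for x_i.
    labels : (N,) integer array of ground-truth class indices y_i.
    eps    : assumed small constant that guards against log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # Probability that the classifier assigns to the true class of each sample.
    picked = probs[np.arange(labels.shape[0]), labels]
    return float(-np.mean(np.log(picked + eps)))

# Toy batch: three samples, two classes (values are made up).
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
labels = np.array([0, 1, 0])
psi_l = label_posterior_error(probs, labels)
```

The same function applied to generated samples and their input labels would give an estimate of ψ_g.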

Algorithm 1 Training procedure of the proposed ICAT model
Data: samples and labels
Output: generator G and classifier C
Initialize the parameters of the generator θ_G, the classifier θ_C, and the discriminators θ_{D₁}, θ_{D₂}
Repeat
    Sample a batch of labeled data pairs (x_i, y_i) ∼ p(x, y), i = 1, 2, ..., N
    Sample labels from the prior distribution ${\bar{y}}_{i} \sim p_{y} (y)$
    Compute conditionally generated samples ${\bar{x}}_{i} \sim p_{g} (x | {\bar{y}}_{i})$
    Compute prediction labels for the conditionally generated samples ${\bar{y}}_{i}^{'} \sim p_{c} (y | {\bar{x}}_{i})$
    Calculate the label posterior errors ψ_l, ψ_g
    $δ_{11} \leftarrow D_{1} (x_{i}, y_{i})$ , $δ_{12} \leftarrow D_{1} ({\bar{x}}_{i}, {\bar{y}}_{i})$ , $δ_{13} \leftarrow D_{1} ({\bar{x}}_{i}, {\bar{y}}_{i}^{'})$
    $δ_{21} \leftarrow D_{2} ({\bar{x}}_{i}, {\bar{y}}_{i})$ , $δ_{22} \leftarrow D_{2} ({\bar{x}}_{i}, {\bar{y}}_{i}^{'})$
    $L_{D_{1}} \leftarrow - \frac{1}{N} \sum_{i = 1}^{N} (\log δ_{11} + \log (1 - δ_{12}) + \log (1 - δ_{13}))$
    $L_{D_{2}} \leftarrow - \frac{1}{N} \sum_{i = 1}^{N} (\log (1 - δ_{21}) + \log δ_{22})$
    $L_{G} \leftarrow - \frac{1}{N} \sum_{i = 1}^{N} (\log δ_{12} + \log δ_{21})$
    $L_{C} \leftarrow - \frac{1}{N} \sum_{i = 1}^{N} (\log δ_{13} + \log (1 - δ_{22})) + ψ_{l} + ψ_{g}$
    θ_{D₁} ← θ_{D₁} - ∇_{θ_{D₁}} L_{D₁}, θ_{D₂} ← θ_{D₂} - ∇_{θ_{D₂}} L_{D₂}
    θ_G ← θ_G - ∇_{θ_G} L_G, θ_C ← θ_C - ∇_{θ_C} L_C
Until convergence
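The four losses of Algorithm 1 depend only on the five discriminator outputs per sample, so they can be sketched without any network code. The NumPy snippet below is an illustrative sketch rather than the released implementation: the function name, the dictionary keys and the constant `eps` are assumptions, and the discriminator outputs are passed in as plain arrays.

```python
import numpy as np

def icat_losses(d11, d12, d13, d21, d22, psi_l=0.0, psi_g=0.0, eps=1e-12):
    """Mini-batch losses of Algorithm 1 from discriminator outputs in (0, 1).

    d11 = D1(x, y)           real sample with its real label
    d12 = D1(x_bar, y_bar)   generated sample with its input label
    d13 = D1(x_bar, y_bar')  generated sample with its predicted label
    d21 = D2(x_bar, y_bar)
    d22 = D2(x_bar, y_bar')
    """
    d11, d12, d13, d21, d22 = (np.asarray(d, dtype=np.float64)
                               for d in (d11, d12, d13, d21, d22))

    def safe_log(t):
        # eps is an assumed guard against log(0).
        return np.log(t + eps)

    loss_d1 = -np.mean(safe_log(d11) + safe_log(1 - d12) + safe_log(1 - d13))
    loss_d2 = -np.mean(safe_log(1 - d21) + safe_log(d22))
    loss_g = -np.mean(safe_log(d12) + safe_log(d21))
    loss_c = -np.mean(safe_log(d13) + safe_log(1 - d22)) + psi_l + psi_g
    return {"D1": loss_d1, "D2": loss_d2, "G": loss_g, "C": loss_c}

# Discriminator outputs at the equilibrium of Proposition 2 (D1 = 1/3, D2 = 1/2).
n = 4
losses = icat_losses(np.full(n, 1 / 3), np.full(n, 1 / 3), np.full(n, 1 / 3),
                     np.full(n, 1 / 2), np.full(n, 1 / 2))
```

At the equilibrium outputs the two discriminator losses sum to 3 log 3, the negative of the optimal value of V stated in Proposition 2, which gives a quick consistency check between the algorithm and the analysis.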

4 Experiments

The experiments are divided into two parts. The first investigates the classification performance and controllability of ICAT on the widely used MNIST dataset. The second applies the model to the pulsar candidate datasets HTRU and FAST to verify its classification and generation ability on unbalanced datasets.

4.1 Datasets and evaluation metrics

MNIST [31] is a handwritten digit dataset, which consists of 50,000 training samples, 10,000 cross-validation samples and 10,000 test samples, each of which is a grayscale image of size 28 × 28. In the experiment, 10,000, 20,000, 30,000 and 40,000 samples are extracted from the training data for model training.

A pulsar candidate(Fig. 2) contains several diagnostic plots, where the most important subplots highlighted are summed profile histogram, time-vs-phase plot, frequency-vs-phase plot and dispersion-measure(DM) curve. Here, two pulsar candidate datasets, HTRU and FAST, are chosen for experiment.

Fig. 2

Diagnostic plot of candidate.

HTRU is the first publicly available labeled benchmark pulsar candidate dataset, which contains 1,196 positive samples(pulsar) and 89,996 negative samples. The positive and negative samples on HTRU are shown in Fig. 3.

Fig. 3

Samples on HTRU dataset.

The 2D plot at the top is the frequency-vs-phase plot, abbreviated as sub-bands; the 2D plot in the middle is the time-vs-phase plot, abbreviated as sub-int; the bottom one is the DM curve. There is a vertical stripe on both the sub-bands and the sub-int plots of the positive samples, which is the trace left by the pulse signal during the survey. When the signal is weak or the interference is strong, the vertical stripe may not be noticeable. In contrast, there are no vertical stripes on the subplots of the negative samples. In the experiment, the sub-int plot is adopted for model training. The dimension of the sub-int plot in the raw data is 18 × 64, so all images are uniformly resized to 64 × 64 and normalized to [0, 1]; the dataset is then divided into a training set, a validation set and a test set. Table 1 lists the number of positive and negative samples in the split HTRU dataset.
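The preprocessing described above (resize from 18 × 64 to 64 × 64, then scale to [0, 1]) can be sketched as follows. The text does not state which interpolation or normalization scheme was used, so nearest-neighbor resizing and per-image min-max scaling are assumptions here, as are the function names and the random stand-in image.

```python
import numpy as np

def resize_nearest(image, out_shape=(64, 64)):
    """Nearest-neighbor resize of a 2-D array (interpolation choice is an assumption)."""
    image = np.asarray(image, dtype=np.float64)
    # Map every output row/column index back to a source row/column index.
    rows = np.arange(out_shape[0]) * image.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * image.shape[1] // out_shape[1]
    return image[np.ix_(rows, cols)]

def normalize_unit(image):
    """Per-image min-max scaling to [0, 1]; a constant image maps to all zeros."""
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

rng = np.random.default_rng(0)
sub_int = rng.normal(size=(18, 64))   # random stand-in for one raw sub-int plot
prepared = normalize_unit(resize_nearest(sub_int))
```

A smoother interpolation (for example bilinear) would be a drop-in replacement for `resize_nearest` if the vertical stripe needs to be preserved more faithfully.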

FAST is another pulsar candidate dataset, which consists of 1,163 positive samples and 14,319 negative samples. Fig. 4 shows the sub-int plots of positive and negative samples, which are adopted for model training. The dimension of the sub-int plot is 64 × 64, so the images are only normalized to [0, 1], and the dataset is divided into a training set, a validation set and a test set. Table 1 also lists the number of positive and negative samples in the split FAST dataset.

Fig. 4

Samples on FAST dataset.

Table 1

Number of positive (P) and negative (N) samples in the split HTRU and FAST datasets

Split Dataset	HTRU			FAST
	No. of P	No. of N	Total No.	No. of P	No. of N	Total No.
Train set	480	10920	11400	697	8603	9300
Valid set	238	30539	30777	233	2858	3091
Test set	478	48537	49015	233	2858	3091

For unbalanced datasets, accuracy is no longer a convincing indicator of the recognition performance of a model. Therefore, the evaluation metrics we adopt for the HTRU and FAST datasets are Precision, Recall and F-score. The binary classification confusion matrix is defined in Table 2.

Table 2

Binary classification confusion matrix

Outcomes	Negative Prediction	Positive Prediction
Ground N	True Negative	False Positive
Ground P	False Negative	True Positive

Then Precision, Recall and F-score are defined as:

$Precision = \frac{TP}{TP + FP},$ (11)

$Recall = \frac{TP}{TP + FN},$ (12)

$F\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall},$ (13)

where TP, FP, and FN are abbreviations for true positive, false positive, and false negative in Table 2, respectively.
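The three metrics in (11)-(13) can be computed directly from the confusion-matrix counts. The helper names below are hypothetical and the counts in the usage lines are made up for illustration; only the Precision and Recall values passed to `harmonic_mean` at the end come from Table 5.

```python
def harmonic_mean(precision, recall):
    """F-score of Eq. (13); returns 0.0 when both inputs are zero."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0

def precision_recall_fscore(tp, fp, fn):
    """Precision, Recall and F-score from confusion-matrix counts, Eqs. (11)-(13)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall, harmonic_mean(precision, recall)

# Made-up counts: 90 true positives, 10 false positives, 30 false negatives.
precision, recall, fscore = precision_recall_fscore(tp=90, fp=10, fn=30)

# Reported Precision and Recall (%) of the CNN model on HTRU from Table 5.
cnn_htru_fscore = harmonic_mean(91.89, 83.05)
```

Because the F-score is a harmonic mean, it always lies between Precision and Recall, which makes it a convenient sanity check on reported rows of a results table.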

4.2 Network architectures and hyper-parameters

The ICAT model includes a generator, a classifier, and two discriminators, and their network structures differ slightly between datasets. Fig. 5 shows the structures on MNIST, and Fig. 6 shows the structures on HTRU and FAST, where Conv, Deconv, MaxPool and Dense represent convolutional, transposed convolutional, maximum pooling and fully connected layers, respectively.

Fig. 5

Network structure of ICAT model on MNIST.

Fig. 6

Network structure of ICAT model on HTRU and FAST.

All experiments were implemented in Theano [32]. The mini-batch size was set to 100 and the ADAM [33] algorithm was employed for model optimization, with β₁ = 0.5 and β₂ = 0.999. The learning rates on MNIST, HTRU and FAST were set to 0.001, 0.05 and 0.05, respectively, and the corresponding numbers of training epochs were set to 500, 300 and 200. The latent variable z, which was the input of the generator, was sampled from a normal distribution and had dimension 100. The input label y of the generator was the same as the label of the training sample in each mini-batch. All weights were initialized from a normal distribution with a mean of zero and a standard deviation of 0.002. See the code for detailed parameters: https://github.com/gzmtzly/ICAT.

4.3 Classification and generation on MNIST

The error rates on MNIST, averaged over 5 runs, are listed in Table 3. In our comparison, the network structure of the CNN is the same as that of the classifier C in ICAT; 40PCA-SVM takes the 40 principal components extracted from the input samples by principal component analysis (PCA) as features, and these features are used to train a support vector machine (SVM) classifier. As can be seen from Table 3, the recognition accuracy of every model improves as the number of training samples increases. The CNN model performs better than PCA-SVM and PCA-KNN. Compared with CNN, the error rate of the ICAT model is further reduced by 0.16%, 0.12%, 0.09%, 0.08%, and 0.07%, respectively. The reduction in error rate shrinks as the number of training samples grows.

Table 3
Test error rate (%) on limited MNIST data (averaged over 5 runs)

Methods	N=10000	N=20000	N=30000	N=40000	N=all
KNN, Euclidean(L2)	4.97	3.91	3.46	3.41	3.12
40PCA-KNN(L2)	4.01	2.99	2.78	2.60	2.54
RBF-SVM	2.82	2.15	1.85	1.59	1.46
40PCA-SVM	2.65	1.96	1.78	1.66	1.52
CNN	0.59	0.47	0.42	0.39	0.35
ICAT	0.43±0.012	0.35±0.021	0.33±0.014	0.31±0.019	0.28±0.019

The basic principle of the ICAT model is to expand the diversity of the training samples by generating samples. Therefore, the controllability of the generated samples is crucial. Fig. 7 shows the generated images on MNIST. Intuitively, the quality and controllability of the generated images improve continuously as the number of training samples increases. To further study the controllability of the generated samples, we conducted the following experiment. Firstly, a "perfect" CNN classifier (0.35% error rate) was trained on MNIST with all 60,000 samples. Secondly, the CGAN, auxiliary classifier GANs (ACGAN) and ICAT models were trained with limited data, and 10,000 samples were conditionally generated (1,000 per category on average). Finally, the generated samples were classified with the "perfect" classifier. Table 4 summarizes the quantitative results. Although the classifier is not completely perfect, it is sufficient to show that ICAT has better controllability than CGAN and ACGAN.

Table 4

Classification results on the generated samples

Methods	N=10000	N=20000	N=30000	N=40000
CGAN	228	133	105	93
ACGAN	235	137	103	94
ICAT	186	128	93	78
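The controllability test described above amounts to comparing the label fed to the generator with the label that the reference classifier assigns to the generated image. The sketch below counts such disagreements, under the assumption that Table 4 reports counts of this kind (lower meaning more controllable); the function name is hypothetical and the classifier output is simulated by flipping three labels by hand.

```python
import numpy as np

def count_label_mismatches(input_labels, predicted_labels):
    """Number of generated samples whose reference-classifier prediction
    differs from the label that was fed to the generator."""
    input_labels = np.asarray(input_labels)
    predicted_labels = np.asarray(predicted_labels)
    if input_labels.shape != predicted_labels.shape:
        raise ValueError("label arrays must have the same shape")
    return int(np.sum(input_labels != predicted_labels))

# 1,000 requested samples for each of the 10 digit classes.
requested = np.repeat(np.arange(10), 1000)
# Stand-in for the reference classifier's output: three labels flipped by hand.
predicted = requested.copy()
flipped = [5, 2500, 9999]
predicted[flipped] = (predicted[flipped] + 1) % 10
mismatches = count_label_mismatches(requested, predicted)
```

In a real run, `predicted` would be the arg-max output of the reference classifier on the images produced from `requested`.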

In summary, experiments on MNIST show that the generator and classifier in ICAT model promote each other. On the one hand, the samples produced by the generator increase the diversity of training samples so as to improve the classification performance of the classifier. On the other hand, the classification errors of the generated samples are propagated back to the generator, which improves the controllability of the generator.

4.4 Classification and generation on HTRU and FAST

A comparison of the F-score between ICAT and other models is shown in Fig. 8. Intuitively, the recognition performance of the ICAT model is the best, and its convergence is also relatively fast. Table 5 lists the performance metrics of the models on the HTRU and FAST datasets, where CGAN+CNN means that the CGAN model is first trained on HTRU and FAST, and 10,000 and 8,000 positive samples, respectively, are generated to rebalance the datasets before the CNN model is applied for classification.

Fig. 7

Generated samples on MNIST.

Fig. 8

Comparison of F-score between ICAT and other models.

Table 5

Evaluations of different models on HTRU and FAST(%)

Models	HTRU			FAST
	Recall	Precision	F-score	Recall	Precision	F-score
40PCA+SVM	57.53	59.91	58.70	24.46	58.16	34.44
CNN	83.05	91.89	87.25	67.38	90.75	77.34
CGAN+CNN	86.40	88.82	87.59	69.09	89.94	78.15
ICAT	89.33	89.14	89.24	75.97	86.76	81.01

Three conclusions can be drawn from the classification experiment on HTRU. Firstly, the F-score of the ICAT model is 89.24%, which is 1.99% higher than that of the CNN model. Meanwhile, the Recall of the ICAT model is the highest, so the false negative rate of the proposed model is the lowest. Secondly, the Precision of the CNN model is the highest while its Recall is only 83.05%, which indicates that the false negative rate of the CNN model is higher and its model bias is severe. Finally, the F-score of the CGAN+CNN method is 87.59%, which is 0.34% higher than that of the CNN model, which implies that it is feasible to apply CGAN to supplement sample diversity.

Similar conclusions can be drawn from the experiments on FAST. Firstly, compared with the CNN model, the F-score of the ICAT model increases by 3.67%, and its Recall is also the highest, at 75.97%. Secondly, the Precision of the CNN model is the highest, at 90.75%, while its Recall is 67.38%. Therefore, it also exhibits a higher false negative rate and severe model bias. Finally, the CGAN+CNN method also improves the recognition performance to some extent.

Fig. 9 exhibits the generated samples of ICAT model on HTRU and FAST, in which the first 50 are positive samples and the last 50 are negative samples. It can be seen that even on the unbalanced dataset, the samples generated by the ICAT model are also controllable.

Fig. 9

Generated samples on HTRU and FAST.

It can be concluded from Table 5 that, compared with the CGAN+CNN model, ICAT not only has a simpler training procedure but also achieves better recognition performance. In addition to image classification, it also performs better than the CGAN and ACGAN models in terms of the controllability of the generated samples, as confirmed by Table 4. These two sets of experiments further reveal that the performance of the generator and the classifier in the ICAT model improves synchronously, which is consistent with our theoretical analysis. To sum up, on unbalanced pulsar candidate datasets, traditional recognition models suffer from poor recognition performance and model bias, while the proposed ICAT model alleviates these problems. Therefore, it is more suitable for pulsar candidate recognition.

5 Conclusions

In this study, we present a novel image classification model based on adversarial training and apply it to pulsar candidate identification. The model consists of a generator, a classifier, and two discriminators. During training, the generator and the classifier supervise and cooperate with each other, and finally reach the optimum. Theoretical analysis demonstrates that the model has a unique optimal solution, and that when the model reaches equilibrium, it can not only predict labels for unseen data but also generate controllable samples. Experimentally, we first verify the classification performance of the ICAT model on MNIST and confirm that ICAT offers better controllability than CGAN and ACGAN. Then, the model is applied to the pulsar candidate datasets HTRU and FAST. The results show that it significantly improves the recognition accuracy and reduces the false negative rate. Therefore, the ICAT model is more suitable for pulsar candidate identification. An interesting direction for future investigation is to train the model on the frequency-vs-phase plot, the DM curve, and other hand-crafted features to obtain a pulsar candidate recognition system.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. U1831131) and Cultivation Project for FAST Scientific Payoff and Research Achievement of CAMS-CAS.

References

1. Krizhevsky, Sutskever and Hinton G.E., ImageNet classification with deep convolutional neural networks[C]//, Advances in Neural Information Processing Systems 2012, 1097–1105.

2. Zeiler M.D. and Fergus, Visualizing and understanding convolutional networks[C]//, European Conference on Computer Vision. Springer, Cham 2014, 818–833.

3. Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition[J], arXiv preprint arXiv:1409.1556, (2014).

4. Chawla N.V., Bowyer K.W., Hall L.O., et al., SMOTE: synthetic minority over-sampling technique[J], Journal of Artificial Intelligence Research 16(1) (2002), 321–357.

5. Goodfellow, Pouget-Abadie, Mirza, et al., Generative adversarial nets[C]//, Advances in Neural Information Processing Systems 2014, 2672–2680.

6. Fiore, De Santis, Perla, et al., Using generative adversarial networks for improving classification effectiveness in credit card fraud detection[J], Information Sciences 2017, 448–455.

7. Fanny C.TW., Deep Learning for Imbalance Data Classification using Class Expert Generative Adversarial Network[J], Procedia Computer Science 135 (2018), 60–67.

8. Mullick S.S., Datta and Das, Generative adversarial minority oversampling[C]//, Proceedings of the IEEE International Conference on Computer Vision 2019, 1695–1704.

9. Mirza and Osindero, Conditional generative adversarial nets[J], arXiv preprint arXiv:1411.1784, (2014).

10. Odena, Olah and Shlens, Conditional image synthesis with auxiliary classifier GANs[C]//, Proceedings of the 34th International Conference on Machine Learning 2017, 2642–2651.

11. Salimans, Goodfellow, Zaremba, et al., Improved techniques for training GANs[C]//, Advances in Neural Information Processing Systems 2016, 2234–2242.

12. Chongxuan L.I., Xu, Zhu, et al., Triple generative adversarial nets[C]//, Advances in Neural Information Processing Systems 2017, 4088–4098.

13. Gan, Chen, Wang, et al., Triangle generative adversarial networks[C]//, Advances in Neural Information Processing Systems 2017, 5247–5256.

14. Douzas and Bacao, Effective data generation for imbalanced learning using conditional generative adversarial networks[J], Expert Systems with Applications 91 (2018), 464–471.

15. Frid-Adar, Diamant, Klang, et al., GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification[J], Neurocomputing 321 (2018), 321–331.

16. Johnston, Lyne A.G., Manchester R.N., et al., A high-frequency survey of the southern galactic plane for pulsars[J], Monthly Notices of the Royal Astronomical Society 255(3) (1992), 401–411.

17. Lee K.J., Stovall, Jenet F.A., et al., PEACE: pulsar evaluation algorithm for candidate extraction–a software package for post-analysis processing of pulsar survey candidates[J], Monthly Notices of the Royal Astronomical Society 433(1) (2013), 688–694.

18. Faulkner A.J., Stairs I.H., Kramer, et al., The Parkes Multibeam Pulsar Survey - V. Finding binary and millisecond pulsars[J], Monthly Notices of the Royal Astronomical Society 355(1) (2004), 147–158.

19. Eatough R.P., Molkenthin, Kramer, et al., Selection of radio pulsar candidates using artificial neural networks[J], Monthly Notices of the Royal Astronomical Society 407(4) (2010), 2443–2450.

20. Bates S.D., Bailes, Barsdell B.R., et al., The High Time Resolution Universe Pulsar Survey - VI: An artificial neural network and timing of 75 pulsars[J], Monthly Notices of the Royal Astronomical Society 427(2) (2012), 1052–1065.

21. Morello, Barr E.D., Bailes, et al., SPINN: a straightforward machine learning solution to the pulsar candidate selection problem[J], Monthly Notices of the Royal Astronomical Society 443(2) (2014), 1651–1662.

22. Zhu W.W., Berndsen, Madsen E.C., et al., Searching for pulsars using image pattern recognition[J], Physics 781(2) (2014), 109–125.

23. Wang H.F., Zhu W.W., Guo, et al., Pulsar candidate selection using ensemble networks for FAST drift-scan survey[J], Science China (Physics, Mechanics & Astronomy) 62(5) (2019), 65–74.

24. He, Zhang, Ren, et al., Deep residual learning for image recognition[C]//, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, 770–778.

25. Guo, Duan, Wang, et al., Pulsar Candidate Identification with Artificial Intelligence Techniques[J], arXiv preprint arXiv:1711.10339, (2017).

26. Zhang, Zhao, An, et al., Pulsar candidate recognition with deep learning[J], Computers & Electrical Engineering 2019, 1–8.

27. Wu, Wu, Cox D.D., et al., Conditional Infilling GANs for Data Augmentation in Mammogram Classification[J], arXiv preprint arXiv:1807.08093, (2018).

28. Shin H.C., Tenenholtz N.A., Rogers J.K., et al., Medical image synthesis for data augmentation and anonymization using generative adversarial networks[C]//, International Workshop on Simulation and Synthesis in Medical Imaging 2018, 1–11.

29. Zakaryazad and Duman, A profit-driven artificial neural network (ANN) with applications to fraud detection and direct marketing[J], Neurocomputing 175 (2016), 121–131.

30. Wang and Yao, Using Class Imbalance Learning for Software Defect Prediction[J], IEEE Transactions on Reliability 62(2) (2013), 434–443.

31. LeCun, Bottou, Bengio, et al., Gradient-based learning applied to document recognition[J], Proceedings of the IEEE 86(11) (1998), 2278–2324.

32. Al-Rfou, Alain, Almahairi, et al., Theano: A Python framework for fast computation of mathematical expressions[J], arXiv preprint arXiv:1605.02688, (2016).

33. Kingma D.P. and Ba, Adam: A method for stochastic optimization[J], arXiv preprint arXiv:1412.6980, (2014).