Abstract
Transfer learning is widely used in medical image classification. Building on convolutional neural networks (CNNs) and sparse coding, we present a new deep transfer learning architecture for false positive reduction in lymph node detection. We first convert the linear combination of the deep transferred features into a linear combination of the pre-trained filter banks. Next, a new CNN branch based on point-wise filters is introduced to automatically integrate the transferred features for classifying candidates as true or false positives. To reduce the scale of the proposed architecture, we apply sparse coding to the fixed transferred convolution filter banks. On this basis, a two-stage training strategy with grouped sparse connection is presented to train the model efficiently. The model is evaluated on lymph node datasets for false positive reduction, and our approach shows encouraging performance compared with prior approaches. Our method reaches sensitivities of 71%/85% at 3 FP/vol. and 82%/91% at 6 FP/vol. in the abdomen and mediastinum respectively, which compare competitively with previous approaches.
Introduction
Remarkable progress has been made in fields such as cloud computing [9], data mining [49] and 5G [50]. Significant advances have also been achieved in computer vision, thanks to the availability of large-scale labeled databases and convolutional neural networks (CNNs). Consequently, many researchers apply deep learning methods to medical image processing [5, 47] and security [52, 53]. However, unlike traditional algorithms that rely on hand-designed features [31], training a CNN from scratch is impractical for many medical image processing tasks because labeled data are scarce. Moreover, annotating medical images requires professional medical knowledge, which makes it tedious and expensive. As a result, biomedical image datasets are small compared with natural image datasets such as ImageNet [21].
To address this problem, transfer learning (TL) has been successfully employed in deep learning for medical image processing [30], where a network pre-trained on a source dataset benefits the target domain. Concretely, transfer learning includes two strategies: feature extraction and fine-tuning. The former treats the “off-the-shelf” features [34] derived from a pre-trained deep neural network (DNN) as a feature vector and then trains a new model from scratch for the target task. Fine-tuning, in contrast, uses the pre-trained DNN to initialize the parameters of the target model and then continues training with a smaller learning rate [27]. The question of how to exploit the “off-the-shelf” features more efficiently for target tasks has led to many feature selection methods [25], e.g. wrapper, embedded and filter approaches. Fine-tuning, because of the dataset bias between source and target domains, needs a large number of labeled training samples for the new task, which are unavailable in many medical image processing scenarios [45]. Therefore, a number of alternative methods have been introduced, such as domain adaptation techniques [20, 26], ensemble-based approaches [10] and multitask learning [7].
In practice, multisource transfer learning has been used in many computer-aided diagnosis applications [10, 36]. Transferring knowledge from multiple source datasets to medical image analysis tasks can be more beneficial than transferring only from unrelated natural images. Moreover, multisource transfer learning based on databases of varying modalities can ease over-fitting of the target network, which makes it well suited to small-scale biomedical imaging databases. Nevertheless, how to obtain the maximum benefit from the pool of multisource transferred features remains a tricky issue. While there are many feature fusion or selection techniques [25, 27] and knowledge ensemble mechanisms [12] in computer-aided diagnosis, they tend to be hand-crafted. One unavoidably has to decide which layers of the transferred DNNs to freeze, and whether to fix or fine-tune the pre-trained parameters for the target task. In practice, this is usually settled through a large number of carefully designed experiments. At the same time, the effectiveness of transfer learning is also affected by the specific CNN architectures (VGG, AlexNet, etc.) trained on the source or target databases.
In this paper, we propose a new transfer learning method based on CNNs and sparse coding. We first convert the combination of the deep transferred features into a combination of the pre-trained filter banks. Next, a new CNN branch based on point-wise convolution is introduced to automatically integrate the transferred features for classifying candidates as true or false positives. To reduce the scale of the proposed architecture, we apply sparse coding to the fixed transferred convolution filter banks. A two-stage training procedure with grouped sparse connection is also introduced to train the model efficiently. In other words, the proposed approach can be viewed as a “meta-learning” method, in which the model learns to learn from the fixed pre-trained parameters. The model is evaluated on lymph node (LN) databases for false positive reduction and shows encouraging performance compared with prior approaches.
The primary contributions of this paper are as follows: 1) a new CNN-based framework built on 1×1 convolution is introduced for multisource transfer learning; 2) sparse coding is applied to the multisource transferred filters to decrease model complexity and ease over-fitting; 3) to make the model more practical, we devise a two-phase training scheme based on dropout and grouped sparse connection.
The rest of the paper is organized as follows. Section 2 briefly reviews the related literature on deep transfer learning for medical image classification. Section 3 describes the newly designed framework. Section 4 presents experiments and the corresponding discussion on the LN false positive reduction task. Section 5 concludes the paper.
Related work
Convolutional neural network and transfer learning in medical image classification
CNNs and transfer learning show superior performance in many medical classification tasks across different image modalities. In [44], three critical factors for computer-aided detection (CADe), namely CNN architectures, dataset characteristics and transfer learning, are investigated for two clinical problems: thoraco-abdominal lymph node detection and interstitial lung disease classification. In [32], a two-layer neural network that takes features transferred from a pre-trained convolutional neural network is proposed to better classify Human Epithelial-2 cell images. In addition, a small annotated dental image database is used with three independent convolutional neural networks for diagnosis classification [33]. Reference [22] takes a pre-trained Inception V3 model and fine-tunes it on annotated endoscopy images to detect gastrointestinal bleeding. Using VGG-Net based transfer learning and a novel FCNet, liver fibrosis in ultrasound images is efficiently categorized [28]. To deal with a low-quality, small-scale training database for skin lesion classification, the pre-trained AlexNet model is modified and fine-tuned to classify dermal images [16]. Moreover, multisource transfer learning is also utilized in [26] for lung pattern analysis. The literature mentioned above demonstrates the effectiveness of CNNs and transfer learning in medical image analysis.
Parameter predicting in convolutional neural network
As described above, point-wise convolution is used to acquire or predict the adaptive transfer knowledge (i.e. the CNN filters in this paper) for the target domain. Several parameter prediction methods have been proposed for deep learning. In [15], considering the parameterization redundancy in convolutional neural networks, the authors use a small fraction of the weight values (5%) to accurately predict the remaining parameters of each pre-trained convolution kernel. Based on the smoothness of the pre-trained weights, a kernel-based method is used to obtain factorized weight matrices that form a pre-trained parameter space. In addition, a learnet model is introduced in [4] to obtain deep model weights for one-shot learning. Unlike the previous method, it non-linearly predicts all the weight values of each layer from a given exemplar image. More recently, a lookup-based convolutional neural network was introduced in [3]. It encodes convolutions as a few lookups into a dictionary that is learned adaptively to structure the weight space of convolutional neural networks. By jointly training the parameter dictionary and a small set of linear combinations, the approach in [3] markedly accelerates both model training and inference.
Methods
The point-wise convolution based transfer learning
As previously described, we use a point-wise convolutional layer to directly obtain a linear combination of the transferred filters so that the model generalizes to the target domain. Figure 1 shows the whole framework.

The whole framework of the proposed method (a) schematic diagram (b) point-wise convolutional layer based structure.
Figure 1(a) gives a simple schematic of the proposed approach and Fig. 1(b) shows the detailed view in terms of convolutional (Conv) layer configuration. The transferred feature layer consists of the fixed pre-trained convolutional kernels of a deep model trained on the source databases.
We then use the point-wise convolutional operation to linearly combine the multisource transferred features. We also apply activation functions in the point-wise convolutional layer to add nonlinearity to the transferred features. Stacking the structure in Fig. 1(b) layer by layer gives the fundamental framework of the proposed approach.
Unfortunately, applying this basic configuration directly is impractical, so its efficiency must be improved. Figure 2 shows the detailed structure corresponding to Fig. 1(b) for a single input channel and a single output channel.

The convolution process of the transferred feature layer and the 1×1 Conv layer for a single input channel and a single output channel.
Concretely, provided there are n input pre-trained convolutional filters
Based on Equation (1), we can deduce that the bottleneck of the above framework is the undesirably large number of pre-trained filters, i.e.
To solve this issue, the mean value of the output feature map in Equation (1) can first be estimated using the approximation in Equation (2), which was first introduced in [19]:
In Equation (3), one can put the w_i into g(·) to turn into
In Equation (3), we approximately transform the original point-wise convolutional layer based transfer learning structure, illustrated in Fig. 1(b), into a conventional convolution layer, except that its fully trainable convolutional kernels are replaced by a trainable linear combination of the fixed pre-trained filters. Under this approximation, the model complexity and computational burden can be greatly reduced by removing the intrinsic redundancy in the pre-trained filter space. In the next section, we use the sparse coding algorithm to achieve this.
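The identity that underlies this transformation can be checked numerically. The following is a minimal sketch and not the authors' implementation: it leaves out the activation g(·), so the equality is exact and not approximate, it uses small made-up integer filters, and the helper names `conv2d_valid` and `combine_filters` are hypothetical. It shows that mixing the responses of several fixed filters with 1×1 weights gives the same result as a single convolution with the mixed filter.

```python
def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def combine_filters(filters, weights):
    """Weighted sum of equally sized filters: sum_i w_i * f_i."""
    kh, kw = len(filters[0]), len(filters[0][0])
    return [[sum(w * f[i][j] for w, f in zip(weights, filters))
             for j in range(kw)]
            for i in range(kh)]


# Toy 4x4 input and three fixed "pre-trained" 3x3 filters (made-up values).
image = [[1, 2, 0, 1],
         [0, 1, 3, 2],
         [2, 1, 0, 1],
         [1, 0, 2, 3]]
filters = [
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],   # vertical-edge-like
    [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],   # horizontal-edge-like
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],     # Laplacian-like
]
weights = [2, -1, 3]  # stands in for the trainable 1x1 Conv weights w_i

# Path A: convolve with every fixed filter, then mix the responses (1x1 Conv).
responses = [conv2d_valid(image, f) for f in filters]
out_h, out_w = len(responses[0]), len(responses[0][0])
path_a = [[sum(w * resp[r][c] for w, resp in zip(weights, responses))
           for c in range(out_w)]
          for r in range(out_h)]

# Path B: mix the filters first, then convolve once.
path_b = conv2d_valid(image, combine_filters(filters, weights))

print(path_a == path_b)
```

Path A needs one convolution per fixed filter, whereas Path B needs a single convolution once the filters are combined, which is where the saving comes from when the activation is moved outside the sum.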
Previous studies [3, 15] reveal that a considerable amount of redundant information exists in the parameter space of some deep networks. We can exploit this fact to make our model more practical. Researchers have proposed many approaches to cope with this redundancy, for example kernel-based dictionaries [15], autoencoders [1], factorized parameter matrix based models [4] and structured sparsity learning [13, 48].
Here, the sparse coding algorithm [37] is applied to the transferred filters to better cooperate with the 1×1 Conv layer. Specifically, we first extract the over-complete basis (i.e.
The objective function of the sparse coding algorithm is shown below
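As a rough illustration only, the sketch below assumes the widely used ℓ1-regularized least-squares form of sparse coding, min_s ½‖x − Ds‖² + λ‖s‖₁, which may differ in detail from the authors' objective. It codes one flattened filter x over a toy over-complete dictionary D with the iterative soft-thresholding algorithm (ISTA), a solver chosen here for brevity and not named in the text. The function names, the dictionary values and λ are all hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(D, x, s, lam):
    """0.5 * ||x - D s||^2 + lam * ||s||_1, with D given as a list of rows."""
    residual = [xi - sum(dij * sj for dij, sj in zip(row, s))
                for row, xi in zip(D, x)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in s)


def ista(D, x, lam, n_iter=200):
    """Sparse code of x over the columns (atoms) of D via ISTA."""
    n_atoms = len(D[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient of the quadratic term, so 1/L is a safe step size.
    L = sum(v * v for row in D for v in row)
    s = [0.0] * n_atoms
    for _ in range(n_iter):
        residual = [sum(dij * sj for dij, sj in zip(row, s)) - xi
                    for row, xi in zip(D, x)]
        grad = [sum(D[i][j] * residual[i] for i in range(len(D)))
                for j in range(n_atoms)]
        s = [soft_threshold(sj - gj / L, lam / L) for sj, gj in zip(s, grad)]
    return s


# Toy example: a 4-dimensional "filter" x and a dictionary with 6 atoms,
# i.e. more atoms than dimensions (over-complete). Values are made up.
D = [[1.0, 0.0, 0.0, 0.0, 0.5, 0.5],
     [0.0, 1.0, 0.0, 0.0, 0.5, -0.5],
     [0.0, 0.0, 1.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0, 1.0, 0.5, -0.5]]
x = [1.0, 1.0, 1.0, 1.0]

code = ista(D, x, lam=0.1)
print([round(v, 3) for v in code])
```

A larger λ drives more coefficients to exactly zero; once λ is large enough, the code is all zeros and no atom is used.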
Relying on the over-complete basis extracted from the transferred filters, the complexity of the 1×1 Conv layer is reduced dramatically. The total number of convolution operations in Section 3.1 decreases to mc + mck, where c is the number of over-complete basis elements. However, notice the sparsity among the weights on the basis (i.e. s in Equation (4) or
Specifically, suppose every shared over-complete basis element in the l-th layer corresponds to a “meta-feature” for a certain input channel. At the outset, we do not know which ones are necessary or redundant for a specific output feature channel. Similar to [3, 15], the convolution can be executed before the linear combination of the over-complete basis for efficiency. The corresponding formula is given as
To this end, based on the Dropout [19] and DropConnect [46] techniques, we introduce a new grouped sparse connection for the point-wise convolutional layer. Note that the point-wise convolutional layer can also be viewed as a fully connected (FC) layer whose parameters are shared across all spatial locations of the feature map. Figure 3 gives a simple schematic comparison between Dropout, DropConnect and our grouped sparse connection network. As Fig. 3(a) depicts, Dropout randomly sets outputs propagating to the next layer to 0. DropConnect randomly sets weights of the current layer to 0. Our grouped sparse connection network can be viewed as a compromise, in which the zero weights are drawn at random group by group. In other words, the number of random zero weights connected to the same neuron in the next layer is fixed. For example, each group in Fig. 3(c) (denoted as Vi·) corresponds to a certain input channel and each Vij represents one of its “meta-features”. This means that we set the same sparse connection rate for all output channels. For clarity, the zero-masks of the three networks are shown in Fig. 4, where the white and grey blocks denote zero and non-zero elements respectively.
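The difference between the three zero-masks can be sketched in a few lines. This is an illustrative reading of the description above, not the authors' code: masks are stored as rows of output units by columns of input weights, the function names are hypothetical, and the group size and number of kept weights are made-up values.

```python
import random


def dropout_mask(n_in, n_out, keep, rng):
    """Dropout: one keep/drop decision per input unit, shared by all outputs."""
    unit = [1 if rng.random() < keep else 0 for _ in range(n_in)]
    return [unit[:] for _ in range(n_out)]


def dropconnect_mask(n_in, n_out, keep, rng):
    """DropConnect: every individual weight is kept or dropped independently."""
    return [[1 if rng.random() < keep else 0 for _ in range(n_in)]
            for _ in range(n_out)]


def grouped_sparse_mask(n_groups, group_size, n_keep, n_out, rng):
    """Grouped sparse connection: for each output unit and each group,
    exactly n_keep of the group_size weights are kept, chosen at random."""
    mask = []
    for _ in range(n_out):
        row = []
        for _ in range(n_groups):
            kept = set(rng.sample(range(group_size), n_keep))
            row.extend(1 if j in kept else 0 for j in range(group_size))
        mask.append(row)
    return mask


rng = random.Random(0)
# Two groups (input channels) of five "meta-features" each, three kept per group.
mask = grouped_sparse_mask(n_groups=2, group_size=5, n_keep=3, n_out=4, rng=rng)
for row in mask:
    print(row)
```

Unlike DropConnect, every output unit here sees the same number of non-zero weights from each group, which is the fixed-count property described above.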

The comparison between Dropout, DropConnect and our grouped sparse connection (the dotted lines represent zero outputs in (a) or zero weights in (b) and (c)).

The zero-masks of Dropout, DropConnect and our grouped sparse connection network (two groups with sparse connection rate 0.6 in (c)).
Next, to implement the sparse connection network in practice, we adopt a two-phase training scheme. In the first phase, zero-masks are generated randomly for every mini-batch, so the over-complete basis elements are randomly selected to assemble the corresponding transferred features. To this end, we maintain a parameter list for every 1×1 Conv layer and update the corresponding parameters during back-propagation. After a certain number of iterations, we impose a hard sparsity constraint on the parameters in the second training phase. The p largest values in the parameter lists are located and their corresponding masks are set to one for every input channel in the 1×1 Conv layer. In the following iterations, the masks are frozen. We argue that the p largest parameter values correspond to the p most important features for a given input channel, and that the others should be forced to zero to keep the learned features sparse. With the proposed two-phase training strategy, our network becomes more practical and efficient. The related hyper-parameters are determined in Section 4.
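The mask handling in the two phases can be sketched as follows. This is a simplified illustration under stated assumptions and not the authors' training code: the weights of one group are held constant in place of real back-propagation updates, "largest" is taken by signed value as the text states, and all names and numbers are hypothetical.

```python
import random


def random_mask(group_size, n_keep, rng):
    """Phase 1: a fresh random mask with exactly n_keep ones."""
    kept = set(rng.sample(range(group_size), n_keep))
    return [1 if j in kept else 0 for j in range(group_size)]


def top_k_mask(weights, n_keep):
    """Phase 2: keep the n_keep largest weights of the group, zero the rest."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    kept = set(order[:n_keep])
    return [1 if j in kept else 0 for j in range(len(weights))]


def masks_over_training(weights, n_keep, n_phase1, n_total, seed=0):
    """Yield the mask used at each iteration for one group of 1x1 Conv weights.

    The weights are constant here for brevity; in real training they would
    be updated by back-propagation between iterations.
    """
    rng = random.Random(seed)
    frozen = None
    for it in range(n_total):
        if it < n_phase1:
            yield random_mask(len(weights), n_keep, rng)
        else:
            if frozen is None:
                frozen = top_k_mask(weights, n_keep)  # computed once, then frozen
            yield frozen


weights = [0.9, -0.2, 0.4, 0.05, 0.7, 0.1]  # made-up values for one input channel
masks = list(masks_over_training(weights, n_keep=3, n_phase1=4, n_total=6))
print(masks[-1])
```

During the first four iterations the mask changes at random; from the fifth iteration on it is fixed to the positions of the three largest weights.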
Finally, similar to [19], the validation results in the first training phase can be obtained using the following equation:
Compared with the basic Conv layer module in Fig. 1, the details of the improved CNN architecture for one layer are illustrated in Fig. 5.

The architecture of proposed 1×1 Conv based transfer learning network.
We demonstrate the effectiveness of the proposed method on the false positive reduction task for lymph nodes. A concrete CNN model is designed according to the scale of the available CT LN dataset (595 abdominal LNs from 86 patients and 388 mediastinal LNs from 90 patients). Specifically, we design a network architecture with five convolutional layers, two fully connected layers and a binary softmax classification layer. Because the ROI is relatively small within each CT scan, 3×3 convolution filters are used to extract local spatial features [2], and pooling layers are omitted except for the final global average pooling layer [23]. Stacking the proposed 1×1 Conv based network block shown in Fig. 5 layer by layer gives the overall structure of our CNN model in Fig. 6, where the point-wise convolutional layers contain all the learnable weights of the network.

The overall architecture of the proposed CNN model.
In this section, we evaluate our model on the publicly available lymph node CT datasets for the false positive reduction task [40, 42]. The datasets contain 90 patients with 388 mediastinal LNs and 86 patients with 595 abdominal LNs. Section 4.1 presents the multisource and target databases used in this paper, followed by the experimental protocol in Section 4.2. The strategy for hyper-parameter selection and the corresponding discussion are given in Section 4.3. Finally, Section 4.4 compares the proposed model with other representative methods.
The source and target databases
In this paper, we employ multiple source datasets to make the filter bank dictionary used in sparse coding more general and adaptable to the target task. We use six source datasets: one natural image database (ImageNet [21]), four texture databases (ALOT [6], DTD [11], KTH-TIPS [14] and the Flickr Material Dataset [43]) and one medical image dataset (Diabetic Retinopathy [5]). Because training our model on each source database separately is inefficient or even impractical, we instead directly borrow the transferred filters from the “off-the-shelf” pre-trained VGG-M model [8] used in previous work [27, 35].
Next, we present the CT lymph node databases for the target task. First, clinicians labeled 388 mediastinal lymph nodes in the CT images of 90 cases and 595 abdominal lymph nodes in 86 cases as the positive class. Second, we use the negative lymph node candidate sets detected by two previous CADe algorithms [40] as the false positive samples for mediastinal and abdominal lymph nodes. These two detection algorithms achieve 94%-97% sensitivity at 25-35 FP/vol. The LN databases are summarized in Table 1.
Lymph node detection datasets
With this CT lymph node dataset, false positive reduction is cast as an image classification task, in which the true LNs are labeled as positives and the false-positive candidates as negatives.
Following [40], all false-positive samples (>15 mm away from true LN locations), i.e. 3208 false-positive detections in the mediastinum and 3484 in the abdomen, are used as negative LN candidates for training the model. All patients are randomly divided into five subsets at the patient level to perform 5-fold cross-validation. We set the ratio of positive to negative training patches to 1, i.e. a balanced training set is guaranteed. Here, we simply use a single input channel for gray image patches, which can be readily generalized to multiple channels. All 32×32 pixel image patches are centered at the LN candidate coordinates. In addition, training data augmentation is used: every training image patch is rotated by three angles (90°, 180° and 270°) on each of 3 consecutive axial slices (up to 2 mm). Hence, the final number of training samples is 12 times the original. The soft-tissue window is [–100, 200] HU [40].
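The factor of 12 can be read as 4 orientations (the original plus three rotations) on each of 3 axial slices. The sketch below illustrates that reading together with the soft-tissue windowing. It is an assumption-laden illustration and not the authors' pipeline: the rescaling of the window to [0, 1] is not stated in the text, the function names are hypothetical, and tiny 2×2 patches stand in for the 32×32 patches.

```python
def rot90(patch):
    """Rotate a square list-of-lists patch by 90 degrees counter-clockwise."""
    n = len(patch)
    return [[patch[c][n - 1 - r] for c in range(n)] for r in range(n)]


def window_hu(patch, lo=-100.0, hi=200.0):
    """Clip intensities to the [lo, hi] HU window and rescale to [0, 1].

    The rescaling step is an assumption; the text only gives the window.
    """
    return [[(min(max(v, lo), hi) - lo) / (hi - lo) for v in row]
            for row in patch]


def augment(slices):
    """Return the 0, 90, 180 and 270 degree rotations of every axial slice."""
    out = []
    for patch in slices:
        current = patch
        for _ in range(4):
            out.append(current)
            current = rot90(current)
    return out


# Three consecutive axial slices around one candidate (toy 2x2 HU values).
slices = [
    [[-500.0, 50.0], [200.0, 1000.0]],
    [[-100.0, 0.0], [100.0, 150.0]],
    [[20.0, 40.0], [60.0, 80.0]],
]
samples = augment([window_hu(s) for s in slices])
print(len(samples))
```

Each candidate thus contributes 3 × 4 = 12 training patches.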
In practice, following the fundamental architecture depicted in Fig. 6, we design five models of different scales, i.e. [8, 16, 16, 32, 32, 64, 128], [16, 16, 32, 32, 64, 128, 256], [32, 32, 64, 64, 128, 256, 512], [32, 64, 64, 128, 256, 256, 512], [32, 64, 128, 256, 512, 512, 1024], where each number denotes the channel or node count of the successive convolutional or fully connected layers. The classification results of each model are presented in Section 4.3. Model performance is evaluated by the Free-response Receiver Operating Characteristic (FROC) curve and the area under the FROC curve (AUC). In the model inference phase, similar to [40], we directly average the probability values {P1(x), ⋯, P5(x)} generated by the five model configurations to obtain the overall classification result for an input patch x using the following equation.
\[ P(x) = \frac{1}{5}\sum_{i=1}^{5} P_i(x) \tag{6} \]
We use Theano to implement our model and MATLAB to run the sparse coding algorithm. The framework runs on an Nvidia Tesla K40m GPU. The model is trained by minimizing the cross-entropy classification loss with stochastic gradient descent. The mini-batch size is 50. The initial learning rate is 0.01 and is reduced by a factor of 10 every 25 epochs, and the model is trained for 200 epochs. ReLU is used as the activation function and the trainable parameters are initialized with the MSRA method [18]. Training one network takes about 8-10 hours.
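The learning-rate schedule described above amounts to a step decay. A minimal sketch, assuming epochs are counted from 0 and that the drop happens at the start of epochs 25, 50, and so on (the text does not specify the exact boundary); the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=0.01, drop=10.0, step=25):
    """Step decay: divide base_lr by `drop` once every `step` epochs."""
    return base_lr / (drop ** (epoch // step))


# Learning rate at the first epoch of each 25-epoch block over 200 epochs.
schedule = [(e, learning_rate(e)) for e in range(0, 200, 25)]
for epoch, lr in schedule:
    print(epoch, lr)
```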
To determine the hyper-parameters of the proposed model, the network configuration [32, 32, 64, 64, 128, 256, 512] is used as a baseline model for hyper-parameter selection on the mediastinal LN detection task. The selected hyper-parameters are then applied to the other four networks.
The primary hyper-parameters include the number of over-complete basis elements c, the sparse rate p, the number of iterations N of the first training phase and the number of source domain datasets.
First, the model performance for different values of the sparse rate p is presented. To this end, the other hyper-parameters are fixed at c = 18 and N = 5000, and p is set successively to each value in {1, 0.7, 0.5, 0.3, 0.15}. Based on 5-fold cross-validation, the average FROCs for the mediastinal lymph node classification task are illustrated in Fig. 7.

The classification FROCs with different values of sparse rate p.
In Fig. 7, the sparse rate p behaves like the dropout ratio in the Dropout technique. When p ≈ 0.5, the variety of random combinations of neurons is at its maximum and the corresponding model generalizes well to the new target domain. However, we argue that the optimal p is also influenced by the number of over-complete basis elements c. Increasing c reduces the feature sparsity, i.e. the number of zero parameters, and leads to more trainable parameters. In the next experiment, however, we show that c should be kept small. Hence, we set p ≈ 0.5 in the following experiments.
Next, we vary c, the number of over-complete basis elements. The corresponding FROCs are shown in Fig. 8. The four models perform similarly, which indicates that sufficient transferred features can be extracted with c = 16 for the target task and that increasing c only adds complexity and computation. The same conclusion applies to the number of iterations N of the first training phase, whose FROCs are not shown for brevity. According to the experimental results for different values of N, the best performance is obtained when N is about 5000 (i.e. about 2500 updates for each trainable variable), and increasing N further does not improve the results. We therefore set c = 16, p = 0.5 and N = 5000 in the remaining experiments.

The FROCs with different numbers of over-complete basis c.
Next, to verify the necessity and effectiveness of introducing multisource transfer datasets into our model, we investigate four scenarios: transfer learning based on the aforementioned six source datasets, on texture images, on natural images (ImageNet), and on random transferred features (Gaussian random dictionaries). The results in Fig. 9 show that the network equipped with knowledge transferred from the six source datasets gives the best classification result, as expected. It is followed by the network transferred from the texture image domain, then the network with random dictionaries, and finally the network with features pre-trained on ImageNet. It is worth noting that the network transferred from ImageNet performs the worst. This is contrary to the observation in [27], where knowledge transferred from a large but unrelated dataset provides greater benefits to the target task. We argue that the most probable reason is the difference in transfer learning methods. Our model is equipped with the transferred filters layer by layer, which is similar to the approach used in [51]. Deeper-layer features pre-trained on uncorrelated source datasets are more difficult to generalize to the target task than those from source domains with less dataset bias (e.g. the texture image datasets here). This is also supported by the encouraging performance obtained by transferring knowledge from texture image databases to biomedical image processing [10, 36]. Moreover, the Gaussian random dictionaries give better classification results than transfer from natural images, possibly because we normalize the dictionaries to [-1, 1]. In general, a random convolutional kernel can be viewed as an edge detector in shallow layers. For deep-level filters, in contrast, we infer that the point-wise convolutional parameters following the randomly initialized dictionaries are much easier to train than those following dictionaries transferred from a natural image dataset with dataset bias. This appears to coincide with the negative transfer phenomenon [38], in which a model that uses transferred knowledge performs worse than the original one.
In addition, Table 2 lists the classification results of the 5-fold cross-validation with different transfer source datasets, i.e. AUC, accuracy (Acc.), sensitivity (Sen.) and specificity (Spe.), along with the corresponding 95% confidence intervals (CI). The p-value of Student’s t-test on AUC between the six source domains and texture images is 0.046, which is below the significance level α = 0.05. This further indicates that the six-source-domain transfer knowledge does improve the classification performance of the model.
The five-fold classification results of the proposed model with different transfer source datasets

The FROCs with different source datasets.
In conclusion, using the hyper-parameters selected above, five networks with different configurations are trained separately for the LN false positive reduction task. The corresponding FROCs and AUCs for mediastinal and abdominal lymph node classification are illustrated in Fig. 10(a) and (b) respectively. Note that, because of the patient-level 5-fold cross-validation strategy, the range of false positives per patient in the average FROC curves for mediastinal and abdominal lymph nodes is 0–30. In Fig. 10(a) and (b), the sensitivities are about 75%/85% at 3 FP/vol. and 86%/91% at 6 FP/vol. in the abdomen and mediastinum respectively, which compare competitively with previous approaches. The detailed comparison is presented in the next section.
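For readers unfamiliar with FROC operating points, the sketch below shows one simple way to read off sensitivity at a given number of false positives per volume from candidate scores. It is an illustration under stated assumptions, not the evaluation code used for Fig. 10: scores are assumed distinct (no tie handling), sensitivity is computed over the listed candidates only, and the function name and all data are hypothetical.

```python
def sensitivity_at_fp_rate(scores, labels, n_volumes, fp_per_vol):
    """Highest sensitivity reachable with at most fp_per_vol false positives
    per volume on average, sweeping the decision threshold from high to low."""
    n_pos = sum(labels)
    max_fp = fp_per_vol * n_volumes
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        if fp > max_fp:
            break  # lowering the threshold further exceeds the FP budget
        best = tp / n_pos
    return best


# Toy candidate list pooled over 2 volumes: 1 = true LN, 0 = false positive.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0, 1, 0]

for rate in (0, 1, 2):
    print(rate, sensitivity_at_fp_rate(scores, labels, n_volumes=2, fp_per_vol=rate))
```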

The FROCs of the five models with different configurations, (a) mediastinal lymph node classification (b) abdominal lymph node classification.
In the model inference phase, Fig. 11 presents several abdominal LN patches from the test set that are correctly classified by our model with averaged confidence scores p > 0.95, derived from Equation (6). The top row shows true positive abdominal LNs and the bottom row shows true negatives. Intuitively, our model can recognize such prominent samples with high confidence.

The correctly classified abdominal lymph nodes with higher confidence. Top row: positives; bottom row: negatives.
Figure 12 illustrates four subsets of trained 3×3 convolutional filters randomly extracted from the second Conv layer of model 3. The top two rows are trained for mediastinal lymph node classification and the bottom two for abdominal lymph node classification. Visually, the varied appearances indicate the generalization ability of the proposed transfer learning mechanism across different target datasets.

Four trained filter subsets randomly extracted from the second convolutional layer of model 3.
In this section, building on the competitive classification results presented in Section 4.3, we compare the proposed approach with others. The classification results are cited directly from the corresponding work. The comparison is given in Table 3, where AUC and TPR/3FP are used as performance measures and the numbers in bold correspond to the best results. The networks CifarNet, AlexNet-* and GoogLeNet-* are taken from [44]; “*-H” denotes high-resolution input (256×256 pixels), “*-L” denotes low-resolution input (64×64), “*-RI” denotes training from scratch and “*-TL” denotes transfer learning from ImageNet.
Comparison of our method with other previous work on lymph node detection.
Because different methods use different training and dataset preparation procedures, it is hard to compare the performance of our model directly with that of others. We instead discuss how the mechanisms of our model differ from the others. First, the approaches in [17, 42] are traditional machine learning methods in which only low-level features are employed. Second, among the deep learning based approaches, CifarNet underperforms because of its large model scale relative to the small image size. GoogLeNet-RI-H is also not immune to over-fitting; in other words, the lymph node training images are not sufficient to train a complex network with 22 layers. It is worth noting that the approaches equipped with transfer learning (denoted as “*-TL”) generally show better results than those trained from scratch (denoted as “*-RI”).
Compared with random initialization, transfer learning indeed provides a better starting point for the CNN parameters. Meanwhile, with the newly introduced multisource transfer learning scheme, our method also achieves encouraging performance on the mediastinal lymph node false positive reduction problem. For abdominal lymph nodes, a novel 2.5D method with a new data augmentation strategy is presented in [40], which gives the best classification results. Reference [44] points out that the excellent classification results shown in [39] can be attributed to a specific dataset preparation protocol, namely a more aggressive extraction of random views over a large range of fields-of-view to obtain additional spatial contextual information. Admittedly, the strategy described in [39, 40] suggests one future way to improve the performance of the proposed framework.
In this paper, we introduce a new multisource transfer learning framework based on sparse coding for CT lymph node false positive reduction. Deep transfer learning with a CNN framework proves practical for this target task. More importantly, the 1×1 convolutional layer can be used not only to combine features from different channels, as much previous work has verified, but also to successfully transfer knowledge from multiple source domains. In the future, we will further investigate the intrinsic characteristics of transferred features to improve the generalizability and efficiency of the proposed model for more medical image processing problems.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant No. 61976126 and the Natural Science Foundation of Shandong Province under Grant No. ZR2019MF003.
