Enhancing classification performance through multi-source online transfer learning algorithm with oversampling

Abstract

Multi-source online transfer learning uses the tagged data from multiple source domains to enhance the classification performance of the target domain. For unbalanced data sets, a multi-source online transfer learning algorithm that can oversample in the feature spaces of the source domain and the target domain is proposed. The algorithm consists of two parts: oversampling multiple source domains and oversampling online target domains. In the oversampling phase of the source domain, oversampling is performed in the feature space of the support vector machine (SVM) to generate minority samples. New samples are obtained by amplifying the original Gram matrix through neighborhood information in the source domain feature space. In the oversampling phase of the online target domain, minority samples from the current batch search for k-nearest neighbors in the feature space from multiple batches that have already arrived, and use the generated new samples and the original samples in the current batch to train the target domain function together. The samples from the source domain and the target domain are mapped to the same feature space through the kernel function for oversampling, and the corresponding decision function is trained using the data from the source domain and the target domain with relatively balanced class distribution, so as to improve the overall performance of the algorithm. Comprehensive experiments were conducted on four real datasets, and compared to other baseline algorithms on the Office Home dataset, the accuracy improved by 0.0311 and the G-mean value improved by 0.0702.

Keywords

Multi-source transfer learning; online learning; imbalanced data; support vector machine (SVM)

1 Introduction

In the field of machine learning, transfer learning is an important technology that has been extensively studied [1, 2]. Many models in applications are constructed from a large amount of training data, but collecting and labeling sufficient data can be difficult and expensive [3, 4]. The main purpose of transfer learning is to use the useful information extracted from one or more source domains to improve the learning performance of the target domain. A typical example is that it is difficult to collect enough tiger data, but cat data is abundant; transfer learning can be used to build a tiger classification model using cat data. The obvious advantage of transfer learning is therefore that useful knowledge in the source domain improves the overall prediction performance and reduces the amount of expensive labeling needed. For this reason, transfer learning has been applied in various fields. The application of different transfer learning methods is described for the consciousness recognition task of human-computer dialogue systems [5], and the application of transfer learning is introduced for cross-domain recommendation algorithms [6].

At the start of research into transfer learning, knowledge was only transferred from a single source domain to the target domain [7, 8]. However, in certain practical applications, knowledge learned from multiple source domains can be easily transferred to the target domain [9]. For instance, in the case of document classification in five languages, to classify documents in English, one can learn from documents translated from French, German, Spanish, and Italian to English. Each translated document can serve as a source domain [10]. However, different source domains have varying contributions to the target domain. To overcome this limitation, a boosting-based method can be utilized to design more complex multi-source transfer learning algorithms [11].

Most multi-source transfer learning is conducted in an offline environment [12–14]. In some practical applications, the training data of the target domain is not provided in advance but is received sequentially while the target domain function is being learned; this is called online transfer learning [15]. In the era of big data, online learning can handle large and rapidly growing volumes of data that traditional batch processing algorithms cannot. In online learning, the target domain function receives one sample and its corresponding label in each round; it first predicts the current sample and then updates itself based on the loss between the actual label of the sample and the predicted result. Online learning has been applied to large-scale planning in model service computing [16], where it improves the time efficiency of prediction while also meeting the real-time requirements of computation. For multi-source online transfer learning, the final prediction for each arriving sample is obtained by combining the predictions of the multiple source classifiers and the target classifier.

An imbalanced dataset is one in which the number of examples in each class is not equal. This poses a challenge for machine learning algorithms, which tend to predict the majority class more often and perform poorly on the minority class. Most transfer learning algorithms do not address imbalanced datasets and assume that the class distribution is balanced. However, imbalanced data is common in many real-world classification problems. Traditional classifiers assume equal misclassification costs for different classes, which can result in poor performance on imbalanced datasets, where misclassifying a minority class sample as a majority class sample can be extremely costly. Previous studies have proposed various methods to handle imbalanced datasets, which can be classified into data-level sampling methods, cost-sensitive methods, and algorithm-level methods [17]. Data-level sampling methods balance the dataset before training the classifier. Cost-sensitive methods impose higher penalties on decision functions for misclassifying minority class samples. Algorithm-level methods modify classifiers such as support vector machines to address class imbalance [18, 19].

Multi-source domain refers to the situation where data is collected from multiple sources or domains, which may have different distributions, features, or labels. This can pose a challenge for machine learning algorithms that are trained on one domain and tested on another, as the model may not generalize well to new data. In multi-source online transfer learning, the target domain extracts useful knowledge from multiple source domains to help with target function classification. An online transfer learning algorithm that can use multiple source domains related to the target domain has been proposed [20]. A multi-class classification algorithm based on multi-source online transfer learning, which classifies multiple classes through a two-stage integration strategy, has also been proposed [21]. In another multi-source online transfer learning method [22], the small number of samples in the target domain is amplified during online training to improve the overall classification performance. However, in the real world, the data in most classification tasks has an imbalanced class distribution. Imbalanced data classification is an important research topic in machine learning and is equally important in multi-source online transfer learning, where the classes of both the source and target domains may be imbalanced. When the target domain data is imbalanced, the predictions of the target domain function tend to lean towards the majority class; when the source domain data is imbalanced, the combination of multiple source and target classifiers is highly likely to lean towards the majority class; when the data in both the source and target domains is imbalanced, more complex situations can arise. Multi-source online transfer learning for imbalanced datasets is therefore an important and challenging topic that deserves extensive research.

A multi-source information fusion method based on information sets (MSIF) has been proposed [23]. The main goal of most information quality (IQ)-based measures is to combine data provided by multiple information sources to enhance the quality of the information that decision makers need to perform their tasks. However, little work fuses multi-source information from the perspective of possibility distribution (PD) and uses IQ as the evaluation criterion for feature selection. A novel representation model of possibility distributions based on fuzzy multisets has been proposed [24]. Dempster-Shafer theory (DST), as a generalization of Bayesian probability theory, is a useful technique for multi-source information fusion under uncertain environments. Nevertheless, when a high degree of conflict exists between pieces of evidence, Dempster’s combination rule often produces unreasonable results, and how to fuse highly conflicting information is still an open problem. An improved belief Hellinger divergence measure, which fully considers the uncertainty in basic probability assignments, has been proposed to quantify the conflict level between pieces of evidence [25].

In this paper, a multi-source online transfer learning algorithm called OTLMS_STO (multi-source online transfer learning based on oversampling in source and target domain feature space) is proposed. This algorithm addresses the binary classification problem for imbalanced data. The existing method dynamically combines the source domain and target domain functions through a weight vector during online learning, but does not consider that the class distributions of the source domain data and the target domain data may both be imbalanced. In the OTLMS_STO algorithm proposed in this paper, the minority class samples are oversampled in the feature spaces of the source domain and the target domain respectively, the source functions are trained using balanced data, and the target function is improved during online prediction, which addresses the problem of imbalanced class distribution. In the oversampling phase of the source domain, each source domain uses an SVM as its classifier, synthesizes minority samples in the feature space of the source domain, and trains its SVM classifier on the Gram matrix generated from the balanced data. SVM is chosen as the model because it has been shown to perform well on imbalanced datasets. SVM is known for its ability to handle high-dimensional data and to find non-linear decision boundaries. Additionally, SVM has a regularization parameter that helps to prevent overfitting, which is important when working with imbalanced data. If another model were used instead of SVM, the results might differ depending on the characteristics of the dataset and the performance of the alternative model. For example, decision trees and random forests are also commonly used for imbalanced datasets, but they may not perform as well as SVM on high-dimensional data. On the other hand, deep learning models such as neural networks and convolutional neural networks may outperform SVM in certain cases, but they require more data and processing power to train effectively. Ultimately, the choice of model depends on the specific characteristics of the dataset and the goals of the analysis.

In the online oversampling phase of the target domain, the passive-aggressive (PA) algorithm is used to build the decision function of the target domain [26]. The target domain receives a batch of data in each round, and each minority sample in the batch searches for its k-nearest neighbors among the minority samples of the batches that have already arrived. New minority class samples are then synthesized on the line segments between seed and neighbor sample pairs, and the generated samples and the original samples in the current batch are used together to train the decision function of the target domain. Finally, the improved source and target functions are combined through weight vectors. The samples generated in the feature spaces of the source and target domains are linearly separable, which overcomes the limitation of the SMOTE (synthetic minority oversampling technique) method on nonlinear problems during oversampling [27]. Experiments on several text and image datasets show that the proposed algorithm performs better than the baseline online transfer learning algorithms.

2 Materials and methods

2.1 Multi-source online transfer learning

In this section, we introduce the multi-source online transfer learning algorithm HomOTLMS [20]. HomOTLMS combines classifiers built on multiple source domains and the target domain into an effective ensemble classifier. By utilizing useful information from multiple source domains, it alleviates the problem of insufficient sample data in the target domain and ultimately improves performance on the target domain.

HomOTLMS first constructs the decision functions $(g^{s_{1}} (x), g^{s_{2}} (x), \dots, g^{s_{m}} (x))$ of the m source domains in the offline batch learning paradigm, based on the training data provided in advance. For the target domain, the online passive-aggressive (PA) algorithm is used to construct a decision function $g^{T} (x)$ that is updated online. The target domain receives one sample per round. In the i-th round, the target domain receives the instance (x_i, y_i), uses the current function $g_{i}^{T}$ to predict the given instance x_i, and calculates the hinge loss of the target domain decision function based on the true label y_i: $L_{i} = \max (0, 1 - y_{i} g_{i}^{T} (x_{i}))$ (1)

If the decision function suffers a non-zero loss on instance x_i, then x_i is added to the support vector set to update the decision function in the target domain: $g_{i + 1}^{T} (x) = g_{i}^{T} (x) + τ_{i} y_{i} k (x_{i}, \cdot)$ (2)

where τ_i = min{ c, L_i / k (x_i, x_i) }, c is the trade-off parameter of the PA algorithm, and k (· , ·) is a kernel function. The kernel function plays a crucial role in the performance of many machine learning algorithms, particularly in kernel methods. It is a function that measures the similarity between two data points in the feature space. By computing the dot product between the transformed feature vectors, the kernel function implicitly maps the data points to a higher-dimensional space, allowing for the separation of non-linearly separable data. The choice of kernel function depends on the problem domain and the characteristics of the data. Some commonly used kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid. Linear kernel functions are commonly used for linearly separable data, while polynomial kernels are used for data that are not linearly separable in the original feature space. RBF kernels are used for non-linearly separable data and are effective in capturing complex patterns in the data. Sigmoid kernels are used in neural networks and logistic regression.
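The online update in Equations (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `KernelPA`, the helper `rbf_kernel`, and the default values `c=1.0` and `gamma=0.5` are assumptions chosen for illustration, and the decision function is simply stored as a list of support vectors with their coefficients.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2); gamma is an assumed value."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


class KernelPA:
    """Kernelized passive-aggressive learner stored as a support-vector expansion."""

    def __init__(self, c=1.0, kernel=rbf_kernel):
        self.c = c              # trade-off parameter that caps the step size
        self.kernel = kernel
        self.support = []       # list of (step_size * label, support_vector)

    def decision(self, x):
        """Evaluate the current decision function at x (0 when no support vectors exist)."""
        return sum(coef * self.kernel(sv, x) for coef, sv in self.support)

    def update(self, x, y):
        """Process one labeled instance and return the hinge loss suffered on it."""
        loss = max(0.0, 1.0 - y * self.decision(x))        # Equation (1)
        if loss > 0.0:
            step = min(self.c, loss / self.kernel(x, x))   # step size used in Equation (2)
            self.support.append((step * y, x))             # x becomes a support vector
        return loss
```

Each call to `update` either leaves the model unchanged (zero loss) or appends exactly one support vector, which is why the decision function can later be written as a sum over support vectors.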

After obtaining the decision functions of the source domains and the target domain, HomOTLMS combines them with weights to obtain an effective integrated decision function. A weight vector $v_{i} = (v_{i}^{1}, v_{i}^{2}, \dots, v_{i}^{m})$ and a weight variable w_i are constructed for the source decision functions and the target decision function, respectively, and all weights are initialized to 1/(m + 1). In the i-th round of the target learning task, all classifiers make predictions for instance x_i. If a classifier makes an incorrect prediction, its weight is reduced: the weight of the j-th source decision function becomes $v_{i + 1}^{j} = v_{i}^{j} α$, and the weight of the target decision function becomes $w_{i + 1} = w_{i} α$, where α ∈ (0, 1) is a weight discount parameter.

The weights are normalized before each prediction so that the weights of all decision functions sum to 1: $p_{i}^{j} = \frac{v_{i}^{j}}{\sum_{l = 1}^{m} v_{i}^{l} + w_{i}}, q_{i} = \frac{w_{i}}{\sum_{l = 1}^{m} v_{i}^{l} + w_{i}}$ (3)

When the i-th sample arrives, $p_{i}^{j}$ and q_i represent the weights of the j-th source decision function and the target decision function, respectively. The final prediction is then obtained from the integrated decision function in Equation (4): $f (x_{i}) = \mathrm{sign} (\sum_{j = 1}^{m} p_{i}^{j} g^{s_{j}} (x_{i}) + q_{i} g_{i}^{T} (x_{i}))$ (4)
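The weighting scheme of Equations (3) and (4) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the reference implementation: the function names are hypothetical, a classifier is treated as wrong when its score and the label do not have the same sign, and a combined score of exactly 0 is mapped to +1.

```python
def normalize(source_weights, target_weight):
    """Equation (3): scale all weights so that they sum to 1."""
    total = sum(source_weights) + target_weight
    return [v / total for v in source_weights], target_weight / total


def ensemble_predict(source_scores, target_score, source_weights, target_weight):
    """Equation (4): sign of the weighted sum of source and target decision values."""
    p, q = normalize(source_weights, target_weight)
    combined = sum(pj * s for pj, s in zip(p, source_scores)) + q * target_score
    return 1 if combined >= 0 else -1   # assumption: a score of 0 counts as +1


def discount_weights(source_scores, target_score, y,
                     source_weights, target_weight, alpha=0.5):
    """Multiply the weight of every classifier that mispredicted label y by alpha."""
    new_v = [v * alpha if s * y <= 0 else v
             for v, s in zip(source_weights, source_scores)]
    new_w = target_weight * alpha if target_score * y <= 0 else target_weight
    return new_v, new_w
```

With two source classifiers and initial weights of 1/3 each, a single mistake by the second source classifier shifts the normalized weights from (1/3, 1/3, 1/3) to (0.4, 0.2, 0.4) when alpha is 0.5.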

HomOTLMS trains the target domain decision function with each round of target domain samples and simultaneously adjusts the weight of each classifier to update the final integrated decision function, thereby carrying out effective multi-source online transfer learning. However, the HomOTLMS algorithm cannot effectively cope with imbalanced class distributions in the source or target domains. The following introduces a new multi-source online transfer learning method, which reduces the overall classification error by artificially balancing the class distributions of the source and target domains.

2.2 Problem description

This section formally introduces the problem of imbalanced class distribution in multi-source online transfer learning. The m given source domains are denoted by $D^{s} = \{ D^{s_{1}}, D^{s_{2}}, \dots, D^{s_{m}} \}$, and the target domain is denoted by $D^{T}$. The data space of the j-th source domain $D^{s_{j}}$ is $X_{S_{j}} \times Y_{S_{j}}$, where the feature space of the source domain is $X_{S_{j}} = R^{d_{j}}$. The data space of the target domain is $X \times Y$, where the feature space is $X = R^{d}$. Here, the source and target domains share the same label space $Y_{S_{j}} = Y = \{ + 1, - 1 \}$ and also share the same feature space, that is, $R^{d_{j}} = R^{d}$ for all $j = 1, 2, \dots, m$.

Unlike HomOTLMS, the proposed algorithm mainly targets the setting in which the target domain receives a batch of data online each time. The data of the t-th batch in the target domain is $\{(x_{i}, y_{i})\}_{i = 1}^{L_{t}} \in X \times Y$, where L_t is the number of samples in the batch. When a batch of samples arrives, the decision function of the target domain sequentially predicts each sample and updates itself, while the m source classifiers directly predict the samples of this batch to obtain m sets of prediction results. Finally, the prediction results of the m source domains and the target domain are traversed to adjust the weights of the integrated decision function and obtain the final prediction result of the current batch.

The source domain uses SVM to train its classifier, and the target domain uses the online passive-aggressive (PA) algorithm to train its classifier. Both are trained in the feature space to obtain an optimal separating hyperplane for predicting samples. When the classes are imbalanced, the hyperplane may be more sensitive to majority class samples, and the prediction results are biased towards the majority class. The class distribution may be imbalanced in both the source and the target domains. We assume that samples with label +1 form the minority class and samples with label –1 form the majority class. If imbalanced source domain data is used to train the multiple source classifiers, the knowledge transferred from the source domains to the target domain may be biased towards the majority class, which can have a negative impact on the target domain. If the target domain data itself is imbalanced, the target decision function is very likely to be skewed towards the majority class, thereby affecting the final outcome of the integrated decision function. More complex situations arise when the data in both the source and target domains is imbalanced. The OTLMS_STO algorithm proposed in this article improves the overall classification performance of the integrated decision function by oversampling in the feature spaces of the source domain and the target domain, and thus achieves better knowledge transfer.

2.3 Oversampling in the feature space of the source domain

The proposed OTLMS_STO algorithm first oversamples in the feature space of the source domain and improves the source domain classifier using the balanced dataset obtained after sampling. Each source domain uses an SVM as its base classifier; an SVM classifies samples by identifying a separating hyperplane in a high-dimensional implicit feature space. For imbalanced datasets, SMOTE is a widely used sampling method that uses neighborhood information to synthesize minority sample points, generating new samples on the line segment between two neighboring samples. However, for high-dimensional text and image data, SMOTE is limited when the problem is not linearly separable.

Because the SVM classifiers of the multiple source domains operate in the feature space, synthetic samples can be generated in the same feature space to address class imbalance. Figure 1 shows the structure of the proposed OTLMS_STO algorithm in the stage of improving multiple source domains, which is divided into two key steps: the first step generates synthetic minority class samples in the feature space of the source domain, making the source domain dataset more balanced; the second step trains the classifiers of the multiple source domains using the modified balanced dataset. The following describes each step in detail.

Fig. 1

Structure of OTLMS_STO algorithm in process of source domain.

Let $\{(x_{i}, y_{i})\}_{i = 1}^{L_{S_{j}}}$ denote the dataset of the j-th source domain. In the input space, Euclidean distance is used to identify neighborhoods, but since SVM operates in kernel space, a new distance metric needs to be defined to identify neighborhoods. A minority class sample x_m in the source domain is mapped to φ (x_m) by the feature mapping function φ (x), and its k-nearest neighbors are then found in the feature space. The squared distance between this sample and a neighbor x_n is:

$d^{\varphi} {(x_{m}, x_{n})}^{2} = {\| \varphi (x_{m}) - \varphi (x_{n}) \|}^{2} = k (x_{m}, x_{m}) - 2 k (x_{m}, x_{n}) + k (x_{n}, x_{n})$ (5)

where k (· , ·) is a kernel function. The distance between seeds and neighbors is thus calculated through the kernel function alone, without needing to know the specific form of φ (x).
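As an illustration of Equation (5), the sketch below computes the squared feature-space distance from kernel evaluations only and uses it to rank neighbors. The function names, the Gaussian kernel helper, and the bandwidth `gamma=0.5` are assumptions made for this example; the paper does not prescribe them.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_sq_distance(a, b, kernel=rbf_kernel):
    """Equation (5): ||phi(a) - phi(b)||^2 = k(a, a) - 2 k(a, b) + k(b, b)."""
    return kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)


def k_nearest_in_feature_space(seed, candidates, k, kernel=rbf_kernel):
    """Return the k candidates closest to the seed, measured in the kernel feature space."""
    ranked = sorted(candidates, key=lambda c: kernel_sq_distance(seed, c, kernel))
    return ranked[:k]
```

With a linear kernel the function reduces to the ordinary squared Euclidean distance, which gives a convenient sanity check. The candidate list is assumed not to contain the seed itself; otherwise the seed would be returned as its own nearest neighbor.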

After obtaining the k-nearest neighbors of all minority class samples in the source domain, many seed and neighbor pairs are available. From them, an appropriate number of sample pairs is selected, and a new sample is generated on the line segment between each selected pair. The number of new minority class samples generated, L_t-new, is chosen so that the class distribution of the current source domain becomes relatively balanced, and a label is assigned to each new sample. New samples are synthesized in the feature space according to Equation (6): $\varphi (x^{mn}) = \varphi (x_{m}) + α^{mn} (\varphi (x_{n}) - \varphi (x_{m}))$ (6)

where α^mn is a random number between 0 and 1 that is drawn anew each time the equation is applied, as in the reference literature [28].

Note that when the +1 labeled samples are the minority class in the target domain, the +1 labeled samples are not necessarily the minority class in every source domain. Therefore, when balancing source domain data, the minority class of each source domain must be determined from the actual number of samples in its two classes.

The SVM classifier in the source domain can be trained through the Gram matrix K₁, which is composed of the inner products of each pair of samples in the source domain: $K_{1} = \begin{bmatrix} k (x_{1}, x_{1}) & k (x_{1}, x_{2}) & \cdots & k (x_{1}, x_{L_{S_{j}}}) \\ k (x_{2}, x_{1}) & k (x_{2}, x_{2}) & \cdots & k (x_{2}, x_{L_{S_{j}}}) \\ \vdots & \vdots & \ddots & \vdots \\ k (x_{L_{S_{j}}}, x_{1}) & k (x_{L_{S_{j}}}, x_{2}) & \cdots & k (x_{L_{S_{j}}}, x_{L_{S_{j}}}) \end{bmatrix}$ (7)

The generated L_t-new new samples are added to the Gram matrix K₁ to train the SVM classifier of the source domain. The new Gram matrix is given in Equation (8): $K = \begin{bmatrix} K_{1} & K_{2} \\ K_{2}^{T} & K_{3} \end{bmatrix}$ (8)

where each element $k (x_{i}, x_{j}^{mn})$ of K₂ is the inner product of an original sample and a synthesized sample, given by Equation (9): $\begin{aligned} k (x_{i}, x_{j}^{mn}) &= \varphi (x_{i})^{T} \varphi (x_{j}^{mn}) \\ &= \varphi (x_{i})^{T} [\varphi (x_{j}^{m}) + α^{mn} (\varphi (x_{j}^{n}) - \varphi (x_{j}^{m}))] \\ &= (1 - α^{mn}) k (x_{i}, x_{j}^{m}) + α^{mn} k (x_{i}, x_{j}^{n}) \end{aligned}$ (9)

$K_{2}^{T}$ is the transpose of K₂, and each element $k (x_{i}^{mn}, x_{j}^{pq})$ of K₃ is the inner product of two synthesized samples, given by Equation (10): $\begin{aligned} k (x_{i}^{mn}, x_{j}^{pq}) &= \varphi (x_{i}^{mn})^{T} \varphi (x_{j}^{pq}) \\ &= [\varphi (x_{i}^{m}) + α^{mn} (\varphi (x_{i}^{n}) - \varphi (x_{i}^{m}))]^{T} [\varphi (x_{j}^{p}) + α^{pq} (\varphi (x_{j}^{q}) - \varphi (x_{j}^{p}))] \\ &= (1 - α^{mn}) (1 - α^{pq}) k (x_{i}^{m}, x_{j}^{p}) + (1 - α^{pq}) α^{mn} k (x_{i}^{n}, x_{j}^{p}) \\ &\quad + (1 - α^{mn}) α^{pq} k (x_{i}^{m}, x_{j}^{q}) + α^{pq} α^{mn} k (x_{i}^{n}, x_{j}^{q}) \end{aligned}$ (10)

According to Equations (9) and (10), the augmented kernel matrix K is determined solely by the training samples in the source domain and the kernel function k (·, ·), without the need to know the specific form of the mapping function φ (x). Therefore, any valid kernel function can be used to train the source domain SVM; the proposed OTLMS_STO algorithm uses the Gaussian kernel.
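The construction of the augmented Gram matrix in Equations (8) to (10) can be sketched in a few lines. This is a hedged illustration, not the authors' code: the function `augmented_gram`, the representation of each synthetic sample as an index triple `(m, n, alpha)`, and the kernel helper are assumptions. The sketch exploits the fact that every point, original or synthetic, is a weighted combination of original samples in the feature space, so all entries follow from K₁ by bilinearity.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def augmented_gram(samples, pairs, kernel=rbf_kernel):
    """Build K = [[K1, K2], [K2^T, K3]] of Equation (8).

    samples: original points of one source domain.
    pairs:   one (m, n, alpha) triple per synthetic sample, where m and n index
             the seed and the neighbor in `samples` and alpha is in [0, 1].
    """
    n_orig = len(samples)
    k1 = [[kernel(a, b) for b in samples] for a in samples]   # Equation (7)

    # Express every point as (index, weight) terms over the original samples.
    coeffs = [[(i, 1.0)] for i in range(n_orig)]
    for m, n, alpha in pairs:
        coeffs.append([(m, 1.0 - alpha), (n, alpha)])          # Equation (6)

    size = len(coeffs)
    # Bilinear expansion reproduces Equation (9) for K2 and Equation (10) for K3.
    return [[sum(wa * wb * k1[ia][ib]
                 for ia, wa in coeffs[r]
                 for ib, wb in coeffs[c])
             for c in range(size)]
            for r in range(size)]
```

A matrix of this form could be passed to any SVM solver that accepts a precomputed kernel; which solver the authors used is not stated in the text.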

2.4 Oversampling in the feature space of the target domain

This section introduces how the proposed OTLMS_STO algorithm handles an imbalanced target domain. The target domain function is trained using the PA algorithm, which solves an optimization problem similar to that of SVM. Its prediction mechanism is based on a hyperplane that divides the instance space into two half-spaces. In the stage of improving the target domain function, the target decision function can use the same kernel technique as the SVM classifiers to synthesize samples using dot products in the feature space, without the need to know the feature mapping function φ (x). Therefore, by using the same kernel function and bandwidth, the new samples generated in the source and target domains can be kept in the same feature space. The data points generated in the target domain have better linear separability in the high-dimensional space and can be used to improve the target decision function.

Figure 2 shows the structure of the proposed OTLMS_STO algorithm in the stage of improving the target domain. The samples in the target domain arrive in multiple batches. When a batch of data arrives in the target domain, processing is divided into three steps. The first step is to oversample the minority class samples in the current batch so that the class distribution is relatively balanced. In Fig. 2, $\{(x_{i}, y_{i})\}_{i = 1}^{L_{t}}$ denotes the original samples and $\{(x_{i}, y_{i})\}_{i = 1}^{L_{t\text{-}new}}$ denotes the synthesized new samples. The second step is to traverse the generated new samples and sequentially train the target decision function g^T (x). The third step is to perform multi-source online transfer learning on the original samples in the current batch. By applying the same three steps to all batches, the final trained integrated function is obtained. The following describes each step in detail.

Fig. 2

Structure of OTLMS_STO algorithm in process of target domain.

When the samples $\{(x_{i}, y_{i})\}_{i = 1}^{L_{t}}$ of the t-th batch in the target domain arrive, the OTLMS_STO algorithm selects all minority class samples from them. Then, for each minority class sample in the current batch, it searches for the k-nearest neighbors among the batches that have already arrived. For a minority seed φ (x_m) in the current batch and a minority class neighbor φ (x_n) in the previous batches, Equation (5) is used to calculate the distance between the two in the feature space. Let $\{(x_{i}, y_{i})\}_{i = 1}^{L_{t}^{\min} \times k}$ denote the set of sample pairs composed of seeds and neighbors, with a total of $L_{t}^{\min} \times k$ pairs, where $L_{t}^{\min}$ is the number of minority class samples in the batch; each pair is assigned the label +1. Then min_num sample pairs are randomly selected from the set, and new samples are synthesized from them in the feature space based on Equation (6). The value of min_num should be such that the numbers of minority and majority class samples in the current batch are approximately equal, that is, the class distribution should be balanced.
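The pair selection described above can be sketched as follows. Several details are assumptions made only for this illustration: the function name `select_synthetic_pairs`, choosing the number of selected pairs as the difference between the majority and minority counts of the batch, sampling pairs with replacement, and the Gaussian kernel bandwidth. Each returned triple holds a seed, a neighbor, and the random coefficient of Equation (6); the synthetic point itself is never formed explicitly.

```python
import math
import random


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_sq_distance(a, b, kernel=rbf_kernel):
    """Squared feature-space distance of Equation (5)."""
    return kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)


def select_synthetic_pairs(batch, history_minority, k, rng, kernel=rbf_kernel):
    """Pick (seed, neighbor, alpha) triples for one target batch.

    batch:            list of (x, y) with y in {+1, -1}; +1 is the minority class.
    history_minority: minority samples from batches that arrived earlier.
    """
    seeds = [x for x, y in batch if y == 1]
    n_major = sum(1 for _, y in batch if y == -1)
    num_new = max(0, n_major - len(seeds))   # assumed rule for balancing the batch

    pairs = []
    for seed in seeds:
        ranked = sorted(history_minority,
                        key=lambda c: kernel_sq_distance(seed, c, kernel))
        pairs.extend((seed, neighbor) for neighbor in ranked[:k])
    if not pairs:
        return []   # no seeds or no history: nothing can be synthesized

    chosen = [rng.choice(pairs) for _ in range(num_new)]   # sampling with replacement
    return [(seed, neighbor, rng.random()) for seed, neighbor in chosen]
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which is convenient when comparing runs.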

Before multi-source online transfer learning is conducted on the current batch of samples, the target decision function g^T (x) is improved with the newly generated samples. However, the new minority class samples generated according to Equation (6) rely on the feature mapping function φ (x), which is generally unknown, so the synthesized sample φ (x^mn) cannot be obtained explicitly. The target decision function adopts the PA algorithm, which only needs the inner product of two samples, computed through the kernel function, each time a support vector is added. Therefore, when the target function receives new samples generated in the feature space, the inner product of an original sample and a synthetic sample can be calculated according to Equation (9), and the inner product of two synthetic samples can be calculated according to Equation (10), so that the new samples can be used to train and modify the target decision function.

As in the source domain stage, only the training samples and the kernel function k (·, ·) need to be known; the specific form of the mapping function φ (x) is not required.

Synthetic instances are used to improve the target domain decision function: when the hinge loss on a synthetic instance is greater than 0, the instance is added as a support vector to the support vector set, while the separability in the feature space is maintained, i.e. $g_{t}^{T} (x) = g^{T} (x) + \sum_{i = 1}^{L_{t\text{-}new}} τ_{i}^{mn} y_{i} k (x_{i}^{mn}, x)$ (11)

Theorem 1. Generating composite minority class samples in the feature space of the target domain can also ensure class separability.

Proof. The target domain function consists of support vectors and can be expressed as in Equation (12): $g^{T} (x) = \sum_{j = 1}^{N} τ_{j} y_{j} k (x, x_{j})$ (12)

Assume that the current batch of samples $\{(x_{i}, y_{i})\}_{i = 1}^{L_{t}}$ is linearly separable in the feature space of the target domain, so that Equation (13) holds: $y_{i} g^{T} (x_{i}) = y_{i} \sum_{j = 1}^{N} τ_{j} y_{j} k (x_{i}, x_{j}) ⩾ 0, i \in \{1, 2, \dots, L_{t}\}$ (13)

Substituting the minority class sample φ (x^mn) generated by Equation (6) into the target function gives Equation (14): $\begin{aligned} g^{T} (x^{mn}) &= \sum_{j = 1}^{N} τ_{j} y_{j} \varphi {(x^{mn})}^{T} \varphi (x_{j}) \\ &= (1 - α^{mn}) \sum_{j = 1}^{N} τ_{j} y_{j} k (x_{m}, x_{j}) + α^{mn} \sum_{j = 1}^{N} τ_{j} y_{j} k (x_{n}, x_{j}) \\ &= (1 - α^{mn}) g^{T} (x_{m}) + α^{mn} g^{T} (x_{n}) ⩾ 0 \end{aligned}$ (14)

where g^T (x_m) ⩾ 0 and g^T (x_n) ⩾ 0 because x_m and x_n belong to the minority class (label +1), and α^mn ∈ [0, 1].

Therefore, the samples generated in the feature space of the target domain can also ensure that the categories are separable. Each batch of new samples generated will optimize the hyperplane of the objective function in the feature space to improve the performance of the objective function.
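The argument of Theorem 1 can be checked numerically with a small sketch. The names `decision` and `decision_on_synthetic`, the example support vectors, and the Gaussian kernel bandwidth are assumptions for illustration. The decision value of a synthetic sample is evaluated term by term through Equation (9) and compared with the convex combination of the decision values of its seed and neighbor.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision(support, x, kernel=rbf_kernel):
    """Equation (12): support is a list of (tau_j * y_j, x_j) terms."""
    return sum(coef * kernel(sv, x) for coef, sv in support)


def decision_on_synthetic(support, x_m, x_n, alpha, kernel=rbf_kernel):
    """Decision value at phi(x_m) + alpha * (phi(x_n) - phi(x_m)), using Equation (9)."""
    return sum(coef * ((1.0 - alpha) * kernel(sv, x_m) + alpha * kernel(sv, x_n))
               for coef, sv in support)


# Hypothetical target function with one positive and one negative support vector.
support = [(1.0, (0.0, 0.0)), (-0.5, (4.0, 4.0))]
x_m, x_n = (0.0, 0.5), (0.5, 0.0)   # two minority samples near the positive support vector
g_m, g_n = decision(support, x_m), decision(support, x_n)
for alpha in (0.0, 0.3, 1.0):
    g_mn = decision_on_synthetic(support, x_m, x_n, alpha)
    # The synthetic value equals the convex combination, so it stays non-negative.
    assert abs(g_mn - ((1.0 - alpha) * g_m + alpha * g_n)) < 1e-12
    assert g_mn >= 0.0
```

Because the synthetic decision value is a convex combination of two non-negative numbers, it can never fall below the smaller of the two, which is the content of Equation (14).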

Then conduct multi-source online transfer learning on all samples in the current batch to get the final results of this batch.

2.5 Algorithm description and complexity analysis

The proposed OTLMS_STO algorithm is divided into two stages: (1) improving the classifiers in multiple source domains; (2) improving the target domain classifier and using the improved source classifiers for multi-source online transfer learning.

Algorithm description and complexity analysis in the first stage:

Input: m source domain datasets D^s = {D^s1, D^s2, . . . , D^sm}

Output: m improved source classifiers (g^s1, g^s2, . . . , g^sm)

Step 1. Identify the minority class in each source domain dataset.

Step 2. For each of the m source domain datasets:

Step 2.1 Find all minority class samples ${(x_{i}, y_{i})}_{i = 1}^{L_{t}^{\min}}$ in the source domain.

Step 2.2 For each sample in ${(x_{i}, y_{i})}_{i = 1}^{L_{t}^{\min}}$:

Find the k-nearest neighbors of the minority class sample, forming k seed pairs of the sample x_m and a neighbor x_n.

Step 2.3 Generate new minority class samples according to Equation (6) to balance the class distribution in the current source domain.

Step 2.4 Compute the augmented Gram matrix K from the original and new samples according to Equation (8).

Step 2.5 Use the matrix K to train the source domain classifier g^si (x).

Step 3. Output the m improved source classifiers (g^s1, g^s2, . . . , g^sm).

In the above algorithm, Step 2.1 finds all minority class samples with a time complexity of O(N), where N is the total number of samples in the current source domain. The time complexity of finding the k-nearest neighbors of all minority class samples in Step 2.2 is O(n_min²), where n_min is the number of minority class samples in the current source domain. The time complexity of calculating the Gram matrix in Step 2.4 is O((N + n_num)²d), where d is the dimension of the sample. Therefore, the total time complexity is O(n(N + n_min² + (N + n_num)²d)), where n is the number of source domains, which can be approximated as O(nd(N + n_num)²).
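The first stage can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes that Equation (6) is the convex combination φ(x^mn) = (1 − α)φ(x_m) + αφ(x_n) used in Equation (14), that the kernel is Gaussian, and the helper names (`rbf`, `gram`, `feature_space_knn`, `augment_gram`) and toy data are hypothetical. Since each synthetic sample is a weighted combination of original samples in feature space, every entry of the augmented Gram matrix can be computed from the original Gram matrix alone.

```python
import math
import random

def rbf(a, b, sigma=1.0):
    """Gaussian kernel with an assumed bandwidth sigma."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def gram(X, kernel=rbf):
    """Gram matrix of the original samples."""
    return [[kernel(a, b) for b in X] for a in X]

def feature_space_knn(K, idx, candidates, k):
    """k nearest candidates to sample idx in feature space, using
    ||phi(a) - phi(b)||^2 = K[a][a] - 2 K[a][b] + K[b][b]."""
    def dist(j):
        return K[idx][idx] - 2.0 * K[idx][j] + K[j][j]
    return sorted((j for j in candidates if j != idx), key=dist)[:k]

def augment_gram(K, pairs):
    """Extend K with synthetic points (1 - a) phi(x_m) + a phi(x_n).

    Each point (original or synthetic) is a weight vector over the original
    samples, so the augmented matrix is W K W^T.
    """
    n = len(K)
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for m, nb, a in pairs:
        w = [0.0] * n
        w[m] += 1.0 - a
        w[nb] += a
        W.append(w)
    WK = [[sum(w[t] * K[t][j] for t in range(n)) for j in range(n)] for w in W]
    return [[sum(wk[t] * v[t] for t in range(n)) for v in W] for wk in WK]

# Toy imbalanced source domain: 8 majority (-1) and 3 minority (+1) samples.
random.seed(0)
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)] + \
    [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(3)]
y = [-1] * 8 + [+1] * 3

K = gram(X)
minority = [i for i, label in enumerate(y) if label == +1]
pairs = [(m, nb, random.random())
         for m in minority
         for nb in feature_space_knn(K, m, minority, k=2)]
pairs = pairs[:5]  # 3 original + 5 synthetic minority samples = 8

K_aug = augment_gram(K, pairs)
y_aug = y + [+1] * len(pairs)
```

The pair `(K_aug, y_aug)` could then be passed to any SVM solver that accepts a precomputed kernel matrix, which corresponds to Step 2.5.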

Algorithm description and complexity analysis in the second stage:

Input: improved source classifiers (g^s1, g^s2, . . . , g^sm), initial compromise parameter C, weight discount parameter β ∈ (0, 1)

Initialization: g^T (x) = ∅, v_1 = v_2 = ... = v_m = 1/(m + 1), w = 1/(m + 1).

Step 1. For each batch in the target domain:

Step 1.1 Search for the k-nearest neighbors of the minority samples in the current batch, forming seed and neighbor pairs ${(x_{i}, y_{i})}_{i = 1}^{L_{t}^{\min} \times K}$.

Step 1.2 Randomly select min_num pairs from the minority sample pairs and generate new samples according to Equation (6).

Step 1.3 For each generated new sample:

Calculate the loss L_i and the support vector coefficient $τ_{i}^{mn} = \min \{C, L_{i} / k (x_{i}^{mn}, x_{i}^{mn})\}$.

When the loss is greater than 0, update the target decision function according to Equation (2), where the kernel function is obtained from Equations (9) and (10).

Step 1.4 For each instance of the current batch:

Step 1.4.1 Update the weights according to Equation (3).

Step 1.4.2 Use the improved source classifiers to predict the instance.

Step 1.4.3 Obtain the result of the target domain according to Equation (4).

Step 1.4.4 Obtain the final result through weighting.

Step 2. Output the trained integrated decision function of Equation (4).

In the above algorithm, the time complexity of finding the k-nearest neighbors in Step 1.1 is O(3m₁m₂d), where m₁ and m₂ are the numbers of minority and majority class samples in the current and previous batches, respectively, and d is the dimension of the sample. Step 1.3 improves the target decision function using the synthesized samples with a time complexity of O(4svd), where s is the total number of new samples and v is the number of support vectors.

Step 1.4 trains on the n original samples of the current batch with a time complexity of O(2nvd). There are N batches in the entire target domain, so the total time complexity is O(N(3m₁m₂d + 4svd + 2nvd)), which can be approximated as O(N(m₁m₂d + 4svd + 2nvd)).
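The update in Step 1.3 can be illustrated with a small kernelized passive-aggressive learner. This is a hedged sketch, not the authors' implementation: it assumes a hinge loss with margin 1, a Gaussian kernel, and the convex-combination form of synthetic samples from Equation (14); the names `KernelPA`, `point`, `synthetic`, and `k_feat` are hypothetical, and the ensemble weighting of Equations (3) and (4) is deliberately left out. Each sample is stored as a list of (weight, vector) terms so that synthetic samples, which exist only in feature space, can be handled through kernel evaluations.

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian kernel with an assumed bandwidth sigma."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def point(x):
    """An original sample as a single feature-space term."""
    return [(1.0, x)]

def synthetic(x_m, x_n, alpha):
    """Synthetic sample (1 - alpha) phi(x_m) + alpha phi(x_n)."""
    return [(1.0 - alpha, x_m), (alpha, x_n)]

def k_feat(p, q, sigma=1.0):
    """Kernel between two feature-space points, expanded bilinearly."""
    return sum(wp * wq * rbf(u, v, sigma) for wp, u in p for wq, v in q)

class KernelPA:
    """Kernelized passive-aggressive learner with step size capped by C."""

    def __init__(self, C=5.0):
        self.C = C
        self.support = []  # list of (tau, y, feature-space point)

    def decision(self, p):
        return sum(tau * y * k_feat(p, sp) for tau, y, sp in self.support)

    def update(self, p, y):
        """Add p as a support vector when its hinge loss is positive."""
        loss = max(0.0, 1.0 - y * self.decision(p))
        if loss > 0.0:
            tau = min(self.C, loss / k_feat(p, p))
            self.support.append((tau, y, p))
        return loss

learner = KernelPA(C=5.0)
# First the synthesized minority sample, then the original batch instances.
learner.update(synthetic((1.0, 1.0), (1.4, 0.6), alpha=0.3), +1)
for x, label in [((1.1, 0.9), +1), ((-1.0, -1.2), -1), ((-0.8, -1.1), -1)]:
    learner.update(point(x), label)
```

The value C = 5 matches the compromise parameter used in the experiments; everything else in the sketch is a toy setting.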

3 Experiment and performance analysis

The proposed OTLMS_STO algorithm was compared with multiple baseline algorithms for online learning, and experiments were conducted on four real-world datasets: the 20Newsgroups dataset, the Office-Home dataset, the Modern Office-31 dataset, and the DomainNet dataset. To obtain reliable results, all algorithms use the same parameter settings; data from multiple source domains is used as training data, and data from the target domain is used as test data. Each experiment is repeated 10 times with a different arrival order of the test instances. The results show that the proposed algorithm achieves better performance than the baseline algorithms.

3.1 Dataset introduction

The source domain dataset is the original dataset used to train the model, usually collected from the same domain. For example, in image classification tasks, the source domain dataset can be a set of well-labeled photos taken with the same camera, location, and lighting conditions. The target domain dataset is a dataset used for testing models, typically collected from a different domain. For example, in the same image classification task, the target domain dataset can be photos taken with different cameras, locations, and lighting conditions. In domain adaptation and transfer learning, the target domain dataset is used to evaluate the generalization ability of the model, that is, the performance of the model in the new domain. The source domain dataset is used to train the model so that it adapts to the target domain dataset and improves its performance there.

•20Newsgroups

The 20Newsgroups dataset (http://qwone.com/~jason/20Newsgroups/) is a popular dataset for text applications in machine learning. It collects approximately 20000 newsgroup documents, divided roughly evenly into 20 newsgroups with different topics. Each newsgroup corresponds to a different topic; some newsgroups are closely related to each other, while others are highly unrelated. Closely related newsgroups form five major themes; for example, os, ibm, mac, and x are comp-themed newsgroups, while crypt, electronics, med, and space are sci-themed newsgroups. In the experiment, the newsgroups in the comp theme were marked as positive examples, and the newsgroups in the sci theme were marked as negative examples. Thus, four related learning domains can be constructed: os_vs_crypt, ibm_vs_electronics, mac_vs_med, and x_vs_space. One domain is selected as the target domain and the other three as source domains, which yields four transfer learning tasks. The imbalance rate of each task is 0.3.

•Office-Home

The Office-Home dataset contains approximately 15500 images from four different domains: Art, Clipart, Product, and Real World images [29]. Each domain contains images of 65 categories. In the experimental setup, images from the Real World domain are used as the target domain, and the Art, Clipart, and Product domains are used as the source domains. From the 65 categories of the Real World domain, one category with fewer samples and one category with more samples are selected to form a binary classification task, and the same categories are selected from the three source domains to form a transfer learning task. Before the experiment, the original images in each task are preprocessed simply, converting each image into a 1×10 000 vector. A total of 33 transfer learning tasks were generated. Among the 33 tasks, 1 task in the Real World domain has an imbalance rate in [0.1, 0.2), 14 tasks have an imbalance rate in [0.2, 0.3), and 18 tasks have an imbalance rate in [0.3, 0.4).

•DomainNet

The DomainNet dataset is the largest domain adaptation dataset to date, consisting of 6 different domains, 345 categories, and approximately 600000 images. The six domains are Clipart, Infograph, Painting, Quickdraw, Real, and Sketch, while the categories range from furniture, fabric, and electronics to mammals, architecture, and more. In the experiment, one class with fewer samples and one class with more samples are selected from the Real domain (photos and real-world images) to form the target domain, and the other five domains are used as source domains to form a transfer learning task. A total of 45 transfer learning tasks were generated in the experiment. Among the 45 tasks, 5 tasks in the Real domain have an imbalance rate in [0, 0.1), 7 tasks have an imbalance rate in [0.1, 0.2), and 33 tasks have an imbalance rate in [0.2, 0.3).

•Modern Office-31

The Modern Office-31 dataset is a transfer learning dataset for image classification [30]. It contains subsets of four domains: Amazon (A), Webcam (W), Synthetic, and Dslr (D), divided into 31 categories, with a total of 7210 images. In the Modern Office-31 dataset, not only does the total number of samples differ between domains, but the distribution of categories within each domain is also imbalanced. The Modern Office-31 dataset can therefore be processed with imbalance-handling methods to improve the effect of transfer learning. In the experiment, each image in the preprocessed dataset is a 1×10 000 vector. Webcam is used as the target domain and the other three domains as the source domains. One category with more samples and one category with fewer samples in Webcam are then selected to form a transfer learning task, and a total of 20 tasks are generated. Among the 20 tasks, 5 tasks in the Webcam domain have an imbalance rate in [0.2, 0.3), 9 tasks have an imbalance rate in [0.3, 0.4), and 6 tasks have an imbalance rate in [0.4, 0.5).

3.2 Baseline algorithm and evaluation indicators

To evaluate the performance of the proposed OTLMS_STO algorithm, it was compared experimentally with several recent online learning methods. The PA algorithm is a classic online learning algorithm [26] that performs no knowledge transfer. A variant of the PA algorithm called “PAIO” is implemented by first initializing PA with data from the various source domains. The proposed algorithm is also compared with HomOTLMS [20], a well-known multi-source online transfer learning algorithm that uses the useful knowledge of multiple source domains to improve the classification performance of the target domain. In addition, the proposed algorithm is compared with OTLMS_IO [22] and OTLMS_FO [26]. Both algorithms improve performance by oversampling the imbalanced target domain: the former samples in the input space, and the latter samples in the feature space. All algorithms are implemented in Python.

When evaluating the performance of classifiers on imbalanced datasets, a single evaluation criterion such as accuracy or error rate is usually ineffective. This experiment therefore uses both accuracy and G-mean, which together can evaluate model performance on imbalanced data. When all samples are assigned to the same class, the value of G-mean is 0. Table 1 shows the binary confusion matrix. G-mean is calculated by Equation (15):

Table 1

Two-classification confusion matrix

Truth	Forecast results
	Positive example	Counterexample
Positive example	TP	FN
Counterexample	FP	TN

$G\text{-}mean = \sqrt{\frac{TN}{TN + FP} \times \frac{TP}{TP + FN}}$ (15)
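A short example shows why accuracy alone is misleading on imbalanced data. The sketch below is illustrative only: the function name `g_mean` and the toy labels are assumptions. It computes G-mean as the geometric mean of the recall on the positive class and the recall on the negative class, which is consistent with the statement that G-mean is 0 when all samples are assigned to one class.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

# 3 minority (positive) and 7 majority (negative) samples.
y_true = [1] * 3 + [-1] * 7
all_negative = [-1] * 10  # a classifier that always predicts the majority

accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
score = g_mean(y_true, all_negative)
```

Here the trivial classifier reaches an accuracy of 0.7 while its G-mean is 0, because no minority sample is recognized.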

3.3 Parameter settings and experimental results

3.3.1 Parameter settings

On the 20Newsgroups, Office-Home, DomainNet, and Modern Office-31 datasets, the proposed OTLMS_STO algorithm is compared with four baseline transfer learning algorithms. To make the comparison fair, all algorithms adopt experimental settings that are as similar as possible. For the k-nearest neighbors of minority samples in each batch, OTLMS_STO automatically sets a k value such that the newly generated minority class samples bring the class distribution of the current batch into relative balance. Because Gaussian kernel functions are widely used, this article trains the functions with a Gaussian kernel, and the optimal bandwidth σ is searched in the range [10^-2, 10²]. The proposed algorithm can also use other kernel functions.

The impact of different values of the compromise parameter C on performance is analyzed in Section 3.3.7, and the compromise parameter C of all algorithms on all datasets was set to 5. According to the analysis of the algorithm error bound in reference [20], the weight discount parameter can be obtained as β=M/(m + ln (n + 1)), where m is the number of errors made by the method and n is the number of source classifiers.

3.3.2 Experimental results on the 20Newsgroups dataset

Table 2 lists the performance of the comparison algorithms on the 20Newsgroups dataset, with accuracy and G-mean as evaluation metrics. The experimental results show that the proposed OTLMS_STO algorithm achieved better performance than all baseline algorithms in the four learning tasks. OTLMS_STO is superior to PA and PAIO, indicating that the proposed algorithm can effectively extract knowledge from multiple source domains. OTLMS_STO also performs better than HomOTLMS in all four tasks, because HomOTLMS ignores the class imbalance of the source and target domains. OTLMS_IO and OTLMS_FO are superior to HomOTLMS, but both only oversample in the target domain, while the proposed OTLMS_STO algorithm oversamples minority class samples in the feature spaces of both the source and target domains. Figure 3 shows how the error rates of the different algorithms in the four tasks change as the number of samples increases. From Fig. 3, it can be seen that as the number of training samples increases, the error rates of the six algorithms decrease significantly. In the os_vs_crypt, mac_vs_med and x_vs_space tasks, the error rate of OTLMS_STO is always lower than that of the comparison methods.

Fig. 3

Error rate of each algorithm on 20Newsgroups dataset with increase of the number of samples.

Table 2

Results of different learning algorithms on 20Newsgroups dataset (mean±standard deviations) unit: %

Method	os_vs_crypt		ibm_vs_electronics		mac_vs_med		x_vs_space
	accuracy	G-mean	Accuracy	G-mean	accuracy	G-mean	accuracy	G-mean
PA	85.49±0.62	82.52±0.76	74.47±0.97	68.62±1.25	79.97±0.64	75.66±0.81	82.16±0.83	78.38±1.04
PAIO	85.74±0.91	82.56±1.12	74.88±1.12	68.97±1.44	80.29±0.45	75.88±0.56	82.19±0.58	78.30±0.73
HomOTLMS	86.99±0.54	79.14±1.16	80.49±0.58	73.70±1.02	80.30±0.58	64.72±1.30	82.67±0.51	69.75±0.88
OTLMS_IO	87.76±0.72	80.30±1.12	81.30±0.83	75.32±1.23	81.52±0.89	67.14±1.65	83.51±1.07	71.30±1.98
OTLMS_FO	89.66±0.42	83.03±1.15	80.80±0.77	74.18±0.84	83.46±0.50	70.77±1.30	85.53±0.37	74.90±0.66
OTLMS_STO	91.36±0.59	86.62±1.12	81.82±0.66	80.91±0.86	86.30±0.46	77.72±0.93	87.87±0.67	80.08±0.95

HomOTLMS, OTLMS_IO, OTLMS_FO and OTLMS_STO achieve better results when the initial sample size is small, which shows that these algorithms can effectively extract knowledge from multiple source domains. The error rate of the proposed OTLMS_STO algorithm is lower than that of the other algorithms on most tasks, indicating that the proposed algorithm can effectively handle imbalanced source and target domains.

3.3.3 Experimental results on the Office Home dataset

A total of 33 experimental tasks were conducted on the image dataset Office-Home, and Table 3 shows the numerical results of all compared algorithms on the two metrics. HomOTLMS, OTLMS_IO, OTLMS_FO and OTLMS_STO perform better than the ordinary online learning algorithms, indicating that transferring knowledge from multiple source domains helps prediction in the target domain. OTLMS_STO, OTLMS_IO and OTLMS_FO mostly score better than HomOTLMS, as these three algorithms take the class imbalance of the target domain into account.

Table 3

Results of different learning algorithms on Office-Home dataset (mean±standard deviations) unit: %

Method	task 5		task 12		task 19		task 27
	accuracy	G-mean	accuracy	G-mean	accuracy	G-mean	accuracy	G-mean
PA	61.26±0.70	53.41±1.02	68.29±0.26	65.01±0.16	62.63±0.80	56.86±1.06	76.76±1.18	73.98±1.41
PAIO	56.47±0.91	44.62±1.57	72.31±0.42	65.37±0.24	65.76±0.86	56.96±1.12	75.41±1.18	69.30±1.45
HomOTLMS	71.68±0.76	63.28±1.02	84.27±0.78	81.70±0.95	73.64±0.80	61.93±1.17	81.49±1.18	71.98±1.51
OTLMS_IO	68.07±1.64	62.43±1.32	84.53±0.71	82.15±0.90	74.41±1.65	65.95±1.55	82.84±1.02	74.74±1.10
OTLMS_FO	73.36±1.72	66.18±1.92	85.13±0.95	83.18±1.32	73.47±1.14	64.84±1.70	82.36±1.19	73.71±1.55
OTLMS_STO	76.47±1.55	70.79±1.57	87.69±0.78	88.24±0.91	76.86±1.08	71.86±1.92	84.19±1.18	76.31±1.47

OTLMS_STO nevertheless performs best. It oversamples minority samples in the kernel spaces of both the source domain and the target domain, which effectively adjusts the hyperplane in the feature space; the change in the classifier is clearly visible in the G-mean metric. Figure 4 shows a histogram of the accuracy of the three main algorithms on the 33 tasks, and Fig. 5 shows a line chart of the G-mean values on the 33 tasks. On the vast majority of tasks, OTLMS_STO achieves better overall performance and better performance on the minority class. This indicates that the proposed algorithm can not only transfer knowledge from multiple source domains, but also effectively cope with imbalanced datasets.

Fig. 4

Accuracy of 33 groups of tasks on Office-Home dataset.

Fig. 5

G-mean of each group of tasks on Office-Home.

3.3.4 Experimental results on the DomainNet dataset

To better validate the performance of the OTLMS_STO algorithm, a total of 45 experimental tasks were conducted on the image dataset DomainNet. Table 4 presents the numerical results of four of these tasks. The results clearly support the proposed method, which outperforms the comparison algorithms in all listed tasks. This indicates that the proposed OTLMS_STO algorithm can extract effective knowledge from multiple source domains and works well when both the source and target domains are imbalanced. The DomainNet dataset contains a total of 5 source domains. When the source and target domains are combined, the target domain accounts for only 1/6 of the domains, yet OTLMS_FO improves the target decision function only by oversampling in the target domain. In contrast, the proposed OTLMS_STO algorithm also synthesizes minority class samples in the kernel space of the source domains and then trains the source domain classifiers using an augmented kernel matrix. By combining multiple source classifiers and the target classifier, better performance can be achieved. For reasons of space and readability, Fig. 6 shows only the results of PA, HomOTLMS, and OTLMS_STO on the 45 tasks and omits the other algorithms. In most tasks, the proposed algorithm outperforms the two comparison algorithms. Figure 7 shows the G-mean values of the three main algorithms, and the results indicate that the proposed OTLMS_STO algorithm can handle imbalanced data, with especially good performance on datasets with a large number of source domains.

Fig. 6

Accuracy of 45 groups of tasks on DomainNet dataset.

Fig. 7

G-mean of each group of tasks on DomainNet dataset.

Table 4

Results of different learning algorithms on DomainNet dataset (mean±standard deviations) unit: %

method	task 1		task 9		task 15		task 23
	Accuracy	G-mean	Accuracy	G-mean	Accuracy	G-mean	Accuracy	G-mean
PA	72.53±2.53	61.99±4.59	76.29±1.34	70.23±1.83	79.56±0.82	76.60±1.07	82.62±1.03	61.03±2.64
PAIO	72.78±2.61	62.06±3.88	76.47±1.17	69.87±1.81	79.33±1.05	75.82±1.21	82.79±0.93	61.83±2.31
HomOTLMS	77.41±0.67	35.43±5.95	77.80±0.71	53.68±1.74	82.49±0.53	71.75±1.08	84.93±0.76	74.47±1.80
OTLMS_IO	77.44±0.18	34.75±4.84	79.88±0.40	58.59±1.71	83.44±0.42	73.78±0.60	86.93±0.60	76.20±1.35
OTLMS_FO	77.78±0.84	40.28±7.36	78.49±0.52	55.97±1.05	82.66±0.57	72.96±0.84	85.92±0.55	80.73±1.20
OTLMS_STO	80.24±0.89	68.12±2.85	82.40±0.60	73.81±1.24	85.03±0.43	80.80±0.66	88.98±0.71	85.44±1.50

3.3.5 Experimental results on the Modern Office-31 dataset

A total of 20 experimental tasks were conducted on the Modern Office-31 image dataset. Table 5 presents the accuracy and G-mean results of all algorithms on several randomly selected tasks.

Table 5

Results of different learning algorithms on Modern Office-31 dataset (mean±standard deviations) unit: %

Method	task 4		task 9		task 13		task 17
	Accuracy	G-mean	Accuracy	G-mean	Accuracy	G-mean	Accuracy	G-mean
PA	91.67±3.01	89.14±5.18	65.93±4.64	60.56±6.04	90.00±2.24	91.52±2.44	83.44±2.63	83.88±3.03
PAIO	92.59±0.00	88.32±0.00	69.32±2.20	63.55±2.76	92.26±1.97	92.01±2.06	86.09±3.46	85.37±3.80
HomOTLMS	96.30±1.19	97.06±1.72	69.32±2.78	64.57±7.34	87.92±2.42	88.31±2.96	79.69±3.83	82.28±4.45
OTLMS_IO	96.67±1.39	92.13±2.87	70.43±2.43	50.47±6.74	88.11±2.22	88.04±2.84	82.03±4.15	76.76±4.87
OTLMS_FO	96.85±1.17	97.38±1.57	71.53±3.20	69.16±6.26	88.49±1.78	88.63±1.90	82.19±3.29	84.25±3.94
OTLMS_STO	98.15±0.83	98.99±1.76	84.92±2.33	85.36±2.83	96.98±1.51	98.79±1.12	91.88±2.19	92.89±2.39

The proposed OTLMS_STO algorithm enhances the classification performance of the target domain by utilizing useful information from multiple source domains. Therefore, OTLMS_STO achieves competitive performance in terms of accuracy. Meanwhile, OTLMS_STO oversamples minority class samples in the feature spaces of the source and target domains and improves both the source and target domain functions, which keeps the final integrated decision function from leaning towards the majority class. Table 5 shows that the OTLMS_STO algorithm achieves the best performance on the G-mean metric.

Figure 8 shows the average accuracy of the PA, HomOTLMS, and OTLMS_STO algorithms on the 20 experimental tasks of the Modern Office-31 dataset. The figure shows that the proposed OTLMS_STO algorithm performs best in the vast majority of tasks, which indicates that the proposed algorithm can effectively utilize knowledge from the source domains and that expanding samples in the feature spaces of both the source and target domains improves the decision functions. Figure 9 shows the G-mean results of the PA, HomOTLMS, and OTLMS_STO algorithms on the 20 experimental tasks, demonstrating the effectiveness of OTLMS_STO in dealing with imbalanced data.

Fig. 8

Accuracy of 20 groups of tasks on Modern Office-31 dataset.

Fig. 9

G-mean of each group of tasks on Modern Office-31 dataset.

3.3.6 Rank value of accuracy on all datasets

Table 6 presents the rank values of accuracy for a total of 102 experimental tasks on all four datasets, as well as the average rank value for each dataset. In the accuracy ranking of the compared algorithms, the best algorithm has rank value 1, the second best has rank value 2, and so on. For the 20Newsgroups dataset, Task1–4 denotes tasks 1, 2, 3 and 4, and the values 1 1 1 1 that follow are the rank values of those tasks. The table shows that in the vast majority of tasks, the proposed OTLMS_STO algorithm ranks first, and its average rank value is also good.

Table 6

Rank value and average rank value of task accuracy in each group

Dataset	Task	Rank value	Average rank value
20Newsgroups	Task1–4	1 1 1 1	1.00
Office-Home	Task1–7	2 1 1 1 1 1 1	1.24
	Task8–14	1 1 1 2 1 2 1
	Task15–21	2 1 2 2 1 1 2
	Task22–28	1 1 1 1 1 1 1
	Task29–33	1 1 1 2 1
DomainNet	Task1–8	1 1 1 1 1 2 2 1	1.13
	Task9–16	1 1 1 1 1 1 1 1
	Task17–24	1 1 1 1 1 2 1 1
	Task25–32	1 1 1 1 1 1 1 1
	Task33–40	1 2 1 1 2 1 1 2
	Task41–45	1 1 1 1 1
Modern Office-31	Task1–7	1 1 1 1 1 1 1	1.00
	Task8–14	1 1 1 1 1 1 1
	Task15–20	1 1 1 1 1 1
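The average rank values in Table 6 follow directly from the per-task rank values. The snippet below is a small arithmetic check; the dictionary name `ranks` and the variable names are illustrative, and the rank lists are transcribed from Table 6.

```python
# Rank values of OTLMS_STO per task, transcribed from Table 6.
ranks = {
    "20Newsgroups": [1, 1, 1, 1],
    "Office-Home": [
        2, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 2, 1, 2, 1,
        2, 1, 2, 2, 1, 1, 2,
        1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 2, 1,
    ],
    "DomainNet": [
        1, 1, 1, 1, 1, 2, 2, 1,
        1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 2, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1,
        1, 2, 1, 1, 2, 1, 1, 2,
        1, 1, 1, 1, 1,
    ],
    "Modern Office-31": [1] * 20,
}

# Average rank per dataset and the total number of tasks.
average_rank = {name: sum(r) / len(r) for name, r in ranks.items()}
total_tasks = sum(len(r) for r in ranks.values())
```

Rounded to two decimals, the averages reproduce the table (1.00, 1.24, 1.13, 1.00), and the task counts sum to 102.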

3.3.7 Parameter adjustment

The proposed method involves some adjustable parameters, including the compromise parameter C. Figure 10 shows the impact of different C values on the 20Newsgroups dataset. The graph shows that the accuracy of OTLMS_STO and the other methods varies significantly with C, and that for the same task, different algorithms achieve their best performance at different C values. From Fig. 10, it can be concluded that across the different C values, the OTLMS_STO algorithm is more accurate and stable than the other transfer learning algorithms, which verifies the effectiveness of the proposed algorithm. In the experiments, the C value of all algorithms is set to 5.

Fig. 10

Evaluation of all algorithms with different C values on 20Newsgroups dataset.

3.4 Time cost

Quantitative analysis is an essential tool for evaluating the performance of algorithms and systems. The algorithms were implemented in Python and tested on multiple tasks. The average running time of each algorithm was recorded and is summarized in Fig. 11. The figure shows that as the number of samples increases, the average running time of the OTLMS_STO algorithm is higher than that of the other algorithms.

Fig. 11

Time cost of each algorithm with increase of the number of samples.

The experiments were run on a Windows machine with a 6×2.6 GHz CPU and 16 GB of memory. The machine specifications matter because they affect the running time of the algorithm.

4 Conclusion

Multi-source information fusion is a sophisticated estimation technique that enables users to analyze complex situations more precisely by merging key evidence from vast, varied, and occasionally contradictory data obtained from various sources. Limitations of data collection technology and incomplete data from information sources may lead to large uncertainty in the fusion process and affect the quality of fusion. Reducing uncertainty in the fusion process is one of the most important challenges for information fusion.

This paper considers the online transfer learning problem for imbalanced data, where the data in the target domain arrives in batches and knowledge is transferred from multiple offline source domains. For imbalanced source domains, the algorithm expands minority class samples in the feature space to balance the source domain classes, and then uses the augmented kernel matrix to train multiple improved offline source domain classifiers. For the imbalanced target domain, the algorithm searches for the k-nearest neighbors of the minority class samples in the current batch among the minority class samples of previously arrived batches, and then improves the target function using the synthesized new samples. Finally, the improved source classifiers and the target classifier are combined for multi-source online transfer learning, and extensive experiments are carried out on text and image datasets. The experimental results show that the proposed algorithm can not only effectively transfer knowledge from multiple source domains, but also effectively cope with the uneven class distribution in the source and target domains. This article investigates the binary classification problem with imbalanced source and target domains. Multi-class classification problems are more challenging, because the offline and online objective functions need to consider multiple classes and the imbalance among them simultaneously. In the future, we will continue to study multi-class, multi-source online transfer learning for imbalanced source and target domains.

Acknowledgments

This work was supported by 2023 Social Science Achievements Evaluation Committee of Hunan Province General funding project: Research on the Construction of Mainstream Ideological identity mechanism of Higher vocational students under the background of “Intelligent communication” (Project No: XSP2023FXZ036).

References

Peilin

, Steven

C.H.H.

, Jialei

et al., Online transfer learning, Artificial Intelligence 216 (2014), 76.

H.R.

, Yan

Y.G.

, Ye

Y.Z.

et al., Online heterogeneous transfer learning by knowledge transition, ACM Transactions on Intelligent Systems and Technology 10(3) (2019), 1–19. doi: 10.1145/3309537.

Pan

S.J.

and Yang

, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22(10) (2010), 1345–1359.

Jie

, Vahid

, Pend

et al., Transfer learning using computational intelligence: a survey, Knowledge-Based Systems 80 (2015), 14–23.

Zhao

P.F.

, Li

Y.L.

and Lin

, Research progress of intention recognition for transfer learning, Technology 14(8) (2020), 1261–1274.

Ren

, Liu

B.S.

and Sun

J.Y.

, Research progress of cross domain recommendation algorithms for knowledge transfer, Journal of Frontiers of Computer Science and Technology 14(11) (2020), 1813–1827.

Dai

W.Y.

, Yang

, Xue

G.R.

et al., Boosting for transfer learning, Proceedings of the 24th International Conference on Machine learning, Corvallis, Jun 20–24, New York: ACM, 2007:193–200.

Long

, Wang

, Ding

et al., Adaptation regularization: a general framework for transfer learning, IEEE Transactions on Knowledge and Data Engineering 26(5) (2014), 1076–1089.

Yao

and Doretto

, Boosting for transfer learning with multiple sources, Proceedings of the 23rd IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, Jun 13–18, Washington: IEEE Computer Society, 2010:1855–1862.

10.

Amini

M.R.

, Usunier

and Goutte

, Learning from multiple partially observed views – an application to multilingual text categorization, Proceedings of the 23rd Annual Conference on Neural Information Processing Systems, Vancouver, Dec 7–10, Red Hook: Curran Associates, 2009:28–36.

11.

Eaton

, Selective transfer between learning tasks using task-based boosting, Proceedings of the 25th AAAI Conference on Artificial Intelligence, Menlo Park: AAAI Press, 2011:337–342.

12.

Dredze

, Kulesza

and Crammer

, Multi-domain learning by confidence-weighted parameter combination, Machine Learning 79(1/2) (2010), 123–149.

13. Peng X.C., Bai Q.X., Xia X.D. et al., Moment matching for multi-source domain adaptation, Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27–Nov 2, Piscataway: IEEE, 2019:1406–1415.

14. Hoffman, Mohri and Zhang N.S., Algorithms and theory for multiple-source adaptation, Proceedings of the Annual Conference on Neural Information Processing Systems, Montréal, Dec 3–8, 2018:8256–8266.

15. Yan Y.G., Wu Q.Y., Tan M.K. et al., Online heterogeneous transfer by hedge ensemble of offline and online decisions, IEEE Transactions on Neural Networks and Learning Systems 29(7) (2018), 3252–3263.

16. Sun, Tan W.A., Xie et al., Online learning method for performance prediction of large scale services, Journal of Frontiers of Computer Science and Technology 11(12) (2017), 1922–1930.

17. He H. and Garcia E.A., Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering 21(9) (2009), 1263–1284.

18. Vapnik V.N., The nature of statistical learning theory, Berlin, Heidelberg: Springer, 1995.

19. Khemchandani and Chandra, Twin support vector machines for pattern classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5) (2007), 905–910.

20. Wu Q.Y., Wu H.R., Zhou X.M. et al., Online transfer learning with multiple homogeneous or heterogeneous sources, IEEE Transactions on Knowledge and Data Engineering 29(7) (2017), 1494–1507.

21. Kang Z.F., Yang S.T. et al., Online transfer learning with multiple source domains for multi-class classification, Knowledge-Based Systems 190 (2020), 105149.

22. Zhou J.Y. and Wang S.T., Multi-source online transfer learning for imbalanced target domain, CAAI Transactions on Intelligent Systems 17(2) (2022), 248–256.

23. Yang F.F. and Zhang P.F., MSIF: Multi-source information fusion based on information sets, Journal of Intelligent & Fuzzy Systems 44(3) (2023), 4103–4112. doi: 10.3233/JIFS-222210.

24. Zhang P.F., Li T.R. et al., A possibilistic information fusion-based unsupervised feature selection method using information quality measures, IEEE Transactions on Fuzzy Systems (Early Access), 30 January 2023, 1–14. doi: 10.1109/TFUZZ.2023.3238803.

25. Hua and Jing X.C., An improved belief Hellinger divergence for Dempster-Shafer theory and its application in multi-source information fusion, Applied Intelligence (Early Access), 2023. doi: 10.1007/s10489-022-04428-w.

26. Crammer, Dekel, Keshet et al., Online passive aggressive algorithms, Journal of Machine Learning Research 7 (2006), 551–585.

27. Chawla N.V., Bowyer K.W., Hall L.O. et al., SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research 16 (2002), 321–357.

28. Mathew, Pang C.K., Luo et al., Classification of imbalanced data by oversampling in kernel space of support vector machines, IEEE Transactions on Neural Networks and Learning Systems 29(9) (2018), 4065–4076.

29. Venkateswara, Eusebio, Chakraborty et al., Deep hashing network for unsupervised domain adaptation, Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21–26, Washington: IEEE Computer Society, 2017:5385–5394.

30. Ringwald and Stiefelhagen, Adaptiope: a modern benchmark for unsupervised domain adaptation, Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, Jan 3–8, Piscataway: IEEE, 2021:101–110.


Table 1
Two-classification confusion matrix

| Truth | Forecast: positive example | Forecast: counterexample |
| --- | --- | --- |
| Positive example | TP | FN |
| Counterexample | FP | TN |