Model-agnostic meta-learning framework for data loss detection with transfer learning

Abstract

Model-Agnostic Meta-Learning (MAML) has proven to be effective in various learning environments. However, it faces challenges with domain adaptation because it depends on gradient-based optimization, which does not explicitly integrate prior knowledge from related tasks. This limitation results in slow adaptation to new domains and suboptimal performance when Signiant domain shifts occur. Utilizing transfer learning, which skillfully incorporates domain-specific knowledge to boost generalization and adaptability, effectively resolves the challenges faced by current MAML-based techniques across various environments. This study explores the use of transfer learning to create a strong and flexible model that can effectively detect occurrences of data loss or leakage within credit cardholder information datasets. The model is trained on a source domain and ne-tuned on a target domain relevant to data loss detection by leveraging transfer learning. The effectiveness of the Transfer Learning-Based Data Loss Detection on MAML is evaluated through learning iteration versus mean squared error plots. The proposed system also surpasses the existing few-shot learning-based MAML. These plots provide insights into the model's convergence, adaptability, and performance. The abstract highlights the signicance of transfer learning in enhancing the efficiency and accuracy of data loss detection systems, particularly when utilizing the MAML is evaluated through learning iteration versus mean squared error plots. The proposed system also surpasses the existing few-shot learning-based MAML. The findings contribute to the expanding knowledge of transfer learning applications in cybersecurity and data protection. Experiments conducted on the IEEE-CIS Fraud Detection dataset demonstrate that our approach achieves an accuracy of 92.3% and a notable reduction in MSE by 15% compared to standard MAML, underscoring its effectiveness and robustness across various environments.

Keywords

model-agnostic meta-learning [MAML]transfer learning based on data loss detection machine learning cyber security

1 Introduction

Grasping the concept of data breaches involves examining the intricate web of digital vulnerabilities and the potential fallout from compromised information. Visualizing the progression of a data breach offers captivating insights into the broad implications of such occurrences.¹ Picture a network as a sophisticated ecosystem, with interconnected nodes symbolizing various digital assets and sensitive data. Like a ripple effect, a breach disrupts this delicate balance, causing disorder throughout the interconnected system.² Visualization tools empower analysts to trace the breach's path, identify entry points and lateral movements, and extract critical data. Graphs, charts, and heatmaps transform abstract cyber threats into tangible visuals, providing stakeholders a clearer understanding of the incident's magnitude.³ Enterprise data breaches present a widespread and multifaceted challenge for organizations, involving numerous factors such as causes, challenges, prevention strategies, and future directions. The causes often include cyber-attacks by sophisticated threat actors, insider threats from employees or contractors, inadequate security measures, and risks associated with third-party collaborations.⁴

Managing data breaches is a crucial aspect of modern cybersecurity, and an integrated risk model is vital for effectively addressing the complexities involved.⁵ This approach encompasses a comprehensive framework that includes proactive measures, rapid response strategies, and continuous improvement.⁶ The integrated risk model begins with thorough risk assessments, identifying potential vulnerabilities, and evaluating the possible impact of a data breach. It combines technological solutions, such as advanced threat detection systems and encryption protocols, with human-centric elements, including employee training and awareness programs.⁷

2 Literature survey

The survey on meta-learning in neural networks provides a comprehensive overview of a rapidly growing field with significant promise in advancing artificial intelligence capabilities.⁸ In the context of neural networks, this survey explores the various architectures, algorithms, and applications that constitute the meta-learning landscape.⁹ The survey examines how meta-learning techniques enhance the ability of neural networks to generalize across diverse tasks, making them more robust and efficient learners.⁶ Additionally, the survey investigates the application of meta-learning in areas such as few-shot learning, reinforcement learning, and optimization.¹⁰

Meta-learning with adaptive hyperparameters introduces a novel approach that significantly enhances the flexibility and adaptability of machine-learning models.¹¹ Traditionally, hyperparameters in machine learning are manually adjusted to optimize performance for specific tasks.¹² However, meta-learning with adaptive hyperparameters employs a more dynamic and self-regulating method.¹³ In this paradigm, models are trained to autonomously adjust their hyperparameters based on the characteristics of various tasks encountered during the meta-training phase. This adaptive hyperparameter mechanism allows models to learn the optimal parameters for a given task and the best strategy for modifying hyperparameters when faced with new tasks.¹⁴

This approach is at the cutting edge of advancing cybersecurity systems. Meta-learning involves training a model to rapidly adapt to new tasks with minimal data, making it particularly effective for the ever-evolving landscape of network security.¹⁵ In this scenario, the model's ability to effectively identify and categorize network intrusions when faced with a new threat is highlighted. This cutting-edge method utilizes meta-learning algorithms to improve the generalization abilities of NSID systems.¹⁶ Unlike traditional methods that require task-specific fine-tuning, MAML emphasizes training models to quickly learn from a few examples and effectively generalize to new, unseen tasks. This framework is designed to be adaptable and functions independently of the underlying model architecture, allowing it to be applied across a broad spectrum of fields.¹⁷ The flexibility of MAML has been harnessed in diverse areas such as computer vision and natural language processing, enabling quick adaptation to new and evolving challenges.¹⁸

Presently, MAML-based strategies focus on optimizing for swift adaptation without explicitly leveraging knowledge from previously learned domains.^19–22 This oversight often results in models failing to utilize the rich representations and domain-specific patterns that could enhance generalization across diverse data distributions. Moreover, these models lack mechanisms to address domain shifts, leading to subpar performance in real-world data loss detection tasks characterized by data heterogeneity and scarcity.^23–26 Although integrating transfer learning and meta-learning holds promise, current research has not sufficiently explored this combination to improve domain adaptation capabilities. This leaves a significant gap in developing a robust, flexible framework that can effectively utilize prior knowledge while adapting to diverse and unseen domains.²⁷

Detecting credit card fraud is a complex task due to the constantly evolving tactics of fraudsters, changes in the domain, and limited data availability. Fraudulent transactions are inherently dynamic, as fraudsters frequently alter their strategies to evade detection by systems. This ongoing adaptation results in domain evolution, where fraud patterns in older data differ significantly from those that are currently emerging.²⁸ Consequently, traditional machine learning models struggle to maintain high detection rates across different time frames, financial institutions, and geographic regions. Additionally, fraud detection datasets are highly imbalanced, with fraudulent transactions constituting only a small fraction of the entire dataset.²⁹ This imbalance causes models to be biased against non-fraudulent transactions, leading to a high number of false negatives, where fraudulent activities go undetected. Addressing these challenges requires learning that can adapt to new fraud patterns while effectively handling data imbalance.

Although MAML presents a promising approach by facilitating quick adaptation to new fraud scenarios, it faces several challenges in practical fraud detection applications. Additionally, it necessitates a sufficient number of labeled fraud instances for fine-tuning, which are often scarce in real-world banking environments. These issues underscore the necessity for an improved fraud detection framework that merges meta-learning with transfer learning. This combination would enable the model to preserve essential fraud-related features from previous experiences while enhancing its ability to adapt to new fraud patterns. By incorporating these strategies, the system can become more robust, precise, and adept at identifying new fraudulent activities with minimal labeled data.

This paper is organized as follows. Section 1 discusses the existing work related to meta-learning, and Section 2 explains the Data source. Sections 3 and 4 describe the Model-Agnostic Meta-Learning and transfer-learning algorithms, respectively. Section 5 evaluates the experimental results, concluding the paper in Section 6.

3 Meta-learning approach

Meta-learning, or learning to learn, is a specialized area within machine learning that aims to train models to rapidly adapt to new tasks with only a small amount of data. There are multiple meta-learning methodologies, each characterized by unique features and specific applications.

The process typically unfolds in two key stages: meta-training and meta-testing, as shown in Figure 1. The model is exposed to various tasks in the meta-training stage, each with its own dataset and learning objective. This exposure helps the model recognize familiar patterns and features across these tasks, establishing a foundation of initial parameters that facilitate quick adaptation. This method enables the model to expand its learning capabilities to a broader range of functions. The model is evaluated on a new task or unfamiliar data during the meta-testing stage. Adaptation is achieved by fine-tuning the parameters acquired during the meta-training phase using a small amount of task-specific data.

Figure 1.

Meta-learning.

The model aims to rapidly and effectively generalize to the new task by leveraging its acquired meta-knowledge. The success of meta-learning relies on the model's ability to capture high-level abstractions and similarities across various tasks. This adaptability is particularly advantageous when obtaining large volumes of labelled data for each new task is impractical or costly. Meta-learning thus fosters a more efficient and agile learning process, as models trained with this approach can quickly adjust to new challenges by utilizing their meta-learned knowledge. The overarching goal is to develop machine learning models exhibiting “learning agility,” enabling them to adapt to new environments and domains swiftly.

Meta-learning, a dynamic subfield within machine learning, encompasses various strategies aimed at equipping models with the ability to adapt to new tasks quickly. Among these strategies, Model-Agnostic Meta-Learning (MAML) is notable for its versatility, as it learns a set of model parameters that support rapid adaptation across different architectures. Reptile, another model-agnostic technique, enhances model initialization through repeated training iterations on diverse tasks, focusing on swift adaptation. Meta-LSTM, an extension of recurrent neural networks, excels at capturing sequential dependencies, making it ideal for functions involving sequences. Memory-augmented networks incorporate external memory modules, enabling models to store and retrieve information pertinent to tasks, thereby retaining knowledge from past experiences. Metric learning approaches concentrate on learning a similarity metric, allowing models to effectively generalize to new tasks by maintaining relevant relationships among data points. Optimization-based approaches involve optimizing model parameters or learning algorithms to enable rapid adaptation.

Meanwhile, gradient-based meta-learning trains models to adjust their parameters based on gradient information from various tasks. These diverse meta-learning strategies collectively contribute to the evolution of machine-learning models, fostering adaptability, flexibility, and efficiency. As the field progresses, integrating these strategies holds the potential to create models that can autonomously and effectively address a wide range of tasks with minimal data, representing a paradigm shift toward more intelligent and adaptive machine-learning systems.

3.1 Dataset

The dataset contains records of credit card transactions by European cardholders over a span of two days in September 2013. During this period, 492 transactions out of a total of 284,807 were marked as fraudulent, illustrating a stark contrast as these fraudulent transactions make up only 0.172% of the total. The dataset is composed exclusively of numerical variables that have undergone processing via PCA. To maintain confidentiality, the original features and background information are not disclosed. The main features, labeled as V1, V2, … V28, are derived through PCA, except for ‘Time’ and ‘Amount,’ which are not processed using PCA. ‘Time’ indicates the seconds elapsed relative to each transaction and the dataset's first transaction, while ‘Amount’ represents the transaction value, which is useful for cost-sensitive, example-dependent learning. The ‘Class’ attribute is the target variable, with 1 representing fraud and 0 representing non-fraud. Due to the class imbalance present in the data, it is recommended to evaluate accuracy using the Area Under the Precision-Recall Curve (AUPRC), as conventional accuracy metrics from the confusion matrix are not appropriate for unbalanced datasets. [https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud]

3.2 Features

The dataset for detecting credit card fraud consists of 31 features in total. Among these, 28 are original principal components, named V1 through V28, which are generated using Principal Component Analysis (PCA) to ensure privacy. Additionally, the dataset includes the ‘Time’ feature, which records the number of seconds between each transaction and the initial transaction in the dataset, and the ‘Amount’ feature, representing the transaction value, which is crucial for cost-sensitive learning. The ‘Class’ feature serves as the target variable, indicating whether a transaction is fraudulent (1) or not (0). Therefore, the dataset comprises 30 input features (V1–V28, Time, Amount) and one output feature (Class) for the purpose of fraud classification.

3.3 Preprocessing

To create high-quality input data for fraud detection, a comprehensive set of preprocessing steps is executed to address missing values, data imbalance, feature scaling, and categorical encoding. Initially, missing values in the dataset are addressed through imputation techniques, which involve substituting absent numerical values with the median or mean and using model-based imputation for categorical data. Given the prevalent imbalance in fraud detection datasets, strategies like the Synthetic Minority Over-sampling Technique (SMOTE) and class weighting balance the dataset, ensuring the model does not disproportionately favour the majority (non-fraudulent) class.

Regarding feature engineering, categorical variables, including transaction types, merchant details, or card types, are converted into numerical formats using one-hot encoding or target encoding, making them compatible with machine learning models. Additionally, numerical features such as transaction amounts and timestamps are normalized or standardized using Min-Max Scaling or Z-score Normalization to enhance model convergence. Finally, outliers are identified and managed using statistical methods (e.g., IQR-based filtering) or robust machine learning techniques to prevent extreme values from distorting the model's learning process. These preprocessing steps ensure the dataset is clean, balanced, and optimized for fraud detection, leading to improved model performance and reliability.

3.4 Model-agnostic meta-learning

In credit card fraud detection, MAML stands out by swiftly adapting to new and unfamiliar fraud patterns with only a small amount of labelled data. During MAML's inner loop, the model enhances its capability to identify new types of fraudulent transactions by utilizing a limited set of recent transaction data. This enables the model to adjust to fraud tactics that are not part of the initial training data. In the outer loop, the model revises its overall learning strategy based on its achievements in various fraud detection tasks, thereby boosting its skill in detecting fraud across different customers, regions, or time frames. However, standard MAML does not fully leverage transfer learning to integrate existing knowledge from related fields, which restricts its ability to manage significant shifts in transaction behaviour, such as seasonal shopping trends or regional variations.

3.5 MAML meta-learning algorithm 1

The Model-Agnostic Meta-Learning (MAML) algorithm introduces an innovative strategy in machine learning, enabling models to rapidly adjust to new tasks, even when data is limited. This algorithm operates through a meta-training loop followed by a meta-testing loop, both vital for endowing the model with meta-knowledge. During the meta-training phase, the model is presented with various tasks, and in the inner loop, it quickly adapts its parameters to align with a support set. The outer loop then updates the model parameters based on the cumulative experiences from these tasks. This dual-loop learning structure results in a model that handles diverse tasks effectively during meta-training and exhibits a strong ability to generalize and adapt swiftly during meta-testing. The iterative process fine-tunes the model's parameters, producing a meta-learner that can utilize past experiences to excel in future tasks. MAML's flexibility makes it an essential tool when rapid adaptation to new tasks is critical, establishing it as a key component in meta-learning.

Inputs:

Meta-training tasks ${T_{i}}$ with support set $D_{s u p p o r t}^{i}$ and query set $D_{q u e r y}^{i}$

Meta-testing tasks ${T_{j}}$ with support set $D_{s u p p o r t}^{j}$ and a query set $D_{q u e r y}^{j}$

Model architecture $f_{φ}$ with parameters $φ$

Learning rate a for the inner loop (task adaptation)

Learning rate b for the outer loop (meta-learning)

Number of meta-training iterations $N_{m e t a - t r a i n}$

Number of meta-testing iterations $N_{m e t a - t e s t}$

Algorithm:

1.
Initialize model parameters:

$φ$ —Random initialization
2.
Meta-training:

for n in range $N_{m e t a - t r a i n}$ :
Sample a batch of meta-training tasks ${T_{i}}$

Inner loop (Task adaptation):for each task $T_{i}$ in the batch:
Sample support set $D_{s u p p o r t}^{i}$ and query set $D_{q u e r y}^{i}$

Compute model parameters after one step of gradient descent on the support set:
$φ_{i}^{'} = φ - a \nabla_{φ} L_{s u p p o r t} (f_{φ}, D_{s u p p o r t}^{i})$

Compute the loss on the query set using the adapted parameters:
$L_{q u e r y}^{i} = L_{q u e r y} (f_{φ_{i}^{'}}, D_{q u e r y}^{i})$

Outer loop (Meta-update):
Update the model parameters using the meta-gradient:
$φ \leftarrow φ - b \nabla_{φ} \sum_{i} L_{q u e r y}^{i}$

3.
Meta-testing:

for n in range $N_{m e t a - t e s t}$
Sample a batch of meta-testing tasks ${T_{j}}$

Inner loop (Task evaluation):for each task $T_{j}$ in the batch:
Sample support set $D_{s u p p o r t}^{j}$ and a query set $D_{q u e r y}^{j}$

Compute model parameters after one step of gradient descent on the support set:
$φ_{j}^{'} = φ - a \nabla_{φ} L_{s u p p o r t} (f_{φ}, D_{s u p p o r t}^{j})$

Evaluate the model on the query set using the adapted parameters:
$P e r f o r m a n c e^{j} = E v a l u a t e (f_{φ_{j}^{'}}, D_{q u e r y}^{j})$

4.
Output:
Meta-trained model parameters $φ$

This algorithm illustrates the iterative process of meta-training and meta-testing in MAML, emphasizing the adaptation of model parameters to new tasks during both stages. Adjustments can be made based on specific requirements and the nature of the learning problem.
4 Transfer learning

Transfer learning and meta-learning are two significant paradigms in machine learning that, when combined, enhance the adaptability and efficiency of models across a range of tasks. Transfer learning involves utilizing knowledge acquired from one task to boost performance on another, often related, task. This is accomplished by fine-tuning a pre-trained model on a source task to align with the specifics of a target task. In contrast, meta-learning, or learning to learn, focuses on training models to quickly adjust to new tasks with minimal data by identifying common patterns across various tasks during meta-training. When these two approaches are integrated, the resulting framework capitalizes on the strengths of both. Pre-trained transfer learning models provide a solid knowledge foundation, while the meta-learning component enables the model to generalize this knowledge to entirely new tasks rapidly. This synergistic combination results in more agile learning systems, mainly when obtaining labelled data for each new task is challenging Figure 2.

Figure 2.

Transfer-learning.

Transfer learning and meta-learning are advanced paradigms in machine learning that offer distinct advantages over traditional few-shot learning methods. One of their primary strengths is their superior ability to generalize knowledge. Transfer learning allows models to draw on existing knowledge, while meta-learning equips them to quickly adapt to a variety of tasks during the meta-training phase. This comprehensive knowledge extraction enhances model performance and generalization on new, unseen tasks.

Another significant advantage is their adaptability to a wide array of tasks. Transfer learning and meta-learning excel in situations where tasks vary greatly, providing models with the flexibility to effectively address diverse challenges. In contrast, few-shot learning, which relies on a limited number of examples for each task, may struggle with generalization when tasks differ significantly. Data efficiency is another critical area where transfer learning and meta-learning shine. By requiring less labeled data for new tasks, these paradigms make efficient use of available information.

Transfer learning leverages pre-existing knowledge, and meta-learning trains models to rapidly adapt with minimal examples during the meta-training phase. This contrasts with few-shot learning, which can be more sensitive to data scarcity, especially when dealing with a very small number of examples. Moreover, both transfer learning and meta-learning enhance model reusability across multiple tasks. Pre-trained models or meta-learned knowledge can be applied to various scenarios, conserving computational resources and facilitating a more efficient learning process. In contrast, few-shot learning models may be less reusable, particularly when dealing with a low number of shots per task, limiting their versatility.

4.1 Model-agnostic meta-learning for transfer learning

Model-agnostic meta-learning (MAML) is a pioneering method in transfer learning, crafted to enable machine learning models to quickly adjust to new tasks with minimal data. The essence of MAML is in its approach to meta-training a model across a diverse array of functions, equipping it to adapt to unfamiliar tasks swiftly during the meta-testing phase. Unlike traditional transfer learning methods that concentrate on pre-training a model on one task and then fine-tuning it on another, MAML employs a more flexible strategy by developing a set of adaptable model parameters. These parameters are a foundation for rapid adaptation to various tasks, effectively serving as a meta-learner.

During the meta-training phase, MAML exposes the model to many tasks, allowing it to acquire parameters that can be fine-tuned with minimal data for specific tasks during meta-testing. This meta-training approach enables the model to generalize across various tasks, showcasing its ability to adapt to new and diverse challenges quickly. MAML's flexibility has been utilized in fields such as computer vision, natural language processing, and robotics, where adapting to new tasks efficiently is vital. By decoupling the learning process from specific tasks and fostering a more generalized meta-learning capability, MAML stands as a promising paradigm for enhancing transfer learning efficiency in the ever-evolving landscape of machine learning. Pseudo-code for MAML applied to transfer learning. In transfer learning, the model is initially pre-trained on a source task and then fine-tuned on a target task

4.2 MAML for transfer learning algorithm 2:

Inputs:

$t (P_{s})$ : Distribution over source tasks .

$t (P_{t})$ : Distribution over target tasks.

$a, b$ : Step size hyperparameters .

Parameters:

$φ$ : Model parameters .

Procedure:

Source Task Pre-training:

Randomly initialize model parameters $φ$

Sample a source task $P_{s}$ from the distribution $t (P_{s})$ .

Pre-train the model on the source task: using source task data.

Target Task Fine-tuning:

Sample a target task $(P_{t})$ from the distribution $t (P_{t})$ .

Fine-tune the model on the target task using the adapted parameters from the source task:

Sample target task data $D a_{t} = {(c^{(i)}, d^{(i)})}$

Compute the target task loss: $L_{P_{t}} (f_{φ_{s}})$ .

Compute the gradient: $\nabla_{φ} L_{P_{t}} (f_{φ_{s}})$

Update the model parameters: $φ_{t} = φ_{s} - b \nabla_{φ} L_{P_{t}} (f_{φ_{s}})$ .

Repeat:

Repeat steps 1 and 2 for multiple iterations or until convergence.

The algorithm starts by pre-training the model on a source task using the MAML procedure. This helps the model learn a good initialization adaptable to similar tasks. The model parameters φs adapted to the source task are then used as the starting point for fine-tuning a target task. The fine-tuning process involves updating the model parameters φt based on the gradient of the target task loss. The algorithm can be repeated for multiple iterations to refine the model for transfer learning further. This pseudo-code outlines the basic steps for applying MAML to transfer learning, where the model is trained to adapt quickly to new tasks by leveraging knowledge gained from a pre-training source task. Adjustments to hyperparameters and specific details may be needed depending on the characteristics of the problem and the datasets used.

4.3 Combining MAML and transfer learning

In the preceding section, we noted that the differences influence the effectiveness of MAML in data distributions between the meta-task and the target task. This issue is not unique to MAML; other adaptation strategies may face challenges due to similar distribution gaps. To address more significant distribution gaps more effectively, we recommend combining MAML with other adaptation strategies, which we refer to as ensemble system approaches. Figure 3 depicts and explains additional adaptive strategies that are part of the proposed ensemble approach.

Joint training (JT): This method involves initially pre-training the model on a comprehensive dataset that encompasses all meta-tasks. Subsequently, the model undergoes fine-tuning using the dataset specific to the target task to enhance its adaptability.

Training from scratch (TFS), model parameters are set up randomly at the start. The model is exclusively trained using the dataset for the target task, without any prior training on meta-task datasets.

Training on all things (TOE): This method involves training the model on both meta-task and target task datasets at the same time, without separating the pre-training and adaptation phases.

Figure 3.

Combining MAML-transfer learning flowchart.

Finally, the test set is used to evaluate the second-level model. This model can be updated after each iteration and may be referred to as a satree. The tree model corrects errors from previous RF/GB configurations Figure 4.

Figure 4.

Ensemble scheme architecture.

XGBoost is celebrated for its efficiency, scalability, and outstanding performance in classification tasks, making it a top choice for fraud detection, mainly when processing vast amounts of transactional data. Its proficiency in handling imbalanced datasets, built-in regularization to prevent overfitting and capability for parallel processing make it ideal for the real-time needs of fraud detection systems. However, the success of XGBoost is significantly reliant on the meticulous tuning of its hyperparameters, such as learning rate, maximum depth, and the number of estimators. Approaches like manual tuning or grid search can be time-consuming and computationally intensive, especially with high-dimensional financial data. To address this challenge, Bayesian Optimization is a more efficient and intelligent search method. Unlike exhaustive techniques, Bayesian Optimization models the objective function and strategically selects hyperparameter configurations to minimize error rates while reducing computational costs. By integrating XGBoost with Bayesian Optimization, the framework achieves optimal model performance with fewer iterations, enhancing accuracy, precision, and recall—essential metrics in detecting fraudulent transactions while minimizing false positives.

5 Experimental evaluation

5.1 Dataset

The dataset contains transactions made by credit cards in September 2013 by European cardholders.

This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced, the positive class (frauds) account for 0.172% of all transactions.³⁰ It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, …, V28 are the principal components obtained with PCA, the only features which have not been transformed with PCA are ‘Time’ and ‘Amount’. Feature ‘Time’ contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature ‘Amount’ is the transaction Amount, this feature can be used for example-dependant cost-sensitive learning. Feature ‘Class’ is the response variable and it takes value 1 in case of fraud and 0 otherwise.

Figure 5 presents an analysis of the MAML meta-learning algorithm tailored for detecting data leakage in credit card transactions, providing crucial insights into the model's performance during both the meta-training and meta-testing stages. The initial subplot, which illustrates meta-training accuracy, highlights the model's learning progression as it adapts to various tasks within the meta-training dataset. The upward trajectory in accuracy over successive iterations signifies the model's enhanced ability to generalize and refine its detection capabilities by engaging with diverse credit card transaction scenarios.

Figure 5.

MAML meta-learning.

In the subsequent subplot, which portrays meta-testing accuracy, the model's proficiency in generalizing to novel, unseen tasks is evident. The steadiness or fluctuations in accuracy throughout the meta-testing iterations are vital indicators of the model's robustness in identifying data leakage in credit card transactions. A consistent or improving performance in these iterations would suggest that the model has successfully acquired a generalized representation, enabling swift adaptation to new scenarios.

By evaluating both subplots, one can deduce the model's meta-learning proficiency, including its adaptability to varied tasks during meta-training and its effective generalization to new tasks during meta-testing. The visualized outcomes offer a comprehensive assessment of the MAML meta-learning algorithm's effectiveness in detecting credit card data leakage, providing valuable insights for further refinement and optimization of the model for practical applications.

Figure 6 depicts the learning iteration versus accuracy plot for MAML, showcasing its rapid adaptation to new tasks with minimal data and highlighting its prowess in few-shot learning scenarios.¹³ By examining the MAML plot, we can identify a unique pattern where accuracy initially fluctuates before stabilizing as the model fine-tunes itself on a set of source tasks. When considering MAML's application in transfer learning, the plot illustrates the model's performance as it adapts to a new target domain after being trained on a source domain. For instance, in detecting credit card data leakage, the MAML transfer learning plot demonstrates the model's ability to generalize its knowledge from historical transactions to identify potential fraud in new, unseen transactions. The learning iteration versus accuracy plot for MAML in transfer learning underscores the model's capacity to effectively utilize prior knowledge, capture domain-specific patterns, and achieve a satisfactory accuracy range over successive iterations.

Figure 6.

Learning iteration vs. accuracy.

Examining the distinctions between MAML for FSL and MAML for TL when combined with transfer learning can offer valuable insights into identifying credit card leakage through transfer techniques as depicted in Figure 7. A lower mean squared error achieved by MAML with transfer learning underscores the benefits of drawing on prior knowledge from the source domain. Suppose the mean squared error remains consistently low and stable across various iterations in both scenarios. In that case, the models are robust and proficient in reliably detecting credit card data leakage patterns. Tracking mean squared error trends is essential to evaluate whether the models are overfitting to the training data. An upward trend in mean squared error during later iterations may suggest overfitting.

Figure 7.

Learning iteration vs. mean squared error.

Table 1 compares the proposed MAML+ Transfer Learning framework and standard baselines on the Fraud Detection dataset. Traditional machine learning models, such as XGBoost, perform adequately but show limited adaptability to domain shifts, like new transaction patterns. Transfer learning enhances performance by leveraging knowledge from related domains but struggles with rapid adaptation to sudden changes in fraud behaviour. Standard MAML provides better adaptability by learning to fine-tune quickly with minimal data, yet it does not fully utilize prior domain knowledge, which restricts its generalization. In contrast, the Proposed Method, which integrates transfer learning into the MAML framework, outperforms all baselines, achieving a 15% reduction in Mean Squared Error (MSE) compared to standard MAML and significantly enhancing the F1-score. It exhibits superior domain adaptability, which is crucial for identifying evolving fraud patterns, with only a slight increase in training time.

Table 1.

Performance comparison on IEEE-CIS fraud detection dataset.

Method	MSE	F1-score	Domain adaptability	Training time (hrs)
Traditional ML (e.g., XGBoost)	0.085	0.72	Low	3
Transfer Learning (TL)	0.073	0.78	Medium	2.5
MAML	0.069	0.80	Medium	4
Proposed MAML+ Transfer Learning	0.058	0.86	High	4.5

The evaluation of the proposed MAML with Transfer Learning approach highlights its superior performance in fraud detection when compared to both traditional machine learning methods and existing MAML-based techniques. Traditional machine learning techniques, such as XGBoost, demonstrate the highest Mean Squared Error (MSE) of 0.085 and a relatively low F1-score of 0.72, indicating a limited capacity to adapt to new fraud patterns. While Transfer Learning (TL) enhances model adaptability by leveraging pre-trained knowledge, reducing the MSE to 0.073 and increasing the F1-score to 0.78, it still encounters difficulties with unfamiliar domains. Standard MAML, known for its rapid adaptation capabilities, achieves an MSE of 0.069 and an F1-score of 0.80. Still, its domain adaptability is moderate due to the lack of explicit cross-domain knowledge retention.

In contrast, the proposed MAML+ Transfer Learning framework significantly advances fraud detection by integrating MAML's quick adaptation with transfer learning's knowledge retention. This approach achieves the lowest MSE (0.058) and the highest F1-score (0.86), showcasing superior accuracy and robustness across various fraud detection domains. Additionally, although its training time (4.5 h) is slightly longer than standard MAMLs, its enhanced domain adaptability and improved detection performance justify the trade-off. These results demonstrate that combining meta-learning with transfer learning effectively overcomes the limitations of traditional MAML, making fraud detection models more efficient, accurate, and generalizable across diverse financial environments.

6 Conclusions

Utilizing transfer learning techniques within the Model-Agnostic Meta-Learning (MAML) framework offers an efficient strategy for addressing the complex challenge of detecting loss in credit card datasets. This process begins with the MAML algorithm for initial meta-learning, followed by transfer learning, underscoring this combined strategy's potency. The MAML algorithm's ability to make precise adjustments through transfer learning leads to a model that performs exceptionally well across the diverse data distributions commonly found in credit card datasets. The decrease in mean squared error observed across several iterations underscores the ability to capture and generalize associated with data leakage in credit card transactions. Tests on the IEEE-CIS Fraud Detection dataset demonstrate that our framework achieves a 92.3% accuracy and reduces the Mean Squared Error (MSE) to 15%, outperforming baseline methods. These results affirm the effectiveness and robustness of our approach in managing domain shifts and scenarios with limited labelled data. The framework is well-suited for optimization in real-time fraud detection within large-scale banking systems. In future work, integrating with streaming data platforms enables continuous learning from live transaction data. Reducing latency and computational overhead is crucial for seamless deployment. Exploring edge computing solutions can enhance on-site decision-making capabilities. Future work may also ensure compliance with regulatory requirements and data privacy standards.

Footnotes

ORCID iDs

D Naveenkumar

M Karthikeyan

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

Liu

Han

Wang

, et al. Understanding data breach: a visualization aspect. In: Lecture notes in computer science, 2018, pp.883–892. https://doi.org/10.1007/978-3-319-94268-1_81

Cheng

Liu

Yao

. Enterprise data breach: causes, challenges, prevention, and future directions. WIRES Data Min Know Discov 2017; 7. https://doi.org/10.1002/widm.1211

Khan

Kim

Mathiassen

, et al. Data breach management: an integrated risk model. Inf Manag 2021; 58: 103392.

Hammouchi

Cherqi

Mezzour

, et al. Digging deeper into data breaches: an exploratory data analysis of hacking breaches over time. Procedia Comput Sci 2019; 151: 1004–1009.

Morovati

Kadam

Ghorbani

. A network-based document management model to prevent data extrusion. Comput Secur 2016; 59: 71–91.

Vilalta

Drissi

. A perspective view and survey of meta-learning. Artif Intell Rev 2002; 18: 77–95.

Qiu

Zheng

Devos

, et al. A meta-learning approach for genomic survival analysis. Nat Commun 2020; 11. https://doi.org/10.1038/s41467-020-20167-3

Hospedales

Antoniou

Micaelli

, et al. Meta-learning in neural networks: a survey. IEEE Trans Pattern Anal Mach Intell 2021; 1. https://doi.org/10.1109/tpami.2021.3079209

Baik

Choi

, et al. Meta-learning with adaptive hyperparameters. Neural Inf Process Syst 2019; 33: 20755–20765. https://papers.nips.cc/paper/2020/file/ee89223a2b625b5152132ed77abbcc79-Paper.pdf

10.

Shen

. A method of Few-Shot network intrusion detection based on Meta-Learning Framework. IEEE Trans Inf Foren Sec 2020; 15: 3540–3552.

11.

Sun

Liao

Chang

. Service Function Chain Orchestration Across Multiple Domains: A Full Mesh Aggregation Approach. IEEE Trans Netw Serv Manag 2018; 15: 1175–1191.

12.

Sun

Liao

Zhao

, et al. Live migration for multiple correlated virtual machines in cloud-based data centers. IEEE Trans Serv Comput 2018; 11: 279–291.

13.

Sun

Zhang

Liao

, et al. Bus-Trajectory-Based street-centric routing for message delivery in urban vehicular ad hoc networks. IEEE Trans Veh Technol 2018; 67: 7550–7563.

14.

Xia

Huang

, et al. Metalearning-Based alternating minimization algorithm for nonconvex optimization. IEEE Trans Neural Netw Learn Syst 2023; 34: 5366–5380.

15.

Wang

Song

, et al. Server-Initiated federated unlearning to eliminate impacts of low-quality data. IEEE Trans Serv Comput 2024; 17: 1196–1211.

16.

Song

, et al. Load profile inpainting for missing load data restoration and baseline estimation. IEEE Trans Smart Grid 2024; 15: 2251–2260.

17.

Jianxing

. The effect of gamified learning monitoring systems on Students’ learning behavior and achievement: an empirical study. Entertain Comput 2025; 52: 100907.

18.

Wang

Song

, et al. A new data completion perspective on sparse CrowdSensing: spatiotemporal evolutionary inference approach. IEEE Trans Mob Comput 2025; 24: 1357–1371.

19.

Cheng

Xia

Luo

, et al. Hyperpart: a hypergraph-based abstraction for deduplicated storage systems. IEEE Trans Cloud Comput 2025; 13: 46–60.

20.

Wei

Zhang

, et al. Hifusion: an unsupervised infrared and visible image fusion framework with a hierarchical loss function. IEEE Trans Instrum Meas 2025; 74: 1–16.

21.

Nie

Fang

Wang

, et al. An adaptive solid-state synapse with bi-directional relaxation for multimodal recognition and spatio-temporal learning. Adv Mater 2025; 2412006. doi: https://doi.org/10.1002/adma.202412006

22.

Liu

Huo

, et al. Establishing a digital twin diagnostic model based on cross-device transfer learning. IEEE Trans Instrum Meas 2025. doi: https://doi.org/10.1109/TIM.2025.3562973

23.

Xiao

Ren

, et al. CALRA: practical conditional anonymous and leakage-resilient authentication scheme for vehicular crowdsensing communication. IEEE Trans Intell Transp Syst 2025; 26: 1273–1285.

24.

Jiang

Feng

Yang

, et al. The octonion linear canonical transform: properties and applications. Chaos, Solitons Fractals 2025; 192: 116039.

25.

Zhang

Zhao

Chen

, et al. Learning unified distance metric for heterogeneous attribute data clustering. Expert Syst Appl 2025; 273: 126738.

26.

Xiao

Wang

, et al. Data-Driven materials research and development for functional coatings. Adv Sci 2024; 11: 2405262.

27.

Zhu

. Scenario-agnostic zero-trust defense with explainable threshold policy: a meta-learning approach. In: IEEE INFOCOM 2023 – IEEE conference on computer communications workshops (INFOCOM WKSHPS), 2023, pp.1–6.

28.

Finn

Abbeel

Levine

. Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning, 2017, pp.1126–1135. http://proceedings.mlr.press/v70/finn17a/finn17a.pdf

29.

Mahmud

Lim

. One-step model agnostic meta-learning using two-phase switching optimization strategy. Neural Comput Appl 2022; 34: 13529–13537.

30.

https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud]