Handwritten digit and Roman string recognition using gated CNN with federated learning

Abstract

Handwritten digit recognition has been extensively studied in machine learning and computer vision, with datasets like MNIST serving as benchmarks. Handwritten Roman numeral recognition, however, presents unique challenges due to the diverse forms and structures of Roman numerals. In this paper, we propose an approach that combines Federated Learning with advanced neural network architectures to address this challenge. Our methodology involves data acquisition and preprocessing, including the normalization of handwritten digit and Roman numeral images. We design a hybrid neural network architecture that integrates Gated Convolutional Neural Networks (CNNs) for pixel-level feature extraction and Bidirectional Gated Recurrent Units (BGRUs) for sequence modeling. This architecture is essential for handling both image and sequence data. Federated Learning is used to train the model across multiple decentralized devices or servers while preserving data privacy, so that sensitive handwritten data never leaves the device that holds it. During training, each device computes gradients on its local data and shares only the resulting model updates with the central server. The central server aggregates these updates into the global model, which is then sent back to the participating devices for further refinement. This iterative process continues until the model converges, and accuracy, precision, recall, and F1-score are used to evaluate the model’s performance on a separate test dataset. Our approach demonstrates promising results in recognizing both handwritten digits and Roman numerals, even in the presence of noise and variability in writing styles. By combining Federated Learning with advanced neural network architectures, our approach not only achieves state-of-the-art performance but also ensures data privacy and security in distributed learning environments.

Keywords

Gated convolutional neural network, bidirectional gated recurrent unit, feature extraction, federated learning

1. Introduction

Deep learning aims to build artificial neural networks that can learn to interpret and analyze data. Deep learning uses neural networks with multiple layers of nodes between the input and output layers, so the term “deep” refers to the number of layers in the network.

Adding more layers yields a deeper network, which is needed to support more complex algorithms, make use of huge amounts of data, achieve high performance, and extract features efficiently.

The demand for deep learning has increased due to the need to process huge amounts of data, especially images of handwritten digits and Roman numerals. Number recognition plays a key role in sectors such as banking and financial institutions, where it is used for processing checks, reconciling accounts, and extracting numerical data from documents.

Similarly, industries such as retail and healthcare use number recognition to automate data entry by extracting numeric data from forms, checks, and invoices. In addition, automated risk assessment, parking monitoring, and control systems rely on number recognition for tasks such as license plate reading, which improves efficiency and accuracy.

Recognition of Roman numerals, on the other hand, is important for written documents, especially ancient texts, where understanding dates, numerical references, and sequence numbers is crucial. Integrating Roman numeral recognition into software can also help researchers and learners practice conversions and calculations with Roman numerals, supporting educational efforts. Its applications span many fields, underscoring its versatility.

Incorporating federated learning into deep learning systems improves privacy and security when exploiting distributed datasets. By enabling model training on distributed devices or servers without sharing raw data, Federated Learning preserves data confidentiality, which is critical when handling sensitive handwritten data such as digit images and Roman numerals.

This approach not only improves the performance and scalability of deep learning models but also addresses privacy concerns, making it suitable for real-world scenarios in various fields.

2. Related works

Automatic handwriting recognition systems have been a focal point of interest in both academic research and commercial applications. Recent advances in deep learning techniques have led to significant progress on classical computer vision challenges, with a notable impact on Handwritten Text Recognition (HTR) [1]. Still, a compelling challenge persists in Handwritten Digit String Recognition (HDSR), where systems must cope with limited training data and the intricate task of separating digits from the surrounding noise [5].

A new optical model architecture known as Gated CNN-BGRU, inspired by the HTR workflow, has been introduced and applied to HDSR. This approach has attracted attention for its potential to tackle the unique challenges posed by HDSR [3]. Competition benchmarks provide a standardized framework, including metrics, datasets, and recognition methods, against which different models can be rigorously evaluated [10]. This ensures a fair and transparent comparison of the proposed Gated CNN-BGRU model with other existing methods. Evaluation on additional data extends the scope beyond the competition benchmarks, demonstrating the model’s adaptability to different data sources [3]. Additionally, the evaluation includes a comparison with 11 distinct state-of-the-art HDSR techniques as well as insights from two optical models that have excelled in HTR. The Gated CNN-BGRU model remains robust even when provided with a limited volume of training data, sometimes as few as 126 samples. Notably, it performs better than existing methods, attaining an average precision of 96.50% [12]. This is an average improvement of 3.74 percentage points over the HDSR state of the art [13].

One particularly noteworthy achievement is the model’s performance on the CVL HDS set, where it achieved a precision rate of 93.54% [15].

Despite substantial research on the automatic recognition of handwritten characters in major languages such as English, a sizable research gap remains for handwritten Urdu characters [2]. The intricacies of Urdu script, including variations in writing style, character shapes, and sizes, make this task difficult [7].

Deep neural networks are an essential technology for automating the classification of character patterns and object images [2]. Deep networks are known to produce excellent results when trained on massive datasets with millions of images. Applying them to small image datasets, however, has proven challenging. In response, researchers have set out to create a classification framework for the automatic recognition of handwritten Urdu letters and numerals.

Their aim is to achieve high recognition accuracy by leveraging transfer learning with pre-trained Convolutional Neural Networks (CNNs) [2]. The effectiveness of transfer learning is evaluated in several ways using a pre-trained AlexNet CNN model and Support Vector Machine (SVM) approaches [10]. The researchers also improved the AlexNet model for feature extraction and classification. Their tests and quantitative comparisons show how well this approach, which makes use of an improved AlexNet [5], performs when reading handwritten characters and numerals. Notably, the suggested strategy performs better than current state-of-the-art approaches, reporting a classification accuracy of 94.92% on Urdu character, numeral, and hybrid datasets [6]. These results show promise for numerous applications, including the recognition of Urdu characters as well as handwritten text image restoration, automated postal address reading, bank check processing, and the preservation and digitization of historical calligraphic works.

Across various languages, Handwritten Digit Recognition (HDR) presents intrinsic challenges rooted in the diversity of writing styles among individuals, differences in writing media, variations in environment, and the inherent difficulty of consistently reproducing the same strokes when writing numbers [3]. Additionally, the structural intricacy of a language’s number characters can produce elaborate handwriting that makes HDR tasks more difficult [11]. Over the years, researchers have developed a variety of offline and online HDR strategies, combining image processing methods with both conventional machine learning (ML) and deep learning (DL) architectures [3]. For Bengali Handwritten Digit Recognition (BHDR), such investigations should encompass an in-depth analysis of the unique challenges posed by the task, the fundamental mechanisms of identification, and prospective future study areas [15]. The authors of that work examine the peculiar qualities and underlying complexity of Bengali handwritten numerals [4]. This comprehensive paper serves as both a reference guide and a compendium of knowledge for researchers interested in gaining a deeper understanding of the intricacies of BHDR [8]. It aims to inspire further exploration of novel avenues in applied research, ultimately contributing to advances in the recognition of Bengali handwritten numerals across various application domains.

Agriculture has traditionally relied on conventional technology to detect diseases and pests, often using cloud-based deep learning solutions. However, these methods face several obstacles, including expensive data storage and transmission, inconsistent and sparse fruit data, a wide variety of pests and diseases, and difficult detection conditions. One study proposes a multi-pest detection method that addresses these problems by combining an improved Faster Region-based Convolutional Neural Network (Faster R-CNN) with federated learning (FL). FL is a promising distributed computing paradigm that enables the training of a shared model integrating the data benefits of all participating parties without the need to upload local data, which addresses the above-mentioned challenges and reduces communication costs.

To ensure convergence of the model and improve training speed, a constraint term M is added to the FL algorithm. In addition, ResNet-101 replaces VGG-16, the usual backbone of the original Faster R-CNN network, to preserve the original structure of small objects and increase detection speed. To further increase the accuracy of detecting multiple diseases and pests of different sizes, feature maps from multiple convolutional layers are fused. A non-maximum suppression (NMS) method based on the region proposal network (RPN) is also proposed to handle overlapping apples during detection. Experimental results show that the improved Faster R-CNN achieves a mean average precision (mAP) of 90.27% in detecting multiple pests, with a detection time of only 0.05 seconds per image. After integrating FL, the model reached an mAP of 89.34%, and its training speed improved by 59%.

By addressing the limitations of traditional disease and pest detection techniques, the proposed approach represents a significant advance in agricultural pest control and lays the foundation for more efficient and accurate detection systems in the field.

Artificial intelligence (AI) technology has advanced rapidly and opened the door to many applications, including breast cancer diagnosis. However, most current research in this area uses centralized learning (CL) systems, which raises serious privacy concerns. Furthermore, precise lesion localization and identification, together with AI-powered tumour prediction, have the potential to significantly increase patients’ odds of survival. To mitigate the privacy problems associated with centralized data storage, one study uses Federated Learning (FL) in place of CL.

The study’s main contributions are as follows. Transfer learning is employed to extract data features precisely from the region of interest (ROI) within mammography images. This methodology facilitates rigorous pre-processing and data augmentation, improving the quality of the training data [15]. The Synthetic Minority Oversampling Technique (SMOTE) is used to achieve a more uniform and balanced class distribution, which enhances the performance of disease diagnostic prediction.

The FedAvg-CNN + MobileNet architecture is used within an FL framework to protect the privacy and security of participants. This approach preserves data privacy by aggregating participants’ parameter updates without exposing the raw data [16]. Experiments compare several deep learning, transfer learning, and FL models on balanced and unbalanced mammography data. The results show that the proposed approach outperforms existing methods, providing significantly better classification and demonstrating its viability for integrating AI into healthcare applications. This study advances AI-based breast cancer diagnosis by emphasizing privacy-preserving techniques and using FL for collaborative learning.

Privacy and security issues also arise in diagnosis and health care. The increased mortality of women from breast cancer in recent decades underscores the urgent need for effective early detection mechanisms. Current research has attempted to address this challenge through the development of detection systems. However, these systems often fall short, especially in the secure sharing of sensitive medical images, which remains a major challenge in the medical field. In response, one paper proposes an automated disease diagnosis system that leverages Federated Learning (FL) and deep learning to enhance efficiency and accuracy in breast cancer detection. The proposed study comprises five key steps:

Image Acquisition: Medical images are gathered as input samples for the diagnosis system.

Encryption: To ensure confidentiality, the gathered medical samples are encrypted using an Extended ElGamal Image Encryption (E-EIE) method.

Optimal Key Generation: The Improved Sand Cat Swarm Optimization (I-SCSO) technique generates appropriate keys efficiently, increasing the efficiency of the encryption process [17].

Secured Data Storage: The Federated Learning Flower (FLF) framework is used for storage, which significantly improves the security of the encrypted images during transmission.

Disease Classification: The Convolutional Capsule Twin Attention Tuna Optimal Network (C2T2Net) model processes the decrypted images to classify diseases.

By optimizing parameters with the Chaotic Tuna Swarm Optimization (CTSO) algorithm, the suggested classifier reduces the loss. The study uses Python for simulation analysis and assesses effectiveness on the BreakHis database. According to the experimental data, the method achieves an accuracy of 95.68%, recall of 95.6%, precision of 95.66%, F-measure of 95.63%, specificity of 95.6%, and kappa coefficient of 95.26%.

This work advances automated disease diagnosis systems for breast cancer detection by addressing the challenges of secure information sharing and increasing the efficiency of detection, with the aim of better patient outcomes and lower mortality.

Deep learning-based computer-aided diagnostic systems have become invaluable tools in medical imaging. However, when diseases are diagnosed across multiple medical institutions, the burden of producing exhaustive annotations falls on physicians.

In addition, centralized learning systems face many problems in model generalization and privacy protection. To address these challenges, one study proposes two federated active learning techniques for multicentre disease diagnosis: Labeling Efficient Federated Active Learning (LEFAL) and Training Efficient Federated Active Learning (TEFAL). The proposed LEFAL technique uses a task-agnostic hybrid sampling strategy that simultaneously considers data uncertainty and diversity to increase data efficiency [18]. This approach aims to reduce the effort associated with extensive annotation and improve the accuracy of disease diagnosis.

The proposed TEFAL approach, on the other hand, increases client efficiency in a federated learning setting by using a discriminator to assess the informativeness of each client’s data. To confirm the effectiveness of these methods, experiments are performed on two datasets: the Hyper-Kvasir dataset for the diagnosis of gastrointestinal diseases and the CC-CCII dataset for the diagnosis of COVID-19. The results show that with only 65% labelled data, the LEFAL approach achieves 95% performance on the gastrointestinal disease segmentation task. In addition, TEFAL achieves an accuracy of 0.90 and an F1 score of 0.95 with only 50 iterations for the classification task on the CC-CCII dataset. Extensive experimental evaluations show that the proposed federated active learning methods outperform state-of-the-art approaches in segmentation and classification tasks in multicentre disease diagnosis [19]. These methods not only suggest future directions for improving the accuracy of disease diagnosis but also address the problems of exhaustive annotation and privacy protection in medical imaging.

To improve prediction, multi-site learning, which uses sample data from several medical institutions, has gained popularity in the diagnosis of brain diseases. However, there is a risk of privacy violations because training multi-site models usually involves transferring original images or subject features between sites. One study addresses these issues by introducing a framework for robust multi-site prediction, called Self-Supervised Federated Adaptation (S2FA), which aims to reduce the risk of privacy leakage. According to its authors, it is the first study to address the diagnosis of brain disease across sites. This scenario is often seen in clinical practice, when a model is trained at source sites and tested at target sites.

The proposed S2FA framework consists of several important parts:

Decentralized federated optimization: Each site uses a decentralized distributed optimization technique to regularly update model parameters while protecting data privacy.

Auxiliary self-supervised model: Information from the source sites is transferred to the target site using self-supervised learning methods to build an auxiliary model.

Hash mapping feature encoding: A hash mapping technique encodes target features to minimize the risk of private data leakage and to address the heterogeneity of information between sites.

Cross-site prediction: Cross-site prediction is realized by combining the auxiliary target model with a weighted combination of the source models to achieve reliable prediction performance at multiple sites.

By combining these elements, the S2FA framework provides a comprehensive solution for multi-site brain disorder diagnosis that addresses privacy concerns and improves predictive accuracy. This method offers a viable, privacy-preserving alternative for inter-institutional collaborative diagnosis [20].

3. Problem statement

The recognition of handwritten digits and Roman numerals is a challenging problem with numerous practical applications, such as digitized document analysis, educational tools for numeric literacy, and optical character recognition (OCR) systems.

Handwritten characters exhibit a wide range of styles, making it difficult to develop a robust recognition system that handles this variability. Moreover, acquiring a diverse and large dataset of handwritten digits and Roman numerals can be challenging, especially since real-world characters often contain noise, deformations, and variations.

Designing a neural network architecture that recognizes both handwritten digits (e.g., 0–9) and Roman numerals (e.g., I, V, X) in a single model is a complex task. The model must handle sequential data (for Roman numerals) as well as image data (for digits). To make sure the model is useful in practice, it is also essential to evaluate its accuracy, precision, recall, and F1-score for both digits and Roman numerals.

We suggest using Federated Learning (FL) as a solution to deal with these issues. FL protects data privacy by enabling the cooperative training of models across numerous decentralized devices or servers without requiring the release of raw data.

Through the application of FL, we can combine knowledge from a variety of datasets provided by different sources, which supports the robustness and generalizability of the recognition system. Within this FL setting, we integrate Bidirectional Gated Recurrent Units (BGRUs) for sequence modelling and Convolutional Neural Networks (CNNs) for visual feature extraction into a single neural network architecture.

This hybrid approach enables the model to handle both image and sequential data, thus improving recognition accuracy and performance.

By adopting FL and integrating advanced neural network architectures, we aim to develop a federated learning-based solution that addresses the challenges of recognizing handwritten digits and Roman numerals while ensuring practical usability and maintaining data privacy.

4. Proposed system

4.1 Data preparation and preprocessing

Acquire a comprehensive dataset containing handwritten digits (e.g., 0–9) and Roman numerals (e.g., I, V, X). Standardize the images by resizing them to a consistent size, such as 28 $\times$ 28 pixels. Furthermore, preprocess the Roman numeral images to segment individual characters if they are written together, ensuring accurate representation for training.
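The resizing and normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes grayscale images stored as nested lists with intensities in 0 to 255, and it uses nearest-neighbour sampling, which the text does not specify. The function names `resize_nearest`, `normalize`, and `preprocess` are hypothetical.

```python
def resize_nearest(image, out_h=28, out_w=28):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, max_value=255.0):
    """Scale pixel intensities from [0, max_value] to [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


def preprocess(image, size=28):
    """Resize to size x size, then normalize (assumed order of operations)."""
    return normalize(resize_nearest(image, size, size))


# Hypothetical 56 x 56 input: white (255) left half, black (0) right half.
sample = [[255] * 28 + [0] * 28 for _ in range(56)]
processed = preprocess(sample)
```

In practice an image library with interpolation would normally replace `resize_nearest`; the plain-Python version is used here only to keep the sketch self-contained.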

Model Architecture: The model architecture is designed to accommodate both image data (for digits) and sequential data (for Roman numerals) within the framework of Federated Learning (FL). It comprises:

Convolutional Neural Network (CNN): One or more convolutional layers with activation functions (e.g., ReLU) and pooling layers (e.g., max pooling) extract salient features from input images.

Bidirectional Gated Recurrent Units (BGRU): One or more BGRU layers capture the sequential patterns and dependencies inherent in Roman numerals.

Model Training: Use Federated Learning (FL) with an appropriate optimization algorithm (such as Adam) and loss function (such as categorical cross-entropy) to train the hybrid model. During training, the model learns the distinguishing traits of each category from both digit and Roman numeral data.

Evaluation: Using a separate test dataset, assess the model’s performance by calculating common metrics such as precision, recall, accuracy, and F1-score for both digits and Roman numerals. This evaluation helps establish robustness and reliability in practical applications.

Post-processing: Implement post-processing techniques as needed, such as handling multiple Roman numerals written together or minimizing false negatives, to refine the model’s predictions and enhance overall performance.

Hyperparameter tuning: To maximize the model’s effectiveness on handwritten digit and Roman numeral recognition, experiment with a range of hyperparameters, such as the number of convolutional layers, BGRU units, learning rates, and batch sizes.

Through the integration of CNNs and BGRUs within the Federated Learning framework, the proposed system addresses the challenges of recognizing handwritten digits and Roman numerals in real-world scenarios.
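One possible form of the post-processing step mentioned above is to check whether a recognized character sequence is a well-formed Roman numeral before accepting it. The sketch below is an assumption about how such a check could look; the paper does not describe its post-processing in detail. The names `roman_to_int`, `int_to_roman`, and `is_valid_roman` are hypothetical, and "valid" here means "written in canonical subtractive form".

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
    (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def roman_to_int(numeral):
    """Sum symbol values, subtracting a symbol that precedes a larger one."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def int_to_roman(number):
    """Build the canonical numeral greedily from the largest symbols down."""
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def is_valid_roman(numeral):
    """Accept a predicted string only if it round-trips to itself."""
    if not numeral or any(ch not in ROMAN_VALUES for ch in numeral):
        return False
    return int_to_roman(roman_to_int(numeral)) == numeral
```

A recognizer could use `is_valid_roman` to flag predictions such as "IIII" or "VX" for review, since neither is a canonical numeral, while "XIV" passes and converts to 14.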

Gated CNN with BGRU:

Convolution block:

Federated learning approach:

4.1.1 Dataset details

Figure 1.

Dataset for roman values.

Figure 2.

Dataset for numerical values.

Figures 1 and 2 show samples from the Roman numeral and digit datasets. Each dataset is a structured collection of data points arranged into rows and columns, with each row denoting a single observation or sample and each column denoting a particular numerical attribute or variable.

4.2 Classification using federated learning and gated-CNN with BGRU

A Gated CNN with BGRU is a neural network design that combines gated convolutional neural networks (CNNs) with bidirectional gated recurrent units (BGRUs). By pairing CNNs with gating mechanisms, often implemented as Gated Recurrent Units (GRUs) or similar structures, this design yields a deep learning model suited to both image and sequence data.

Here’s how the Gated CNN with BGRU operates, incorporating the principles of Federated Learning (FL):

Spatial Feature Extraction: The input data, typically an image, is first processed by convolutional and pooling layers. Convolutional layers, the core of the CNN, analyze spatial properties by applying learnable filters (kernels) to detect patterns and features within the input data. Pooling layers then reduce the spatial dimensions while retaining essential information.

Integration of Gated Recurrent Units (GRUs): A Gated CNN with BGRU uses BGRU units in place of conventional unidirectional RNNs or GRUs. A BGRU consists of two independent GRU layers that process the input sequence in opposite directions: forward (from the past to the future) and backward (from the future to the past), thereby capturing bidirectional temporal dependencies.

Gated Mechanisms for Contextual Information: The extracted spatial features are then fed into the gating mechanisms within the BGRU units. These mechanisms, controlled by reset and update gates, determine how much of the previous state to keep and how much new input to incorporate, capturing dependencies and contextual information.

Final Prediction Layers: After the gating mechanisms and skip connections, the network includes fully connected layers and an output layer tailored to the specific task. These layers make predictions based on the learned features. A softmax activation is typically employed for classification.
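The spatial feature extraction stage described above (convolution, ReLU, max pooling) can be illustrated on a tiny example. This is a sketch under stated assumptions rather than the authors' implementation: the 5 x 5 image, the hand-picked 2 x 2 kernel, and the function names are all hypothetical, and a trained network would learn its kernels instead of fixing them.

```python
def conv2d_valid(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5 x 5 image with a vertical stroke in column 2, like the numeral "I".
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright transition from left to right.
kernel = [[-1, 1], [-1, 1]]
features = max_pool(relu(conv2d_valid(image, kernel)))
```

Here the kernel fires on the left edge of the stroke, ReLU discards the negative response on its right edge, and pooling halves the resolution while keeping the strongest response.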

4.2.1 Federated learning integration

Incorporating Federated Learning into the Gated CNN with BGRU framework enables collaborative training across distributed devices or servers while preserving data privacy. FL facilitates model training using data from multiple decentralized sources without sharing raw data, ensuring enhanced privacy protection. By leveraging FL, the Gated CNN with BGRU can aggregate insights from diverse datasets contributed by various sources, thereby improving robustness and generalization of the recognition system while maintaining data privacy.
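The aggregation step can be sketched as a weighted average of client parameters. The text above does not name a specific aggregation rule, so this example assumes FedAvg-style averaging weighted by each client's number of local samples; the function name `federated_average`, the flattened parameter vectors, and the sample counts are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]


# Hypothetical flattened parameters from three devices and their local dataset sizes.
client_weights = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]
client_sizes = [100, 300, 600]
global_weights = federated_average(client_weights, client_sizes)
```

Only the parameter vectors and sample counts reach the server in this scheme; the handwritten images themselves stay on each device.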

5. Mathematical equation

CNN Layers:

Input Image: $X$

Convolution Operation: $H_c = \mathrm{ReLU}\left(\mathrm{Conv}(X, W_c)\right)$, where $W_c$ represents the convolutional filter weights.

Pooling Operation: $H_c^{\mathrm{pool}} = \mathrm{MaxPooling}(H_c)$

BGRU Layers:

Forward Pass:

Update Gate: $z_t^f = \mathrm{sigmoid}\left(\mathrm{FL}\left(W_z^f \cdot [H_{t-1}^f, X_t]\right)\right)$

Reset Gate: $r_t^f = \mathrm{sigmoid}\left(\mathrm{FL}\left(W_r^f \cdot [H_{t-1}^f, X_t]\right)\right)$

Candidate Activation: $\tilde{h}_t^f = \tanh\left(\mathrm{FL}\left(W_h^f \cdot [r_t^f \odot H_{t-1}^f, X_t]\right)\right)$, where $\odot$ denotes element-wise multiplication.

Hidden State: $H_t^f = (1 - z_t^f) \odot H_{t-1}^f + z_t^f \odot \tilde{h}_t^f$

Backward Pass:

Update Gate: $z_t^b = \mathrm{sigmoid}\left(\mathrm{FL}\left(W_z^b \cdot [H_{t+1}^b, X_t]\right)\right)$

Reset Gate: $r_t^b = \mathrm{sigmoid}\left(\mathrm{FL}\left(W_r^b \cdot [H_{t+1}^b, X_t]\right)\right)$

Candidate Activation: $\tilde{h}_t^b = \tanh\left(\mathrm{FL}\left(W_h^b \cdot [r_t^b \odot H_{t+1}^b, X_t]\right)\right)$

Hidden State: $H_t^b = (1 - z_t^b) \odot H_{t+1}^b + z_t^b \odot \tilde{h}_t^b$

Output Layer:

Output: $\mathrm{Output} = \mathrm{softmax}\left(W_o \cdot [\mathrm{FL}(H_T^f), \mathrm{FL}(H_1^b)]\right)$

Here, $\mathrm{FL}(\cdot)$ denotes the application of Federated Learning to the corresponding operation, ensuring collaborative training while preserving data privacy. $[H_{t-1}^f, X_t]$ denotes the concatenation of the previous forward hidden state and the current input; the same bracket notation is used in the other equations.

The final output layer combines information from both directions by concatenating the final forward hidden state $H_T^f$ and the first backward hidden state $H_1^b$, thereby leveraging bidirectional information for classification.
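The forward and backward recurrences above can be traced in a few lines of Python. This is a minimal sketch under simplifying assumptions: the $\mathrm{FL}(\cdot)$ wrapper is omitted (the computation is shown for a single device), biases are left out as in the equations, and the hidden state is one-dimensional for readability. The weight values and the names `gru_step` and `bgru_final_states` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(h_prev, x_t, W_z, W_r, W_h):
    """One GRU update following the update/reset/candidate/hidden equations."""
    concat = h_prev + x_t                                  # [H_prev, X_t]
    z = [sigmoid(a) for a in matvec(W_z, concat)]          # update gate
    r = [sigmoid(a) for a in matvec(W_r, concat)]          # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t   # [r * H_prev, X_t]
    h_cand = [math.tanh(a) for a in matvec(W_h, gated)]    # candidate activation
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


def bgru_final_states(sequence, fwd_weights, bwd_weights, hidden_size):
    """Return [H_T forward, H_1 backward], the vector fed to the output layer."""
    h_f = [0.0] * hidden_size
    for x_t in sequence:                 # t = 1 .. T
        h_f = gru_step(h_f, x_t, *fwd_weights)
    h_b = [0.0] * hidden_size
    for x_t in reversed(sequence):       # t = T .. 1
        h_b = gru_step(h_b, x_t, *bwd_weights)
    return h_f + h_b


# Hypothetical weights: each matrix maps [hidden (1), input (1)] to hidden (1).
weights = ([[0.5, 1.0]], [[0.5, 1.0]], [[1.0, 1.0]])
sequence = [[1.0], [0.0], [-1.0]]
states = bgru_final_states(sequence, weights, weights, hidden_size=1)
```

Even with identical weights in both directions, the two final states differ for this sequence because each direction sees the inputs in a different order, which is the extra context the bidirectional design provides.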

6. Pseudocode

Load and Preprocess Dataset:

Load the dataset containing handwritten digits and Roman numerals.

Normalize and resize images to a consistent size (e.g., 28 $\times$ 28 pixels).

Segment Roman numeral images if needed.

Initialize Model with CNN Layers:

Initialize the model with Convolutional Neural Network (CNN) layers for feature extraction.

Add Bidirectional Gated Recurrent Unit (BGRU) Layers:

Incorporate Bidirectional Gated Recurrent Unit (BGRU) layers for sequence modeling, capturing temporal dependencies in Roman numerals.

Add Flatten and Dense Layers for Classification:

Add Flatten layers to convert the output of the CNN layers into a one-dimensional array.

Include Dense layers for classification, providing the model with the capability to make predictions.

Compile the Model:

Compile the model with an optimizer and loss function suited to digit and Roman numeral recognition.

Split Dataset and Encode Labels:

Split the dataset into training and testing sets to evaluate model performance.

Encode labels using techniques such as one-hot encoding for classification tasks.

Train the Model:

Implement Federated Learning training loops to enable collaborative learning across decentralized devices.

Train the model on the training data by iterating through epochs.

For each batch of data:

Perform a forward pass through the model.

Compute the loss.

Backpropagate gradients and update model weights accordingly.

Aggregate model updates from participating devices using Federated Learning mechanisms.

Evaluate the Model:

Evaluate the model on the testing data to assess its performance.

For each test sample:

Conduct a forward pass through the model.

Obtain model predictions.

Calculate metrics such as accuracy, precision, recall, and F1-score.

Experiment with Hyperparameters:

Experiment with hyperparameters to optimize model performance.

Adjust parameters such as the number of convolutional layers, BGRU units, learning rates, etc.

Deploy the Trained Model:

Deploy the trained model in a preferred environment, such as a web service or mobile app.

Implement an interface for users to input handwritten digits and Roman numerals.

Utilize the model to make predictions and display the results, providing users with accurate recognition capabilities.
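
At inference time, the interface needs to turn the model's per-symbol class probabilities into a readable string. The sketch below assumes the same 17-class label set as before and a model that emits one probability vector per symbol; `fake_output` is a hypothetical stand-in for real model output.

```python
CLASSES = [str(d) for d in range(10)] + list("IVXLCDM")  # assumed label set


def predict_string(prob_rows):
    """Map per-symbol class probabilities to text plus the weakest confidence."""
    symbols, confidences = [], []
    for row in prob_rows:
        best = max(range(len(row)), key=row.__getitem__)  # argmax
        symbols.append(CLASSES[best])
        confidences.append(row[best])
    return "".join(symbols), min(confidences)


def fake_output(index, peak=0.9):
    """Stand-in for a model output: most probability mass on one class."""
    rest = (1.0 - peak) / (len(CLASSES) - 1)
    return [peak if i == index else rest for i in range(len(CLASSES))]


rows = [fake_output(CLASSES.index(symbol)) for symbol in "XIV"]
text, confidence = predict_string(rows)
```

Reporting the lowest per-symbol confidence lets the interface flag strings that contain at least one uncertain character.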

By incorporating Federated Learning into the training process, the model learns collaboratively from data spread across several devices while maintaining data confidentiality and privacy. This approach addresses the privacy concerns of centralized data collection while enabling scalable and effective model training.

7. Experimental results

Table 1
Performance metrics (%)

Metrics	Gated CNN with BGRU	Recurrent neural network (RNN)	Convolutional neural network (CNN)	Gated CNN with BGRU using federated learning
Accuracy	98.2	95.5	96.5	97.3
Precision	98	96	97	96.5
Recall	97	95	96	97
F1-score	97.5	95.5	96.5	96.8
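
As a worked check on Table 1, and assuming the reported F1-score is the harmonic mean of the reported precision $P$ and recall $R$ (the averaging scheme is not stated), the Gated CNN with BGRU column gives:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 98 \times 97}{98 + 97} \approx 97.5$$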

Figure 3.

Trained dataset.

Figure 3 depicts the trained dataset; such datasets are essential for developing successful handwritten recognition systems.

Figure 4.

Output for corresponding dataset.

Figure 4 shows the output for the corresponding dataset; the specific output depends on the goals and requirements of the handwritten recognition system.

According to Table 1, the Gated CNN with BGRU achieves the highest overall accuracy (98.2%) on the combined digit and Roman numeral recognition task, and with federated learning added it still reaches 97.3%, ahead of the RNN (95.5%) and CNN (96.5%) baselines. The model was trained for 20 epochs with a batch size of 64 and the Adam optimizer at a learning rate of 0.001. Accuracy, precision, recall, and F1-score were computed for both digits and Roman numerals.
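
For reference, a single Adam update with the stated learning rate can be sketched as below. The learning rate, batch size, and epoch count come from the text; the values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are an assumption, as are the example parameters and gradients.

```python
import math

BATCH_SIZE = 64  # from the reported training setup
EPOCHS = 20      # from the reported training setup


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; `state` holds the step count and moment estimates."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


params = [1.0, -2.0]   # hypothetical parameters
grads = [0.5, -3.0]    # hypothetical gradients
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
updated = adam_step(params, grads, state)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of gradient magnitude.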

The accuracy for digits remains high at 97.3%, while the accuracy for Roman numerals improves to 91.5% with the inclusion of federated learning. The model recognizes digits very well, with minimal confusion between classes. For Roman numerals it achieves good accuracy, although some confusion occurs between similar characters such as ‘I’ and ‘V’.
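
Confusions of this kind can be surfaced by counting the most frequent (true, predicted) error pairs. The sketch below uses made-up labels purely for illustration; it does not reproduce the experimental predictions.

```python
from collections import Counter


def most_confused(y_true, y_pred, top=3):
    """Return the most frequent (true label, predicted label) error pairs."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top)


# Illustrative labels only.
y_true = ["I", "I", "I", "V", "V", "X"]
y_pred = ["I", "V", "V", "V", "I", "X"]
worst = most_confused(y_true, y_pred)
```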

Hyperparameter tuning, through learning-rate adaptation and an increased number of BGRU units, improved Roman numeral recognition accuracy by 3%, even with federated learning integrated. The model generalized well to unseen data with minimal overfitting, thanks to data augmentation.

Two challenges remained: handling cases where multiple Roman numerals were written together, and the limited data available for less common Roman numerals (‘L’, ‘C’, ‘D’, ‘M’). Future work could concentrate on collecting more diverse Roman numeral data and on multi-character recognition. Architectural advances such as attention mechanisms could further improve performance in conjunction with federated learning.

8. Conclusion

In this work, we incorporated federated learning into the training process and created and trained a Gated CNN with Bidirectional Gated Recurrent Units (BGRU) for the recognition of handwritten digits and Roman numerals. The model identified these symbols with an accuracy of 97.3% for digits and 91.5% for Roman numerals. Nevertheless, difficulties arose when several Roman numerals were written together.

Moreover, the limited availability of data for less common Roman numerals (L, C, D, M) underscores the need for a larger and more diverse dataset to improve recognition accuracy further. By incorporating federated learning, we aimed to enhance the model’s robustness and scalability across distributed data sources while preserving data privacy.

In conclusion, the Gated CNN with BGRU model, augmented with federated learning, shows promising results for recognizing handwritten digits and Roman numerals. Future research should focus on improving the model’s generalization to new data and on advancing federated learning methods to address the challenges of real-world scenarios.

