Abstract
Handwriting is an individual trait that can serve as evidence to authenticate a particular writer. Identifying the writer of a handwritten text has shown encouraging results in the examination of historical and forensic documents. In this paper, we propose a novel offline writer identification system that extracts distinctive patterns from a small amount of data, which is a challenging setting. In our deep network, feature extraction relies on a specially designed dual-path architecture, and the resulting embeddings are concatenated to produce the final learned features. To deal with uncertainties such as high intra-class variation and noise, the first path applies fuzzy logic in the form of a custom Convolutional Neural Network (CNN) with a type-2 fuzzy activation function. The second path uses a transfer learning-based CNN to enhance the discriminability of the learned features. Our method allows for text-independent writer identification, so the training and test samples need not contain the same text. Because various factors can influence handwriting style, we assembled a dataset of right-to-left handwriting samples. The proposed method is evaluated on this dataset and on four widely known public datasets, namely KHATT, CVL, Firemaker, and IAM, achieving accuracies of 99.85%, 99.83%, 99.79%, 99.64%, and 98.17%, respectively. The results on these diverse datasets demonstrate the applicability of the proposed model to various languages. Moreover, the model performs effectively in real-world scenarios with limited handwritten data.
Keywords
Introduction
Handwriting encompasses particular symbols and allographs that reflect the writer’s characteristics and personality [1, 2]. Writer identification is the process of determining the writer of a handwritten text within a specific group of writers, achieved through the analysis of features embedded in the handwriting style [3, 4]. Writer identification systems fall into two branches, online and offline, according to how the data is collected. In the online approach, data is gathered while writing with a digital pen, which offers additional information such as writing speed. Conversely, the offline method relies solely on handwriting images [5, 6]. The offline method remains significant because digital tools are not available everywhere [7, 8]. Furthermore, identification can be text-dependent, with the same content for all authors, or text-independent, with no content restrictions [9].
Advancements in image processing and machine learning have driven remarkable progress in writer identification, which is typically treated as a pattern recognition problem [4]. Traditional methods employ handcrafted features to ascertain the writer’s identity. Ahmed et al. [10] introduced an offline text-independent writer identification system based on the Ending Codebook approach and optimum features, achieving 95.59% accuracy on the IAM dataset. Their methodology incorporated the Moor’s algorithm for writer differentiation. Khan et al. [3] proposed a model that computes writing style similarity between distinct handwriting images. This method relies on the similarity and dissimilarity Gaussian mixture model, alongside a weighted histogram of scores for each writer. The authors achieved accuracies of 99.03% on CVL, 97.98% on Firemaker, and 97.85% on IAM. In [5], researchers introduced a Block Wise Local Binary Count operator for distinguishing writing styles using a collection of histograms. They applied the Nearest Neighbor rule for sample classification, resulting in accuracies of 98.38% on CVL, 96.47% on Firemaker, and 88.99% on IAM. In another study, Chahi et al. [11] provided a new feature extraction technique for handwriting samples. This descriptor was applied to small connected components within the samples, and a Nearest Neighbor classifier was employed to identify the authors of the handwritten content. The evaluation was performed on various standard databases across different languages, yielding accuracies of 99.67% for CVL, 98.54% for IFN/ENIT, 97.60% for Firemaker, and 94.06% for IAM. These traditional approaches demand substantial handwriting style information spanning pages or multiple paragraphs. However, in historical document analysis and forensic applications, the crucial and challenging task is to identify the writer from scant text, potentially even a single word carrying limited information.
Recent research indicates that deep learning techniques enable effective approaches for automatically learning and extracting intricate features from input images. Xing et al. [12] presented a novel multi-stream CNN designed to leverage spatial relationships among handwriting patches, thereby enhancing identification accuracy. The authors also devised an effective patch scanning approach to handle varying sizes of handwriting images, achieving an accuracy of 97.03% on the IAM dataset. Ni et al. [13] proposed a hybrid methodology, combining deep learning and traditional computer vision techniques, to address noisy images in the context of writer identification. To identify the writer from single-word images, He and Schomaker [14] presented a convolutional neural network architecture comprising two parallel pathways. This architecture employed deep learned features from explicit content recognition as an auxiliary task and transferred them to writer identification as the main task. They reported an accuracy of 93.7% on the CVL dataset and 86.1% on the IAM dataset. Pretrained CNN models, which offer the advantage of reusing existing learned features, have been utilized in several studies to achieve improved performance. Cilia et al. [4] introduced a three-stage model for writer identification in medieval manuscripts with the aim of applying deep learning to small datasets. They leveraged transfer learning by examining several pre-trained deep neural networks in the first two stages, and adopted weighted majority row decision-making for the third stage. FragNet [15] is another model designed for extracting efficient features from word images, employing a deep architecture with two pathways. This deep model was trained using segmented fragments of the input image and feature pyramid maps. FragNet obtained an accuracy of 99.1% on CVL, 97.6% on Firemaker, and 96.3% on IAM. One limitation of FragNet lies in its requirement for region segmentation, which proves challenging for cursive writing documents. Employing limited handwritten data, Javidi et al. [16] introduced a writer identification system that combines deep learned features extracted from a customized version of ResNet with supplementary information derived from a traditional handwriting thickness descriptor. In this study, accuracies of 99.67%, 97.60%, and 94.06% were attained on the CVL, Firemaker, and IAM datasets, respectively. However, in real-world scenarios writers use pens with varying tip widths, which affects handwriting thickness. Semma et al. [17] proposed a hybrid framework for writer identification, achieving 99.5% accuracy on the IAM dataset. This approach revolves around detecting cores in handwriting images through the use of FAST and Harris corner detectors. To train the deep CNN, the authors utilized small patch segments situated near these cores.
In this paper, we develop a novel method with a dual-path architecture for offline writer identification based on handwritten word images. Our method is text-independent and applicable across multiple languages, irrespective of the textual content within the handwriting. Drawing inspiration from [12] and aiming to extract discriminative features from limited data, such as images containing a few letters or a single word, we employ a cropping method to extract fixed-size patches. Our contributions can be outlined as follows:
We employ a concatenation of an interval type-2 fuzzy-based CNN model and a transfer learning-based CNN model within our dual-path architecture to enhance the efficiency of the writer identification system for real-world applications.
Because image noise and variations in individual handwriting style are challenging uncertainties that affect identification performance, we leverage a type-2 fuzzy activation layer to improve the learning performance of the proposed deep model.
Because challenges such as handling right-to-left languages (Arabic, Persian, and Urdu), varied pens and ink colors, and intra-class variations remain open, we assembled a dataset specific to right-to-left handwriting.
The remainder of this paper is structured as follows: Section 2 presents the proposed method and its dual-path structure. In Section 3, we introduce our collected right-to-left dataset along with four well-known handwriting datasets. Experimental results are presented in Section 4, followed by a discussion in Section 5. Finally, Section 6 provides the conclusion.
Proposed writer identification method
In this section, we present the structure of the proposed framework. A key feature of our architecture is the dual-path design, which combines an interval type-2 fuzzy-based CNN model and a transfer learning-based CNN model. This design aims to capture robust features from the handwriting images, leverage knowledge from large-scale datasets, and enhance the learning performance of discriminative features from handwriting images characterized by high intra-class variation and limited data, making it highly effective for writer identification tasks.
As shown in Fig. 1, we first collected handwriting samples in the Persian language from different individuals. After scanning, the collected samples are converted into grayscale images and pre-processed in the subsequent steps. To enhance model generalization and prevent overfitting, several data augmentation techniques are applied to the training data. To train the model, each handwriting image is fed, as two identical copies, into the two paths separately. The deep features extracted from each path are then concatenated. Afterward, two fully connected layers are employed, followed by a dense layer whose size equals the number of classes, which predicts the writer class of the input handwriting image. The model’s predictions are compared against the actual labels to compute the loss. Through backpropagation, gradients are computed and used to update the model parameters with the designated learning rate.

The flow diagram of the proposed dual-path method for writer identification.
The iterative process ensures that the model refines its internal representations to better capture distinctive features from the handwriting images, thereby enhancing its ability to accurately classify and identify the writers. Additionally, continuous monitoring of the model’s performance on the validation set is conducted during training to mitigate overfitting and ensure optimal generalization. In the final stage, the system’s performance in identifying the writer is evaluated based on the test data. Figure 2 illustrates the pseudocode of the proposed writer identification method. In addition, Table 1 presents the architecture configuration details of the proposed method, while further explanations about each of the two models used in the two paths are provided in the subsequent subsections.

The pseudocode of the proposed method.
The architecture configuration details of the proposed method
CNNs have driven substantial progress in classification problems such as writer identification. To achieve high accuracy, fuzzy logic can be incorporated into deep neural network structures in several ways, for instance in the feature extraction process or in the classification phase [18]. One of the most significant components of a deep neural network is the activation function, which plays an influential role in the learning process [19]. To improve the learning performance, inspired by [19], we integrated the interval type-2 fuzzy activation layer into the proposed CNN model. The input-output mapping of the activation unit is defined with three hyperparameters γ = [α, P, N] as follows [19]:
In the positive quadrant, the parameter P controls the slope of the function, and in the negative quadrant, the parameter N adjusts the slope. The parameter k can be formulated as follows [19]:
For more details of the type-2 fuzzy activation unit, please refer to [19].
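The expressions for the activation mapping and for k are not reproduced in this text, so the following is only a minimal NumPy sketch under stated assumptions. It assumes the mapping has the form P·σ·k(σ) for σ > 0 and N·σ·k(σ) otherwise, and that k takes the closed form commonly given for single-input interval type-2 fuzzy mappings; both should be checked against [19]. The function names are hypothetical, and the default values of α, P, and N are the ones reported in the training settings of this paper.

```python
import numpy as np

def it2f_gain(sigma, alpha):
    """Assumed gain k(sigma); this closed form is NOT taken from the text, verify against [19]."""
    a = np.abs(sigma)
    return 0.5 * (1.0 / (alpha + a - alpha * a) + (alpha - 1.0) / (alpha * a - 1.0))

def it2f_activation(sigma, alpha=0.2, p=0.5, n=0.3):
    """Piecewise activation: slope p on the positive side, slope n on the negative side."""
    sigma = np.asarray(sigma, dtype=float)
    slope = np.where(sigma > 0, p, n)
    return slope * sigma * it2f_gain(sigma, alpha)

# Evaluate on [-1, 1], where the assumed expression has no pole for 0 < alpha < 1.
x = np.linspace(-1.0, 1.0, 5)
y = it2f_activation(x)
```

Under the assumed form, k equals 1 at |σ| = 1, so the output at σ = 1 is P and the output at σ = -1 is -N, which matches the description of P and N as slope controllers for the two quadrants.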
As illustrated in Fig. 3, the interval type-2 fuzzy-based convolutional neural network is built from five blocks. Every block includes two convolutional layers and a max pooling layer with a size of 2×2 and a stride of 2. After each convolutional layer, a batch normalization (BN) layer is used to speed up network training, and a custom activation layer with an interval type-2 fuzzy (IT2F) activation unit is applied to boost the learning performance. In all convolutional layers, the kernel size is 3×3, and the stride and padding are fixed to 1. Fewer filters are used in the first layers, and the number of filters increases in deeper layers: the five blocks use 32, 64, 128, 256, and 512 filters, respectively. Finally, global average pooling (GAP) and a flatten layer are applied before the extracted features are used for concatenation.

The proposed architecture of the IT2F-based CNN model (Net1).
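As a rough check of the block structure described above, the feature-map size after each block can be traced with a few lines of arithmetic. This sketch assumes that Net1 receives an 80×180 patch (the patch size used in the preprocessing of this paper) and that pooling uses no padding, so odd sizes are rounded down; the function name is hypothetical.

```python
def trace_net1_shapes(height=80, width=180, filters=(32, 64, 128, 256, 512)):
    """Return the (height, width, channels) of the feature map after each block."""
    shapes = []
    for n_filters in filters:
        # 3x3 convolutions with stride 1 and padding 1 keep height and width unchanged.
        # The 2x2 max pooling with stride 2 halves both (floor division, assumed unpadded).
        height, width = height // 2, width // 2
        shapes.append((height, width, n_filters))
    return shapes

shapes = trace_net1_shapes()
# Global average pooling collapses the spatial dimensions, leaving one value per filter.
embedding_dim = shapes[-1][2]
```

Under these assumptions, the last block produces a small spatial map with 512 channels, so the Net1 embedding passed to the concatenation step has 512 dimensions.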
The reuse of a pre-trained model as a starting point for a new problem is known as transfer learning. This technique saves training time and often improves performance in real-world applications. As shown in Fig. 4, Inception V3 [20] is employed as a deep feature extractor in the proposed model. As a pre-trained CNN model, Inception V3 includes 48 layers. It is a version of GoogLeNet [21] already trained on the ImageNet dataset with more than one million images. In our modified Inception V3, a global average pooling layer and a flatten layer are added in place of the last classifier layer.

The architecture of the transfer learning-based CNN model (Net2).
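With both paths described, the fusion step can be illustrated by a small NumPy sketch that uses random stand-ins for the two embeddings. It assumes a 512-dimensional Net1 embedding (one value per filter of the last block) and a 2048-dimensional Net2 embedding (the usual size of the Inception V3 pooled output). The widths of the two fully connected layers (512 and 256), the ReLU between them, and the random weights are hypothetical, since the configuration table is not reproduced here; 50 classes corresponds to the 50 writers of the collected dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, out_dim, rng):
    """Fully connected layer with small random weights (stand-in for trained parameters)."""
    w = rng.normal(0.0, 0.01, size=(x.shape[1], out_dim))
    b = np.zeros(out_dim)
    return x @ w + b

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

batch = 4
net1_features = rng.normal(size=(batch, 512))    # stand-in for the IT2F-based CNN output
net2_features = rng.normal(size=(batch, 2048))   # stand-in for the Inception V3 output

fused = np.concatenate([net1_features, net2_features], axis=1)
hidden = np.maximum(dense(fused, 512, rng), 0.0)   # first FC layer (hypothetical width)
hidden = np.maximum(dense(hidden, 256, rng), 0.0)  # second FC layer (hypothetical width)
probs = softmax(dense(hidden, 50, rng))            # one probability per writer class
```

The point of the sketch is only the data flow: the two embeddings are joined along the feature axis before any classification layer sees them, so the classifier can weigh both kinds of features jointly.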
This section explains our designed right-to-left dataset and introduces four public datasets used for evaluation.
Data collection
Persian, Arabic, Hebrew, and Urdu are right-to-left languages with different writing styles based on the way letters are connected to one another [22]. Despite this, there are few publicly available right-to-left datasets. Therefore, we designed a sample right-to-left dataset made up of Persian words and numbers, taking into account the variability of individual handwriting samples. A total of 50 writers (29 males and 21 females) of various ages and occupations participated in the data collection process. Most of the participants were between the ages of 16 and 24, while 20% were between 33 and 42 years old and 10% were over 53 years old. Most contributors were right-handed. The writers used various types of pens in a variety of colors. To capture the variations between handwriting samples of any particular writer, each participant was provided with seven sheets of paper and asked to write the textual content of each sheet ten times. Sample handwriting images of our dataset, which is called “FARS”, and the variations for a specific writer are illustrated in Figs. 5 and 6, respectively.

Sample handwriting images of the collected dataset.

Samples of intra-class variations among handwritten texts of a specific writer.
Four well-known datasets are used to evaluate the proposed method: IAM, KHATT, CVL, and Firemaker. IAM [23] is a widely used English dataset including 1539 handwritten pages from 657 writers. The documents are available at the sentence, line, and word levels. KHATT [24] contains the handwriting of 1000 Arabic writers at the paragraph and line levels. The CVL [25] handwriting dataset has word images of 310 writers in English and German. Firemaker [26] contains 1000 Dutch handwritten pages from 250 writers.
Experiments and analysis
This section focuses on describing the used preprocessing techniques and discussing the experimental results and comparisons.
Preprocessing
Preprocessing is a critical step for extracting meaningful features and enabling precise analysis in writer identification from handwriting images. A public dataset of handwritten text may contain handwriting pages, lines, or words of varying sizes, which may affect CNN-based models. However, resizing a handwriting image to a fixed size can cause handwriting-style information to be lost. For real-world applications, it is also very important to identify the writer using limited handwritten text. Therefore, after converting the collected samples into grayscale images, we cropped 80×180 patches from the handwriting images, each containing roughly one word. Applying this patch cropping strategy to the 3500 sentences of the collected dataset yields 52507 patches of size 80×180. Additionally, pixel values were normalized to the range between 0 and 1 during preprocessing to ensure a consistent data distribution, which in turn facilitates efficient model training. To improve the robustness of the dataset, induce diversity, and enhance model generalizability while reducing the risk of overfitting, a set of data augmentation methods was applied to the training data through the Keras ImageDataGenerator. These transformations include brightness adjustments, controlled zoom, random rotations, and random noise. Figure 7 illustrates a few cropped patches from the datasets used in this study.

Samples of cropped patches from used datasets.
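The cropping and normalization steps can be sketched as follows. The exact cropping procedure is not specified in the text, so this is only one plausible reading: the image is tiled into non-overlapping 80×180 windows, pixel values are scaled to the range between 0 and 1, and windows with almost no ink are discarded. The function name, the non-overlapping tiling, and the ink threshold are assumptions made for illustration.

```python
import numpy as np

def crop_patches(page, patch_h=80, patch_w=180, ink_threshold=0.01):
    """Tile a grayscale image (uint8, dark ink on light paper) into fixed-size patches."""
    patches = []
    rows, cols = page.shape
    for top in range(0, rows - patch_h + 1, patch_h):
        for left in range(0, cols - patch_w + 1, patch_w):
            window = page[top:top + patch_h, left:left + patch_w]
            patch = window.astype(np.float32) / 255.0  # normalize pixel values to [0, 1]
            # Hypothetical filter: keep the patch only if enough pixels are dark.
            if np.mean(patch < 0.5) >= ink_threshold:
                patches.append(patch)
    if not patches:
        return np.empty((0, patch_h, patch_w), dtype=np.float32)
    return np.stack(patches)

# Synthetic example: a blank 160x360 page with one dark stroke in the top-left tile.
page = np.full((160, 360), 255, dtype=np.uint8)
page[10:30, 20:100] = 0
patches = crop_patches(page)
```

In the synthetic example only the tile containing the stroke survives the ink filter, which shows how blank regions would be kept out of the training set under this assumed rule.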
In order to design the Net1 of the proposed model, as shown in Fig. 8, we varied and examined the number of convolutional blocks (including two convolutional layers and one max pooling layer) from 3 to 9 as a model hyperparameter. It is observed that 5 can be considered as the optimal number of blocks and for more than 5, the accuracy was almost constant, while, due to the rise in the number of parameters, the execution time increased.

Change in accuracy and training time based on different number of blocks in Net1.
In our study, the datasets were divided into three subsets: a training set (70%), a validation set (10%), and a test set (20%). Refer to Table 2 for a comprehensive overview of the data distribution used for training, validation and testing in five datasets.
The data distribution on five datasets
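A 70/10/20 split of this kind can be sketched with a shuffled index array. The text does not state whether the split is stratified per writer, so the sketch assumes a plain random split; the function name and the seed are hypothetical, and the resulting counts are not claimed to match the table.

```python
import numpy as np

def split_indices(n_samples, train=0.7, val=0.1, seed=0):
    """Shuffle sample indices and cut them into training, validation, and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    # Whatever remains after the training and validation parts forms the test set.
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# 52507 is the number of patches reported for the collected dataset.
train_idx, val_idx, test_idx = split_indices(52507)
```

Splitting indices rather than the image array keeps the three subsets disjoint by construction and makes it easy to reuse the same partition for both paths of the network.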
As outlined in Table 3, we fine-tuned the hyperparameters to enhance accuracy. The learning rate was set to 1e-04, and the batch size was fixed at 64. The hyperparameters α, P, and N in Equations 1 and 2 were assigned values of 0.2, 0.5, and 0.3, respectively. The training process was carried out using the Adam optimizer and the cross-entropy loss function. Across all experiments, even when extending the maximum training epochs to 100, the proposed method consistently demonstrated acceptable outcomes. Training was run on an NVIDIA GeForce GTX 1070 8GB, and each epoch took approximately 8.9 minutes, resulting in a cumulative training time of 890 minutes.
List of training parameters
We conducted a series of experiments aimed at assessing the efficacy of the proposed method and analyzing the influence of its structural components. Accuracy serves as the benchmark for evaluating the performance of the proposed model in identifying the writer of a handwritten word. This evaluation criterion is defined as follows:
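The accuracy expression itself is not reproduced in this text. Assuming the conventional top-1 definition, that is, the percentage of samples whose highest-scoring class matches the true writer, it can be computed as in the sketch below; the function name is hypothetical.

```python
import numpy as np

def top1_accuracy(probabilities, labels):
    """Percentage of samples whose highest-scoring class equals the true label."""
    predictions = np.argmax(probabilities, axis=1)
    return 100.0 * np.mean(predictions == np.asarray(labels))

# Toy example with four samples and three writers; the last sample is misclassified.
probabilities = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])
accuracy = top1_accuracy(probabilities, labels)
```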
In the first analysis, we compared the performance of our model through employing different activation functions. This was achieved by using the activation functions of ReLU and Leaky ReLU instead of an interval type-2 fuzzy (IT2F) activation function in Net1 on the collected dataset and the IAM dataset. Figure 9 illustrates a comparison of these results. The experimental results indicated that the proposed method with an interval type-2 fuzzy activation function was more stable and obtained the desired values in fewer epochs on both datasets, evidencing its usefulness.

The training accuracy comparison of the proposed model with different activation function layers in Net1 on a) our collected b) IAM dataset.
In the second experiment, the writer identification performance of the proposed method is compared using four well-known pre-trained models including VGG19 [27], ResNet50 [28], InceptionResnetV2 [29] and Inception V3 in the backbone of Net2 as a feature extractor. Figure 10 illustrates the results of this comparison on the collected dataset. It can be seen that higher accuracy was obtained using modified Inception V3.

The identification performance of the proposed method with different pre-trained models in Net2.
Besides, the comparison of the test accuracy of our proposed method using different transfer learning-based networks in Net2 on IAM, KHATT, Firemaker, and CVL datasets, as illustrated in Fig. 11, evidenced the efficiency of InceptionV3.

The identification performance of the proposed method with different pre-trained models in Net2.
The progress of our method over 100 epochs on all five datasets is shown in Fig. 12. The accuracy curves of the proposed network rise rapidly within the first epochs and remain relatively constant over the last 70 epochs. On the collected dataset, the proposed method reached an accuracy of 99.85%.

The progress of our proposed method on five datasets.
For better visualization, the performance of our proposed model is shown in Fig. 13 using t-SNE embedding [30]. We randomly selected 30 samples from each of 20 classes of the validation set of the collected dataset to visualize the features extracted from the last FC layer. As can be observed, the intra-class variations have decreased, indicating that the deep concatenated features are discriminative.

T-SNE embedding of raw data (a) and last FC layer (b) for 20 classes of the collected dataset.
As a further performance evaluation of the proposed method, the test data of the collected dataset were used to plot ROC curves for the first 10 classes. A ROC curve plots the True Positive Rate against the False Positive Rate. Fig. 14 shows that the True Positive Rate is acceptable for all classes.

ROC curves for the first 10 classes of the collected dataset.
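For a multi-class problem, such curves are usually computed one class at a time, treating the selected writer as the positive class and all other writers as negative. The text does not say how the curves were produced, so the following NumPy sketch only illustrates that one-vs-rest construction. The function names are hypothetical, and tied scores are not merged into a single threshold.

```python
import numpy as np

def roc_points(scores, is_positive):
    """Return (fpr, tpr) obtained by lowering the decision threshold one sample at a time."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    order = np.argsort(-scores, kind="stable")   # highest score first
    hits = is_positive[order].astype(float)
    true_pos = np.cumsum(hits)
    false_pos = np.cumsum(1.0 - hits)
    tpr = np.concatenate([[0.0], true_pos / hits.sum()])
    fpr = np.concatenate([[0.0], false_pos / (len(hits) - hits.sum())])
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: scores for one writer class; True marks samples written by that writer.
scores = [0.9, 0.6, 0.4, 0.1]
is_positive = [True, False, True, False]
fpr, tpr = roc_points(scores, is_positive)
auc_value = area_under_curve(fpr, tpr)
```

To apply this to a trained classifier, `scores` would be the predicted probability of one writer class and `is_positive` would mark the test samples that belong to that writer.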
Finally, we compared the performance of our approach with state-of-the-art research on the KHATT, CVL, Firemaker, and IAM datasets; the results are summarized in Tables 4–7. As is evident in Table 4, the proposed method achieves the best performance on the KHATT dataset, with an increase of 0.23%. On the CVL dataset, the proposed method reached 99.79% accuracy and, as Table 5 shows, outperformed the other methods. With an accuracy of 99.64% on the Firemaker dataset, the proposed method also surpasses the notable results of recent studies; more details are presented in Table 6. On the IAM dataset, the proposed method achieved an accuracy of 98.17%. According to Table 7, which provides the comparison on this dataset, the method of Lai et al. [31] ranked first with 98.50% accuracy. However, that result was not obtained using the handwriting samples of all writers in the IAM dataset, so the 98.17% accuracy of the proposed method over all 657 writers remains noteworthy. These comparisons indicate that the proposed structure yields competitive results.
The performance in comparison with the state-of-the-art methods on the KHATT dataset
The performance in comparison with the state-of-the-art methods on the CVL dataset
The performance in comparison with the state-of-the-art methods on the Firemaker dataset
The performance in comparison with the state-of-the-art methods on the IAM dataset
The experiments and results presented in the previous section underscore the efficiency of our proposed approach, which exploits concatenated deep features obtained via a two-path framework. The accuracies obtained on several datasets substantiate the competitive performance of our model compared to other hybrid methods described in previous research [14–16]. While deep convolutional neural networks are highly capable in image analysis and feature extraction, our experiments demonstrate that fusing a neural network equipped with a type-2 fuzzy activation function with a pre-trained neural network improves identification performance, particularly in scenarios involving limited data and high intra-class variations. Since writer identification based on handwritten words has practical implications across diverse domains, such as forensic medicine and historical document analysis, the findings of this research may be beneficial for experts in document review.
Conclusion
This paper presented an effective method for identifying writers based on handwritten word samples. The proposed approach employs a dual-path architecture that leverages the synergy of two deep models. This allows for the extraction of deep features from small patches of handwritten text, which are subsequently concatenated. In the first model, a deep neural network is designed using interval type-2 fuzzy logic to customize the activation functions within the convolutional layers. The second model is a modified pre-trained convolutional neural network that uses transfer learning. A dataset comprising words, numbers, and sentences was collected, taking intra-class variations into consideration. Thorough evaluations were conducted on both the gathered dataset and four benchmark datasets, namely IAM, KHATT, Firemaker, and CVL. The outcomes of these evaluations are promising and underscore the efficacy of the proposed method. While our cropping patch size is well suited to languages written in a horizontal direction, it poses a limitation for the few languages with vertical handwriting; in that case, the patch dimensions would need to be reversed to effectively capture the predominant black pixels of the handwritten text. In future work, our proposed method could be extended to encompass handwriting style information from full handwritten documents, potentially by employing a sliding window strategy. Moreover, our method could be explored for its applicability to writer verification tasks.
