Abstract
Recognition of Kannada characters is a complex task because, when all combinations of vowels and consonants are considered, the Kannada language has 623,893 classes. In this paper, using two-stage segmentation, the complexity is reduced from 623,893 classes to 313 Main akshara classes (vowels, consonants, vowel modifiers and consonant modifiers) and 30 Vattu akshara (conjunct) classes. A novel CNN model for recognition of printed and handwritten Kannada characters is proposed. CNN models with two, three and four convolutional layers and different filter sizes are designed for Main and Vattu aksharas. The database of printed and handwritten characters consists of 31,300 Main akshara samples and 3000 Vattu akshara samples. Simulation results show that the CNN with a four-layer architecture is the best model for recognition of Kannada characters. This model achieved recognition accuracies of 98.83% and 99.29% for printed Main and Vattu aksharas, and 82.50% and 80.92% for handwritten Main and Vattu aksharas, respectively.
Keywords
Introduction
India is known for its multiculturalism and multilingualism. Kannada, the official language of the South Indian state of Karnataka, is spoken by about 48 million people. The language is also spoken by linguistic minorities in the states of Maharashtra, Andhra Pradesh, Tamil Nadu, Telangana, Kerala and Goa, and by Kannadigas abroad [1]. In India, a large number of people live in rural areas and have little knowledge of English, so it is imperative to provide localized channels of communication. In addition, a very large number of languages face extinction, primarily because native textual resources, and the wisdom they contain, are neglected. In this era of English dominance, native languages including Kannada suffer from neglect of their ancient, knowledge-rich textual material. This makes it necessary to develop digital support systems that make the rich indigenous resources of the language universally accessible and thereby raise its standing in the global linguistic arena.
In government offices and healthcare departments in Karnataka, the only form of documentation available is handwritten reports written in Kannada [2]. The contents of these old documents can be reproduced by typewriting, but this is tedious because the documents are difficult to read and understand. Storing and maintaining them is also highly labor-intensive. A handwritten text recognition system would therefore be a boon in government and health departments, where most documents are in Kannada. A computer-based system is needed to bridge the gap between machines and humans. Extracting text in a machine-readable format from real-world images is a challenging task in the computer vision community. Reading text in natural images has gained considerable attention in practical applications such as inventory updating, document analysis, scene understanding, robot navigation and image retrieval.
Optical Character Recognition (OCR) [2, 3] is the mechanical or electronic conversion of scanned documents containing printed or handwritten text into a machine-encoded format, and it has greatly improved data management. OCR enables quick digitization of scanned documents into a recognizable form. It eliminates the cost of misplaced or lost documents and offers savings in reclaimed office space that would otherwise be used to store paper documents.
Traditional character recognition is based on handcrafted features and requires a large amount of prior knowledge [4]. It also requires extensive pre-processing, and obtaining good recognition accuracy is challenging. In the last few years, deep neural networks and their use in text recognition have produced breakthrough performance. Still, the massive amount of data and the variation across datasets demand further research. Convolutional Neural Networks (CNNs) can perceive the structure of objects and extract features, which makes them well suited to automatic character recognition. An enormous number of languages are losing valuable literary resources due to neglect by stakeholders, and Kannada is also beginning to feel the effects of this situation. The need for a reliable Kannada OCR system is therefore clear. Prior work [5] found that only a few Kannada OCR systems are available on the market, and these are not reliable or accurate enough to provide the best results. In our review, CNN-based approaches provided the most accurate results. A CNN consists of convolutional layers, whose feature maps are obtained by convolving images with sets of image filters, followed by pooling and fully connected layers. In this work, feature maps for character classification are extracted by convolution with filters of size 3 x 3 and 5 x 5. CNNs with two, three and four layers, different filters and different numbers of channels are proposed for printed and handwritten Kannada characters. A further contribution is the use of feature visualization to show the regions important for classification. It was found that a CNN with four layers having 16, 32, 64 and 128 feature maps could extract all details of a character.
The main contributions of this paper are as follows:
1. The complexity of Kannada characters is reduced from 623,893 classes to 313 Main akashara classes and 30 Vattu akashara classes by grouping the characters as Main akasharas and Vattu akasharas.
2. CNNs with two, three and four layers, different filters and different numbers of channels are proposed for printed and handwritten Kannada characters.
3. The effect of batch size is studied, and a batch size of 32 gave the best result with minimum validation loss.
4. The learning curves were analyzed and showed the model to be a good fit.
5. Feature visualization helps identify the number of feature extraction layers, simplifying the CNN model to an optimum number of parameters.
The paper is organized as follows. Section 2 reviews related work. Section 3 describes the dataset of printed and handwritten Kannada characters used to train and test the CNN model. The architecture of the CNN model is presented in Section 4. Section 5 presents the simulation results. Finally, the conclusion and future work are presented in Section 6.
Related work
A literature review reveals that many researchers have developed novel algorithms for the classification and recognition of characters, words and numerals, and Convolutional Neural Networks have revolutionized pattern classification. A neural network approach for identification of Indian scripts was first proposed by Patil and Subbareddy [6]. They proposed a multilingual, multiscript OCR designed for English, Kannada and Hindi scripts, in which a modular neural network with a two-stage feature extraction process performs script identification. A 13-layer CNN with 2 sub-layers was applied in a multi-dataset experiment on Bangla [7], obtaining accuracies of 98%, 96.81%, 95.71% and 96.40% on four datasets, respectively.
Pradymna and Niraj S Prasad [8] presented “KanOCR: Conversion of printed Kannada documents to editable form using Convolution Neural Networks”. Characters are first extracted from the images using various segmentation methods and then fed to a Convolutional Neural Network (CNN) for recognition. The recognized characters are translated into Unicode and printed. They used a dataset from the Kanscan Google Play Store link and achieved an accuracy of 80%.
Ramesh G et al. [9] proposed “Offline Kannada Handwritten Character Recognition using Convolution Neural Networks”. They tested the algorithm on their own dataset and obtained accuracies of 93.2% and 78.73% on the consolidated and raw datasets, respectively. The models discussed in these papers used a single convolution layer with a 5 x 5 filter and 64 channels, followed by ReLU and max-pool layers. In this paper, CNN models with two, three and four layers and different numbers of feature maps are discussed.
Dataset of printed and handwritten Kannada characters
The Kannada alphabet (varnamaale) has 49, 50 or 51 letters, depending on the rules followed. There are 13 independent vowels (V), known as swaras, 2 yogavaha and 36 consonants (C). Of the consonants, 34 appear in modern Kannada text, while the other two appear only in ancient text and are not presently used. The two yogavaha are not used independently; they follow a vowel or a consonant. Each consonant is modified by the 13 independent vowels to form an akashara (syllable) with a consonant-vowel (CV) structure, giving a total of 468 (36 x 13) characters. For each consonant there is a conjunct, known as a vattu akashara. A conjunct appears with an akashara to form a complex letter (C(CV)), accounting for a total of 16,848 (36 x 468) characters. Optionally, an akshara can have one or more conjuncts in a C(CV) combination, forming the canonical structure ((C)C)CV and accounting for a total of 606,528 (36 x 16,848) characters. Thus, the number of theoretically possible Kannada characters is 623,893 (49 + 468 + 16,848 + 606,528), as listed in Table 1.
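The class count above can be checked with a few lines of arithmetic. The sketch below assumes the decomposition 49 (V + C) + 468 (CV) + 16,848 (C(CV)) + 606,528 (((C)C)CV), which reproduces the 623,893 total. The function and key names are illustrative only.

```python
VOWELS = 13       # independent vowels (swaras)
CONSONANTS = 36   # consonants, including the two found only in ancient text


def kannada_class_counts(vowels: int = VOWELS, consonants: int = CONSONANTS) -> dict:
    """Number of theoretically possible characters for each syllable structure."""
    independent = vowels + consonants   # V and C written on their own
    cv = consonants * vowels            # consonant + vowel modifier
    c_cv = consonants * cv              # one conjunct below a CV syllable
    cc_cv = consonants * c_cv           # canonical ((C)C)CV structure
    return {
        "V+C": independent,
        "CV": cv,
        "C(CV)": c_cv,
        "((C)C)CV": cc_cv,
        "total": independent + cv + c_cv + cc_cv,
    }


counts = kannada_class_counts()
for name, n in counts.items():
    print(f"{name:>10}: {n:,}")
```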
Combination of Kannada characters
To reduce the complexity of the character set, machine learning approaches have used three-zone segmentation. In this scheme, the characters in the upper, middle and lower zones are recognized separately and then combined to form an akashara (syllable). In the deep learning approach, all vowels, consonants and consonants with vowel modifiers are grouped together as Main akasharas, and all consonant conjuncts are grouped as Vattu akasharas.
Main akasharas consist of the following groups. There are 13 vowels and 36 consonants (a total of 49), as shown in Fig. 1(a). There are also live consonants, which retain their inherent vowel or are written with an explicit dependent vowel (a total of 34), as shown in Fig. 1(b). Of the 13 CV combinations for each consonant, only 7 are considered as classes, together with special characters segmented from CV combinations (a total of 6, Fig. 1(d)). The remaining 5 combinations, such as ( … ), are not treated as a single syllable. Instead, they are treated as two syllables and appended in the final stage. For example, the letter … , which is available among the CV combinations, is appended with … . Hence the total number of CV combinations is 224 (32 x 7), excluding the four consonants … , as shown in Fig. 1(c).

Kannada Main Akasharas (a) Vowels and Consonants (b) Live Consonants (c) Consonant – Vowel Combination (d) Special Characters.
Therefore, the total number of Main akashara classes is 313 (49 + 34 + 224 + 6), and the complexity is reduced from 623,893 to 313 classes for the Main aksharas.
Vattu akasharas are the consonant conjuncts, and there are 34 of them, as shown in Fig. 2. In this paper only 30 classes are considered. Conjuncts such as … , which do not occur frequently in text, are not considered.

Kannada Vattu Akashara (Consonant Conjuncts).
Printed Kannada text is scanned on a flatbed scanner at 300 dpi. Lines are segmented using the Horizontal Projection Profile (HPP), and the words in each line are segmented using the Vertical Projection Profile (VPP). The characters in each word are extracted using a two-stage segmentation algorithm. In printed Kannada text, words without bottom-extension characters (vattu) have valleys in the VPP in the space between adjacent characters. Words with bottom-extension characters have no such valleys, which makes it difficult to extract individual characters from the word. In the first stage of the two-stage method, the vattu (which are usually few in number) are segmented from the word using connected component processing. In the second stage, the remaining characters of the word (with the vattu removed) are easily segmented using the traditional VPP method. The two-stage segmentation algorithm thus separates the individual characters of a word into main and vattu aksharas. Figure 3 shows a sample Kannada text and its HPP. Words are segmented from each line using the VPP, and a segmented word from the second line is shown in Fig. 4(a). The main akasharas and vattu akasharas of this word, segmented using the VPP and connected component analysis, are shown in Fig. 4(b) and (c), respectively.
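A minimal sketch of the projection profiles and the two-stage idea is given below. It assumes a binary image stored as a list of rows (1 = ink). The paper does not state the rule used to decide which connected components are vattu. The hypothetical `split_vattu` function therefore uses an assumed baseline test: a component whose top row lies below a given `baseline` row is treated as a vattu. All function names are illustrative.

```python
from collections import deque

Image = list[list[int]]  # binary image: 1 = ink, 0 = background


def horizontal_profile(img: Image) -> list[int]:
    """HPP: number of ink pixels in each row."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> list[int]:
    """VPP: number of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def runs(profile: list[int]) -> list[tuple[int, int]]:
    """(start, end) pairs, end exclusive, of non-zero runs; zeros are the valleys."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def connected_components(img: Image) -> list[list[tuple[int, int]]]:
    """4-connected components of ink pixels, found by breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def split_vattu(img: Image, baseline: int) -> tuple[Image, Image]:
    """Stage 1 (hypothetical rule): move components starting below `baseline` to a vattu image."""
    main = [row[:] for row in img]
    vattu = [[0] * len(img[0]) for _ in img]
    for comp in connected_components(img):
        if min(y for y, _ in comp) > baseline:
            for y, x in comp:
                main[y][x], vattu[y][x] = 0, 1
    return main, vattu


# Toy word: two characters with a vattu below them that bridges the VPP valley.
word = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
]
main, vattu = split_vattu(word, baseline=1)
print(runs(vertical_profile(word)))   # no valley: the whole word is one run
print(runs(vertical_profile(main)))   # stage 2: one run per main character
```

On this toy word, the vattu hides the valley between the two characters in the VPP. After the first stage removes the vattu, the VPP of the main image splits into two character segments.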

Sample Printed Kannada text and its horizontal projection profile.

Two Stage Segmentation of a word: (a) Sample word (b) Segmented Main Akashara (c) Segmented Vattu Akashara.
A total of 50 Kannada text samples were collected from 50 different writers aged 25 to 55 years. The writers were asked to write only main akasharas and vattu akasharas, writing each character in a line containing at least 10 samples of it. The handwritten documents were digitized at 300 dpi on a flatbed scanner. Samples of handwritten documents with main akasharas are shown in Fig. 5(a)–(c), and a sample with vattu akasharas is shown in Fig. 5(d).

Sample Handwritten Document: (a) Vowels (b) Consonants (c) Consonants with Vowel Modifier and (d) Vattu Akashara.
Main akasharas and vattu akasharas are extracted by applying the two-stage segmentation algorithm. There are 313 classes of main akashara and 30 classes of vattu akashara. As explained above, 100 samples of each class are extracted from printed text and from handwritten text. Thus the database of printed and handwritten Kannada characters has 31,300 main characters in 313 classes and 3000 vattu characters in 30 classes, with an average of 100 samples per class. Of the total samples, 70% are used for training, 10% for validation and 20% for testing the model.
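The 70/10/20 split can be sketched as a per-class (stratified) split. This is an assumption, since the paper does not say whether its split was stratified. The sketch also assumes exactly 100 samples per class, whereas the text gives 100 as an average. The function name is illustrative.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split sample indices into train/validation/test lists, class by class.

    fractions gives the train and validation shares; every sample left over
    after those two goes to the test set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Hypothetical labels: 313 main akashara classes with 100 samples each.
main_labels = [c for c in range(313) for _ in range(100)]
train, val, test = stratified_split(main_labels)
print(len(train), len(val), len(test))  # 21910 3130 6260
```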
Proposed model
A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an input image, assigns learnable weights and biases to various objects in the image, and thereby differentiates one image from another. In traditional methods filters are handcrafted, whereas ConvNets learn the filter characteristics. A CNN model consists of an input layer, convolution layers with k filters, a pooling layer, fully connected layers and a classification layer. Three CNN models, Model A, Model B and Model C, which differ in the number of convolution layers, are designed to recognize printed and handwritten Kannada characters, as shown in Fig. 6.

Architecture of CNN Model: Model A, Model B and Model C.
Model A consists of two convolutional layers, Model B of three and Model C of four. In these models, a ReLU activation function follows each convolutional layer. In each model, convolution is performed with 3 x 3 and 5 x 5 kernels using a stride of 1. The convolutional layers are followed by a max pooling layer, dropout layers and a classification layer with a softmax activation function.
The details of the three models are shown in Table 2. Model A consists of two convolutional layers with 16 and 32 feature maps. Model B consists of three convolutional layers with 16, 32 and 64 feature maps, and Model C of four convolutional layers with 16, 32, 64 and 128 feature maps. Each model also has a max pooling layer, two dropout layers with probabilities of 0.25 and 0.5, and two dense layers. The first linear layer has 128 output features. The second has one output feature per class: 313 for main akashara and 30 for vattu akashara.
Output Shape and Parameters of Model A, Model B, Model C
The input to the model is a character image of size 28 x 28 x 1. The output size of a convolution layer is calculated using Equation (1). For a 3 x 3 kernel with padding = 0 and a stride of 1, the output feature map is 26 x 26. In each model, the first convolution layer has 16 filters. The number of parameters of a convolution layer is calculated using Equation (2). The first convolution layer therefore requires 160 parameters with a 3 x 3 filter ((3 x 3 x 1 + 1) x 16) and 416 with a 5 x 5 filter ((5 x 5 x 1 + 1) x 16). Parameters for the second, third and fourth convolution layers are calculated in the same way for 3 x 3 and 5 x 5 filters and are tabulated in Table 2. The pooling layer has no learnable parameters. The fully connected layers have the highest number of parameters, because every neuron is connected to every neuron of the previous layer. The number of parameters in these layers is calculated using Equation (3).
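Equations (1)–(3) are not reproduced in this text. The standard relations below reproduce the numbers quoted in this section, so we assume they are what Equations (1)–(3) express. The symbols are: input width W, kernel size K, padding P, stride S, C_in input channels, C_out filters, and N_in and N_out fully connected inputs and outputs.

```latex
O = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1, \qquad
P_{\mathrm{conv}} = \left(K \cdot K \cdot C_{\mathrm{in}} + 1\right) C_{\mathrm{out}}, \qquad
P_{\mathrm{fc}} = \left(N_{\mathrm{in}} + 1\right) N_{\mathrm{out}}
```

With W = 28 and S = 1, padding P = 1 for K = 3, or P = 2 for K = 5, keeps O = 28. This is why the padded models have larger fully connected layers than the unpadded ones.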
For example, consider Model A with a 3 x 3 filter and without padding. The two convolutions reduce the 28 x 28 input to 26 x 26 and then 24 x 24, and max pooling reduces it to 12 x 12. The first linear layer therefore requires ((32 x 12 x 12) + 1) x 128 = 589,952 parameters. The second linear layer requires (128 + 1) x 30 = 3870 parameters for vattu akashara, which has 30 output neurons, and (128 + 1) x 313 = 40,377 for main akashara, which has 313 output neurons. The calculation is repeated for the other models and filter types and tabulated in Table 2. Table 2 shows the detailed calculation for Model A and only the total parameters for Model B and Model C. In each model, the total number of parameters is highest for the 5 x 5 filter with padding, which increases the computational time of the model. Each model is trained with the training set described in Section 3 for both printed and handwritten Kannada characters and validated using the validation set. The loss function is categorical cross-entropy, and the optimizer is Adam.
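The per-layer counts in Table 2 can be recomputed with a small calculator. This sketch assumes stride 1 throughout and a single 2 x 2 max-pooling layer after the last convolution. That placement is inferred from the 32 x 12 x 12 input to Model A's first linear layer rather than stated in the text, and the function names are illustrative.

```python
def conv_out(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Spatial output size of a convolution."""
    return (size - kernel + 2 * padding) // stride + 1


def model_parameters(channels, n_classes, kernel=3, padding=0,
                     input_size=28, hidden=128, pool=2):
    """Learnable parameters per layer for a Model A/B/C-style network.

    Assumes stride-1 convolutions followed by one max-pool of size `pool`,
    then two linear layers (hidden units, then one output per class)."""
    layers, size, c_in = [], input_size, 1
    for i, c_out in enumerate(channels, start=1):
        size = conv_out(size, kernel, padding)
        layers.append((f"conv{i}", (kernel * kernel * c_in + 1) * c_out))
        c_in = c_out
    size //= pool                                  # max pooling: no parameters
    flat = c_in * size * size                      # flattened input to fc1
    layers.append(("fc1", (flat + 1) * hidden))
    layers.append(("fc2", (hidden + 1) * n_classes))
    return layers


MODELS = {"A": (16, 32), "B": (16, 32, 64), "C": (16, 32, 64, 128)}

for name, channels in MODELS.items():
    for kernel, padding in ((3, 0), (3, 1), (5, 0), (5, 2)):
        total = sum(p for _, p in model_parameters(channels, 313, kernel, padding))
        print(f"Model {name}, {kernel}x{kernel}, p={padding}: {total:,} parameters")
```

Under these assumptions, the calculator reproduces the Model A figures quoted above for both the 30-class and 313-class output layers.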
Machine learning algorithms use handcrafted features, whereas in deep learning the features are extracted by learned filters. To compare the deep learning model with a machine learning technique, Hu's invariant moments are extracted from the datasets of main akasharas and vattu akasharas, both printed and handwritten. These features are classified using a Support Vector Machine (SVM). Hu [10] introduced the use of moment invariants as features for pattern recognition.
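The general form of a regular moment $m_{pq}$ of order $(p + q)$ of an image intensity function $f(x, y)$ is defined in Equation (4):
$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q} f(x, y)\,dx\,dy, \qquad p, q = 0, 1, 2, \ldots \qquad (4)$$
For a digital image, the central moments, which are invariant to translation, are defined in Equation (5):
$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y) \qquad (5)$$
where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$.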
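The moments in Equations (4) and (5) lead to Hu's seven invariants through the scale-normalized central moments η_pq = μ_pq / μ_00^(1 + (p + q)/2). The pure-Python sketch below computes them for a grayscale or binary image given as a list of rows. It takes x as the column index and y as the row index, which is an assumption about orientation. It favors clarity over speed and recomputes the centroid for every moment.

```python
def raw_moment(img, p, q):
    """Discrete m_pq: sum over pixels of x^p * y^q * f(x, y), with x = column, y = row."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def central_moment(img, p, q):
    """mu_pq about the centroid x_bar = m10/m00, y_bar = m01/m00 (Equation (5))."""
    m00 = raw_moment(img, 0, 0)
    x_bar = raw_moment(img, 1, 0) / m00
    y_bar = raw_moment(img, 0, 1) / m00
    return sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_moments(img):
    """Hu's seven moment invariants of a non-empty intensity image."""
    mu00 = central_moment(img, 0, 0)

    def eta(p, q):
        # Scale-normalized central moment.
        return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]


# Toy 3 x 4 binary glyph, a translated copy and a copy rotated by 90 degrees.
glyph = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 1]]
shifted = [[0] * 6] + [[0, 0] + row for row in glyph]
rotated = [list(r) for r in zip(*glyph[::-1])]
for name, image in (("original", glyph), ("shifted", shifted), ("rotated", rotated)):
    print(name, [round(h, 6) for h in hu_moments(image)])
```

The three printed vectors agree up to floating-point rounding, illustrating the translation and rotation invariance that makes these features usable for the SVM baseline.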
The SVM classifier [11] is a two-class classifier based on discriminant functions. A discriminant function represents a surface that separates the patterns into two classes. In this paper, a number of two-class classifiers are trained, each distinguishing one class from all the others (one-versus-rest). Each class label has an associated SVM, and a test example is assigned the label of the class whose SVM gives the largest positive output. The example is rejected if no SVM gives a positive output.
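The one-versus-rest decision rule with rejection can be sketched as follows. Training is not shown: the support vectors, coefficients and biases in `machines` are assumed to come from already trained binary SVMs, and the toy values below are invented for illustration. The RBF kernel is written as exp(-||u - v||^2 / σ^2), one common convention for a kernel scale σ. This form is an assumption, because the exact form used in the paper is not given. Function names are illustrative.

```python
import math


def rbf_kernel(u, v, scale=2.0):
    """RBF kernel with kernel scale sigma: exp(-||u - v||^2 / sigma^2) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / scale ** 2)


def decision_value(x, support_vectors, coefs, bias, scale=2.0):
    """Output of one binary SVM: sum_i coef_i * K(s_i, x) + bias, coef_i = alpha_i * y_i."""
    return sum(c * rbf_kernel(s, x, scale)
               for s, c in zip(support_vectors, coefs)) + bias


def classify_one_vs_rest(x, machines, scale=2.0):
    """machines maps each label to (support_vectors, coefs, bias) of its SVM.

    Returns the label whose SVM output is largest, or None (reject) when
    no SVM gives a positive output."""
    scores = {label: decision_value(x, *params, scale=scale)
              for label, params in machines.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy, hand-picked parameters for two classes in a 2-D feature space.
machines = {
    "a": ([(0.0, 0.0)], [1.0], -0.5),
    "b": ([(3.0, 3.0)], [1.0], -0.5),
}
print(classify_one_vs_rest((0.0, 0.0), machines))    # a
print(classify_one_vs_rest((10.0, -10.0), machines))  # None (rejected)
```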
Recognition accuracy of CNN model
The database of printed and handwritten Kannada characters has 31,300 samples of main akasharas and 3000 samples of vattu akasharas. Of these, 70% (21,910 main akasharas and 2100 vattu akasharas) are used for training each model. A further 10% (3130 main akasharas and 300 vattu akasharas) are used for validation, and 20% (6260 main akasharas and 600 vattu akasharas) for testing.
Batch size, the number of training examples used in one iteration, is one of the most important hyperparameters. Datasets differ in their properties: some require smaller batch sizes and others larger ones. The batch size controls the accuracy of the estimate of the error gradient during training. Of the three gradient descent modes (batch, stochastic and mini-batch), mini-batch mode is selected, in which the batch size is greater than one but less than the size of the training set. In this paper, each model is trained and validated with batch sizes of 8, 16, 32, 64, 128 and 256 using the Adam optimizer with a learning rate of 0.001. The aim is to find the batch size that produces the best result for the dataset.
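Mini-batch training can be sketched with a simple generator. The sketch below shuffles the indices once and counts the parameter updates per epoch for the 21,910 printed main akashara training samples. Larger batches mean fewer updates per epoch, which is consistent with the shorter time per epoch reported later for large batches. The names are illustrative.

```python
import math
import random


def minibatches(indices, batch_size, seed=0):
    """Yield shuffled mini-batches of sample indices; the last one may be smaller."""
    order = list(indices)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


N_TRAIN = 21910  # printed main akashara training samples (70% of 31,300)

for batch_size in (8, 16, 32, 64, 128, 256):
    updates = math.ceil(N_TRAIN / batch_size)
    print(f"batch size {batch_size:>3}: {updates} updates per epoch")
```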
Overfitting is the most common problem encountered when training neural networks, and several techniques exist to prevent it. In this paper, early stopping is used. The validation loss is monitored at every epoch during training, and training is stopped when the validation loss starts increasing. The model is saved at this point and then evaluated on the test set. Table 3 shows simulation results for batch sizes of 8, 16, 32, 64, 128 and 256 for printed main akashara. Its second column gives the test-set recognition accuracy and the validation loss at which the model was saved. Table 3 shows that test accuracy decreases beyond a particular batch size. The highest accuracy was obtained with a batch size of 32 using a 3 x 3 filter with padding of 1: 97.54% for Model A, 98.72% for Model B and 98.83% for Model C.
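The stopping rule can be sketched as below. Validation losses are passed as a list only to keep the example self-contained; in practice they arrive one per epoch. Setting `patience = 1` stops at the first epoch whose loss fails to improve, which matches the rule described above. Larger values are a common variant, not something the paper reports. The function name is illustrative.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return (best_epoch, stop_epoch) for validation-loss early stopping.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; the model saved at best_epoch
    (lowest validation loss) is the one evaluated on the test set."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


losses = [0.9, 0.5, 0.4, 0.45, 0.3]  # toy validation losses, one per epoch
print(early_stopping_epoch(losses))              # (2, 3): stop when the loss rises
print(early_stopping_epoch(losses, patience=2))  # (4, 4): waits one extra epoch
```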
Recognition Accuracy and validation loss of different models with different batch sizes for printed main akshara
Simulations were also performed on the printed vattu akashara dataset and the handwritten main and vattu akashara datasets with different batch sizes. A batch size of 32 gave the highest test accuracy in all models. Table 4 shows the recognition accuracy of Model A, Model B and Model C for a batch size of 32.
Recognition Accuracy of Model A, Model B and Model C with batch size of 32
Table 4 shows that Model C with a 3 x 3 filter and padding p = 1 has the highest accuracy for main akasharas, which have 313 classes: 98.83% for printed and 82.50% for handwritten characters. For vattu akasharas, which have 30 classes, Model C with a 3 x 3 filter and padding = 1 also has the highest accuracy: 99.29% for printed and 80.72% for handwritten characters.
The performance of Model C with a 3 x 3 filter and padding = 1 is analyzed with respect to batch size. Fig. 7(a) shows the effect of batch size on validation loss over the epochs: increasing the batch size results in a higher minimum validation loss. Larger batches also take less time per epoch but need more epochs to converge to the minimum validation loss. A batch size of 32 gave the best result. This indicates that, at a constant learning rate, small-batch training outperforms large-batch training on this dataset.

Performance of Model C with 3 x 3 filter and padding = 1: (a) Effect of batch size on validation loss. Training and validation loss for different epoch on (b) Printed Main character (c) Printed vattu akashara (d) Handwritten main akashara (e) Handwritten vattu akashara.
To diagnose the behavior of Model C with a 3 x 3 filter and padding = 1, the shape and dynamics of its learning curves were studied. The training and validation losses for the printed and handwritten main and vattu akashara datasets are plotted in Fig. 7(b) to (e). In each case the training and validation losses decrease to a point of stability with a minimal gap between the final values. This indicates that the model is a good fit to the data used for training and validation.
Model C with a 3 x 3 filter and padding p = 1 is the best model for recognition of printed and handwritten Kannada characters, because it extracts more features. The visualization of its feature maps is shown in Fig. 8(a). Fig. 8(a) shows that almost all details of the character are extracted in the fourth layer, which has 128 feature maps. To check whether a fifth layer is required, the feature maps of a fifth layer with 256 filters were simulated. Most details of the pattern became invisible, so Model C with four layers was judged sufficient for recognition. To compare kernel sizes, the feature maps of Model C with a 5 x 5 filter and padding p = 2 are shown in Fig. 8(b). The 128 feature maps of the fourth layer in Fig. 8(b) are blurred, whereas those in Fig. 8(a) show sharp edges. This is consistent with the higher accuracy obtained for Model C with a 3 x 3 kernel and padding p = 1.

Visualization of Feature Maps with 128 Filters using (a) 3 x 3 Kernel Size and Padding = 1 (b) 5 x 5 Kernel Size and Padding = 2
To compare the deep learning algorithm with a machine learning algorithm, Hu's invariant moments are extracted from 80% of the samples of main akasharas (25,040) and vattu akasharas (2400) for printed and handwritten characters. The recognition accuracy is calculated on the remaining 20% of test samples of main akasharas (6260) and vattu akasharas (600) using an SVM classifier with an RBF kernel. Simulations were performed with the kernel scale σ varied from 0.5 to 2.5 in steps of 0.5. A kernel scale of 2 gave relatively higher accuracy, and these results are tabulated in Table 5. Kernel scales greater than 2 produced many more misclassifications, because the decision boundary becomes too smooth and underfits the data. Table 5 compares the recognition accuracy of main and vattu akasharas for the SVM classifier and for the four-layer CNN described in Section 4. It shows that the CNN with four layers performs better than the SVM.
Comparison of SVM classifier with CNN network for Printed and Handwritten Kannada Characters
The proposed model is compared with existing techniques for printed and handwritten Kannada characters in Table 6. In [12], the authors used 50 classes of handwritten Kannada characters. They reported accuracies of 86.92% with a three-layer CNN and 93.06% with a pretrained VGG16 model using transfer learning. Our model achieves 82.50% with 313 classes and 80.72% with 30 classes. Although these accuracies are lower, they are obtained on a much larger number of classes than the 50 classes in [12]. In [2], the authors claimed 99% for 657 classes and 96% for 50 classes with a two-layer CNN. However, it is not necessary to train every consonant-vattu combination as a class: base characters and vattu characters can be trained separately and then combined. Moreover, the dataset in [2] consists of characters segmented from natural scenes rather than handwritten characters, which explains its higher accuracy compared with handwritten characters. In [14], the authors worked with only 188 classes and achieved 73.5% using VGG19 with transfer learning. In this paper, base characters and vattu characters are trained separately with two different four-layer models, achieving 82.50% and 80.72%, respectively, whereas other authors worked only with vowels and consonants. For printed Kannada characters, the authors of [8] worked with documents of at least 100 words. They used four single-layer CNNs for different character sizes based on aspect ratio and obtained 82.01%. In [13], the authors claimed 99% for characters extracted from natural scenes. In this paper, a four-layer CNN achieves 98.83% for the 313 classes of main akasharas and 99.29% for the 30 classes of vattu akasharas.
Comparison of Existing Techniques with the Proposed Model for Handwritten and Printed Kannada Characters
An open-source vattu akashara dataset with 34 classes and 1000 samples per class (34,000 samples in total) was also used. Samples of 4 classes are shown in Fig. 9. The proposed models were trained with 70% of the samples (23,800) and validated with 10% (3400) using batch sizes of 8, 16, 32, 64, 128, 256 and 512. The model with the minimum validation loss was saved and tested with the remaining 20% of the samples (6800). Model C with a 3 x 3 filter and padding = 1 gave the best result, at a batch size of 256. Table 7 shows the recognition accuracy of Model C for different batch sizes. Test accuracy increases with batch size up to a maximum at a batch size of 256, as there were more samples in each class, giving a better representation of this dataset.

Sample of Vattu Akashara.
Recognition Accuracy of Model C for Vattu Akashara Data set
Conclusion and future work
Recognition of Kannada characters is complicated because the number of classes in the Kannada language is very large compared with other languages. In this paper, to cover all combinations of Kannada characters, the base characters are treated as main akasharas (313 classes). The conjuncts written below the consonants are treated as vattu akasharas (30 classes). Three models with different numbers of layers, different numbers of feature maps and kernel sizes of 3 and 5 are trained separately on main and vattu akasharas with different batch sizes. Simulation results revealed that a batch size of 32 gave the highest test accuracy with minimum validation loss in all models. The best result was obtained by Model C with a 3 x 3 filter and padding = 1, indicating that small-batch training outperforms large-batch training at a constant learning rate. The shape and dynamics of the learning curves showed this model to be a good fit for the dataset. Visualization of the features learned by each convolutional layer showed that all details of a character are extracted in the fourth layer, which has 128 feature maps. The model was also evaluated on an open-source vattu akashara dataset and achieved better accuracy, as that dataset had more samples per class. In future, more characters will be generated using augmentation, and numerals and special characters will be added to the dataset. Any scanned document, handwritten or printed, could then be segmented, recognized with the proposed model and converted into machine-readable format. Other segmentation methods will also be implemented to extract characters from scanned documents. Other deep neural network models, such as attention mechanisms and transformers, will also be applied to the dataset created.
