Recognition of Kannada characters using deep learning approach

Abstract

Recognition of Kannada characters is a complex task, as the number of classes in Kannada, considering all combinations of vowels and consonants, is 623,893. In this paper, the complexity is reduced from 623,893 classes to just 313 classes of Main aksharas (vowels, consonants, vowel modifiers and consonant modifiers) and 30 classes of Vattu aksharas (conjuncts) by using two-line segmentation. A novel CNN model for recognition of printed and handwritten Kannada characters is proposed. CNN models with two, three and four layers are designed for Main aksharas and Vattu aksharas with different filter sizes. The database consists of 31,300 samples of Main aksharas and 3000 samples of Vattu aksharas, covering printed and handwritten characters. Simulation results revealed that the CNN model with a four-layer architecture is the best model for recognition of Kannada characters. This model achieved recognition accuracies of 98.83% and 99.29% for printed Main and Vattu aksharas, and 82.50% and 80.72% for handwritten Main and Vattu aksharas, respectively.

Keywords

Deep learning; Convolution neural network; SVM classifier; Horizontal projection profile; Vertical projection profile

1 Introduction

India is known for its multiculturalism and multilingualism. Kannada, the official language of the south Indian state of Karnataka, is spoken by about 48 million people. The language is also spoken by linguistic minorities in the states of Maharashtra, Andhra Pradesh, Tamil Nadu, Telangana, Kerala and Goa, and by Kannadigas abroad [1]. In India, where a large number of people reside in rural areas and have little knowledge of English, it is imperative to facilitate localized channels of communication. In addition, a very large number of languages face the threat of extinction, primarily due to neglect of native textual resources and the vast wisdom hidden in them. In this era of English dominance, native languages including Kannada suffer from neglect of their ancient, wisdom-oriented textual material. This makes it necessary to develop a digital support system that would make the rich indigenous resources of the language universally accessible and hence raise its standing in the global linguistic arena.

In government offices and healthcare departments in Karnataka, the only form of documentation available is handwritten reports written in Kannada [2]. The contents of these old documents can be reproduced through typewriting, but this is tedious because the documents are difficult to read and understand, and storing and maintaining them is a highly labor-intensive operation. A handwritten text recognition system would therefore be a boon to government sectors and health departments, where most documents are in Kannada. A computer-based system is thus needed to bridge the gap between machines and humans. Extracting text in a machine-readable format from real-world images is a challenging task in the computer vision community, and reading text in natural images has gained a lot of attention in practical applications such as updating inventory, analyzing documents, scene understanding, robot navigation and image retrieval.

Optical Character Recognition (OCR) [2, 3], the mechanical or electronic conversion of scanned documents containing printed or handwritten text into a machine-encoded format, has greatly improved the process of data management. It enables quick digitization of scanned documents and converts them into a recognizable form. OCR eliminates the cost of misplaced or lost documents and offers savings in the form of reclaimed office space that would otherwise be used to store paper documents.

Traditional character recognition is based on hand-crafted features and requires a large amount of prior knowledge [4]. This necessitates extensive pre-processing, and obtaining good recognition accuracy is challenging. In the last few years, the development of deep neural networks and their use in text recognition have produced breakthrough performance. Still, the massive amount of data and the variation across datasets demand further research. The Convolution Neural Network, with its strength in perceiving the structure of objects and extracting features, is well suited to automatic character recognition. At present, when an enormous number of languages are on the verge of losing their valuable literary resources due to negligence by stakeholders, Kannada is also feeling the heat of the situation, and the need for a reliable Kannada OCR system is clear. It has been found [5] that only a few Kannada OCR systems are available on the market, and these are not reliable and accurate enough to provide the best results. In our study, Convolution Neural Networks (CNNs) provided the most accurate results. CNNs consist of convolutional layers, whose feature maps are obtained by convolving images with sets of image filters, followed by pooling and fully connected layers. In this work, feature maps for character classification are extracted through convolution with filters of size 3 x 3 and 5 x 5, and CNNs with two, three and four layers, with different filters and different numbers of channels, are proposed for printed and handwritten Kannada characters. Feature visualization is used to identify the regions important for classification; it was found that a CNN with four layers having 16, 32, 64 and 128 feature maps could extract all the details of a character. The main contributions of this paper are as follows:

The complexity of Kannada character recognition is reduced from 623,893 classes to 313 Main akshara classes and 30 Vattu akshara classes by grouping the characters into Main aksharas and Vattu aksharas

CNNs with two, three and four layers, with different filters and different numbers of channels, are proposed for printed and handwritten Kannada characters

The effect of batch size is studied, and a batch size of 32 is found to give the best result with minimum validation loss

Learning curves are analyzed, and the model is found to be a good fit

Feature visualization helps to identify the number of feature extraction layers, thus simplifying the CNN model to an optimum number of parameters

The paper is organized as follows: Section 2 reviews related work. Section 3 describes the dataset of printed and handwritten Kannada characters used to train and test the CNN model. The architecture of the CNN model is presented in Section 4. Section 5 presents the simulation results. Finally, conclusions and future work are presented in Section 6.

2 Related work

A review of the literature reveals that many researchers have been developing novel algorithms for the classification and recognition of characters, words and numerals. Convolution Neural Networks have revolutionized pattern classification. A neural network approach for identification of Indian scripts was first proposed by Patil and Subbareddy [6]. They proposed a multilingual, multiscript OCR designed for English, Kannada and Hindi scripts, in which a modular neural network is used for script identification with a two-stage feature extraction process. A CNN with 13 layers and 2 sub-layers has been applied in a multi-dataset experiment on the Bangla language [7], obtaining accuracies of 98%, 96.81%, 95.71% and 96.40% on four datasets, respectively.

Pradymna and Niraj S Prasad [8] presented “KanOCR: Conversion of printed Kannada documents to editable form using Convolution Neural Networks”. The characters are first extracted from the images using various segmentation methods and then fed as inputs to a Convolution Neural Network (CNN) for recognition. The recognized characters are translated into Unicode and printed. They used a dataset from the Kanscan Google Play Store link and achieved an accuracy of 80%.

Ramesh G, et al. [9] proposed “Offline Kannada Handwritten Character Recognition using Convolution Neural Networks”; the algorithm was tested on their own dataset and obtained accuracies of 93.2% and 78.73% on the consolidated and raw datasets, respectively. The models discussed in these papers use a single convolution layer with a 5 x 5 filter and 64 channels, followed by ReLU and max-pool layers. In this paper, CNN models with two, three and four layers and different numbers of feature maps are discussed.

3 Dataset of printed and handwritten Kannada characters

The Kannada alphabet (varnamaale) has 49, 50 or 51 letters, depending on the rules followed. There are 13 independent vowels (V), known as swaras, 2 yogavahas and 36 consonants (C); 34 consonants appear in modern Kannada text, while the other two appear only in ancient text and are not presently used. The two yogavahas are not used independently; they follow a vowel or a consonant. Consonants are modified by the 13 independent vowels to form aksharas (syllables) with a consonant-vowel (CV) structure, creating a total of 468 (36 x 13) characters. For each consonant there is a conjunct, known as a vattu akshara. A conjunct appears with an akshara to form a complex letter (C(CV)), accounting for a total of 16,848 (36 x 468) characters. Optionally, an akshara can have one or more conjuncts in a C(CV) combination, forming the canonical structure ((C)C)CV and accounting for a total of 606,528 (36 x 16,848) characters. Thus, the number of theoretically possible combinations of Kannada characters is 623,893, as listed in Table 1.

Table 1
Combination of Kannada characters

Character type	V	C	CV	CCV	CCCV	Total
Possible combinations	13	36	468	16,848	606,528	623,893
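Each count in Table 1 follows from prefixing one of the 36 consonants to the previous type. The minimal Python sketch below reproduces the counts; the function name is illustrative and not part of the original work.

```python
# Hypothetical helper reproducing the theoretical counts listed in Table 1.
def kannada_combinations(vowels=13, consonants=36):
    """Return the number of possible characters of each type in Table 1."""
    counts = {"V": vowels, "C": consonants}
    counts["CV"] = consonants * vowels            # consonant + vowel modifier
    counts["CCV"] = consonants * counts["CV"]     # a conjunct (vattu) with a CV akshara
    counts["CCCV"] = consonants * counts["CCV"]   # a further conjunct on a CCV
    counts["Total"] = sum(counts.values())        # sum of the five types above
    return counts


if __name__ == "__main__":
    for kind, n in kannada_combinations().items():
        print(f"{kind:>5}: {n:,}")
```

The multiplicative structure makes clear why grouping conjuncts separately, rather than enumerating every CCV and CCCV combination, shrinks the label space so dramatically.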

To reduce the complexity of the character set, machine learning approaches have used three-zone segmentation, in which the characters in the upper, middle and lower zones are recognized separately and then combined to form an akshara (syllable). In the deep learning approach used here, all vowels, consonants and consonants with vowel modifiers are treated as one group, known as Main aksharas, and all consonant conjuncts are treated as another group, known as Vattu aksharas.

The Main aksharas consist of:

13 vowels and 36 consonants (a total of 49), as shown in Fig. 1(a)

Live consonants, which retain the inherent vowel or are written with an explicit dependent vowel (a total of 34), as shown in Fig. 1(b)

Consonant-vowel (CV) combinations: there are 13 CV combinations for each consonant, but only 7 of them, such as …, are considered; the remaining 5 combinations, such as (…), are not treated as a single syllable but as two syllables that are appended in the final stage. For example, the letter … , which is available as a CV combination, is appended with … . Hence the total number of CV combinations is 224 (32 x 7), excluding four consonants, as shown in Fig. 1(c)

Special characters (a total of 6) segmented in CV combination, as shown in Fig. 1(d)

Fig. 1

Kannada Main Aksharas: (a) Vowels and Consonants (b) Live Consonants (c) Consonant–Vowel Combinations (d) Special Characters.

Therefore, the total number of Main akshara classes is 313 (49 + 34 + 224 + 6), and the complexity is reduced from 623,893 to 313 classes for the Main aksharas.

Vattu aksharas are the consonant conjuncts; there are 34 conjuncts, one for each consonant, as shown in Fig. 2. In this paper only 30 classes are considered; conjuncts such as …, which do not occur frequently in text, are not considered.

Fig. 2

Kannada Vattu Aksharas (Consonant Conjuncts).

3.1 Data collection for printed Kannada characters

Printed Kannada text is scanned by a flatbed scanner at 300 dpi. Lines are segmented using the Horizontal Projection Profile (HPP), and words in each line are segmented using the Vertical Projection Profile (VPP). The characters in each word are extracted using a two-stage segmentation algorithm. In printed Kannada text, for words without bottom-extension characters (vattu), the space between two adjacent characters appears as a valley in the VPP; for words with bottom-extension characters, the space between two adjacent characters does not produce a valley in the VPP, which makes it difficult to extract individual characters from the word. In the two-stage method, the vattu (which are usually few in number) are segmented from the word using connected component processing in the first stage, and the remaining characters of the word (with the vattu removed) are then easily segmented using the traditional VPP method. The two-stage segmentation algorithm thus separates the individual characters of a word into main and vattu aksharas. Figure 3 shows a sample Kannada text and its HPP. Words are segmented from each line using the VPP, and a segmented word from the second line is shown in Fig. 4(a). The main aksharas and vattu aksharas of a word are segmented using the VPP and connected component analysis; the segmented main and vattu aksharas for this word are shown in Fig. 4(b) and (c), respectively.

Fig. 3

Sample Printed Kannada text and its horizontal projection profile.

Fig. 4

Two-Stage Segmentation of a Word: (a) Sample word (b) Segmented Main Akshara (c) Segmented Vattu Akshara.
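The projection-profile and two-stage segmentation steps described in this section can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes a binarised image stored as a list of rows (1 = ink), uses 4-connected component labelling, and introduces a hypothetical `baseline` row at or below which a component is treated as a vattu, since the paper does not state how vattu components are identified.

```python
from collections import deque


def horizontal_profile(img):
    """HPP: number of ink pixels in each row (used to find text lines)."""
    return [sum(row) for row in img]


def vertical_profile(img):
    """VPP: number of ink pixels in each column (used to find words/characters)."""
    return [sum(col) for col in zip(*img)]


def projection_runs(profile):
    """(start, end) pairs, end exclusive, of consecutive non-zero profile entries.
    The zero-valued valleys between runs are the segmentation points."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def connected_components(img):
    """Group ink pixels into 4-connected components (lists of (row, col))."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def two_stage_segment(word, baseline):
    """Stage 1: remove components lying entirely at or below the hypothetical
    `baseline` row (treated as vattu). Stage 2: split the rest with the VPP."""
    main = [row[:] for row in word]          # copy so the input is not modified
    vattu = []
    for pixels in connected_components(word):
        if min(y for y, _ in pixels) >= baseline:   # assumed vattu criterion
            vattu.append(pixels)
            for y, x in pixels:
                main[y][x] = 0
    return projection_runs(vertical_profile(main)), vattu
```

On a toy word whose vattu lies under two main characters, the VPP of the whole word has no valley and yields a single column range, while the VPP after removing the vattu splits it into two ranges, mirroring the motivation for the two-stage method.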

3.2 Data collection for handwritten Kannada characters

A total of 50 Kannada text samples were collected from 50 different writers aged 25 to 55 years. The writers were asked to write only main aksharas and vattu aksharas, in lines containing at least 10 samples of each. These handwritten documents were digitized at 300 dpi by a flatbed scanner. Sample handwritten documents are shown in Fig. 5: main aksharas in Fig. 5(a)–(c) and vattu aksharas in Fig. 5(d).

Fig. 5

Sample Handwritten Document: (a) Vowels (b) Consonants (c) Consonants with Vowel Modifiers and (d) Vattu Aksharas.

Main aksharas and vattu aksharas are extracted by applying the two-stage segmentation algorithm. There are 313 classes of main aksharas and 30 classes of vattu aksharas, and 100 samples of each class are extracted from the printed and handwritten text as explained in the sections above. Thus the database of printed and handwritten Kannada characters has 31,300 main akshara samples for 313 classes and 3000 vattu akshara samples for 30 classes, an average of 100 samples per class. Of the total samples, 70% are used for training, 10% for validation and 20% for testing the model.
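A per-class 70/10/20 split could be implemented as below. Stratifying by class is an assumption (the paper gives only the overall percentages), and the function name and fixed seed are hypothetical. With 100 samples in each of the 313 main akshara classes, it yields 21,910 / 3,130 / 6,260 samples, and 2,100 / 300 / 600 for the 30 vattu classes.

```python
import random


def split_per_class(samples_by_class, train=0.7, val=0.1, seed=0):
    """Shuffle each class separately and cut it into train/val/test parts
    (70/10/20 by default), so every class appears in all three sets."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, samples in samples_by_class.items():
        items = list(samples)
        rng.shuffle(items)
        n_train = round(train * len(items))
        n_val = round(val * len(items))
        splits["train"] += [(s, label) for s in items[:n_train]]
        splits["val"] += [(s, label) for s in items[n_train:n_train + n_val]]
        splits["test"] += [(s, label) for s in items[n_train + n_val:]]  # remaining ~20%
    return splits


if __name__ == "__main__":
    # Stand-in data: 100 sample ids per class for the 313 main akshara classes.
    main = split_per_class({c: range(100) for c in range(313)})
    print({k: len(v) for k, v in main.items()})
```

Splitting within each class keeps every character represented in all three sets, which matters when each class has only about 100 samples.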

4 Architecture of CNN model

4.1 Proposed model

A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an input image, assigns learnable weights and biases to various objects in the image, and thereby differentiates one image from another. In traditional methods, filters are hand-crafted, but ConvNets can learn the filter characteristics. A CNN model consists of an input layer, convolution layers with k filters, pooling layers, fully connected layers and a classification layer. Three CNN models, Model A, Model B and Model C, differing in the number of convolution layers, are designed to recognize printed and handwritten Kannada characters and are shown in Fig. 6.

Fig. 6

Architecture of CNN Model: Model A, Model B and Model C.

Model A consists of two convolutional layers, Model B of three and Model C of four. In each model, a ReLU activation function follows every convolutional layer, and convolution is performed with 3 x 3 or 5 x 5 kernels and a stride of 1. The convolutional layers are followed by a max pooling layer, dropout layers and a classification layer with a softmax activation function.

The details of the three models are shown in Table 2. Model A has two convolutional layers with 16 and 32 feature maps, Model B has three convolutional layers with 16, 32 and 64 feature maps, and Model C has four convolutional layers with 16, 32, 64 and 128 feature maps. Each model also uses one max pooling layer, two dropout layers with probabilities of 0.25 and 0.5, and two dense layers. The first linear layer has 128 output features, and the second linear layer has as many output features as there are classes: 313 for main aksharas and 30 for vattu aksharas.

Table 2

Output Shape and Parameters of Model A, Model B, Model C

Model A
Layer	3 x 3 (Padding = 0)		5 x 5 (Padding = 0)		3 x 3 (Padding = 1)		5 x 5 (Padding = 2)
	Output Shape	Parameters	Output Shape	Parameters	Output Shape	Parameters	Output Shape	Parameters
Conv2d-1	[16,26,26]	160	[16,24,24]	416	[16,28,28]	160	[16,28,28]	416
Conv2d-2	[32,24,24]	4,640	[32,20,20]	12,832	[32,28,28]	4,640	[32,28,28]	12,832
MaxPool2d	[32,12,12]	0	[32,10,10]	0	[32,14,14]	0	[32,14,14]	0
Linear	[128]	589,952	[128]	409,728	[128]	802,944	[128]	802,944
Linear (30)	[30]	3,870	[30]	3,870	[30]	3,870	[30]	3,870
Linear (313)	[313]	40,377	[313]	40,377	[313]	40,377	[313]	40,377
Total parameters (30 classes)		598,622		426,846		811,614		820,062
Total parameters (313 classes)		635,129		463,353		848,121		856,569
Model B
Total parameters (30 classes)		1,018,526		592,798		1,632,926		1,674,142
Total parameters (313 classes)		1,055,033		629,305		1,669,433		1,710,649
Model C
Total parameters (30 classes)		1,739,550		863,262		3,312,414		3,484,702
Total parameters (313 classes)		1,776,057		899,769		3,348,921		3,521,209

The input to the model is a character image of size 28 x 28 x 1. The output size of a convolution layer is calculated using Equation (1). For a 3 x 3 kernel with padding = 0 and a stride of 1, the output of the first convolution layer is 26 x 26. In each model, the first convolution layer has 16 filters, and the number of parameters it requires is calculated using Equation (2): 160 for a 3 x 3 filter and 416 for a 5 x 5 filter. The parameters of the second, third and fourth convolution layers are calculated similarly for 3 x 3 and 5 x 5 filters and are tabulated in Table 2. The pooling layer has no learnable parameters, so its parameter count is zero. The fully connected layer has the highest number of parameters, as every neuron is connected to every neuron of the previous layer; its parameter count is calculated using Equation (3), where P is the number of neurons in the previous layer and C the number of neurons in the current layer.

$\text{output} = \frac{\text{input} - \text{kernel size} + 2 \times \text{padding}}{\text{stride}} + 1$ (1)

$\text{Parameters}_{\text{CONV}} = \left( (\text{filter width} \times \text{filter height} \times \text{filters in previous layer}) + 1 \right) \times \text{filters in current layer}$ (2)

$\text{Parameters}_{\text{FC}} = (C \times P) + C = (P + 1) \times C$ (3)

For example, in Model A with a 3 x 3 filter and no padding, the flattened output of the max pooling layer has 32 x 12 x 12 = 4,608 neurons, so the first linear layer requires (4,608 + 1) x 128 = 589,952 parameters. The second linear layer requires (128 + 1) x 30 = 3,870 parameters for vattu aksharas, which have 30 output neurons, and (128 + 1) x 313 = 40,377 parameters for main aksharas, which have 313 output neurons. The calculation is repeated for the other models and filter types and tabulated in Table 2, which shows the detailed calculation for Model A and only the total parameters for Model B and Model C. From Table 2, the 5 x 5 filter with padding requires the highest total number of parameters in each model, which increases the computational time of the model. Each model is trained with the training set described in Section 3, for both printed and handwritten Kannada characters, and validated using the validation set. The loss function is categorical cross-entropy, and the optimization algorithm is the Adam optimizer.
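Equations (1) to (3) can be checked mechanically. The Python sketch below recomputes the totals in Table 2. Placing a single 2 x 2 max-pooling layer after the last convolution is inferred from Table 2 (with that layout the function reproduces every listed total), and the function names are hypothetical.

```python
def conv_output(size, kernel, padding=0, stride=1):
    """Equation (1): spatial output size of a convolution or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1


def conv_params(kernel, in_channels, out_channels):
    """Equation (2): kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


def fc_params(prev_neurons, curr_neurons):
    """Equation (3): (P + 1) * C, i.e. weights plus one bias per neuron."""
    return (prev_neurons + 1) * curr_neurons


def count_parameters(channels, kernel, padding, n_classes, size=28, hidden=128):
    """Total learnable parameters for the layout in Table 2 (assumed):
    stride-1 convolutions -> one 2 x 2 max-pool -> Linear(hidden) -> Linear(n_classes).
    ReLU, pooling and dropout layers add no parameters."""
    total, in_ch = 0, 1                          # 28 x 28 x 1 grayscale input
    for out_ch in channels:
        total += conv_params(kernel, in_ch, out_ch)
        size = conv_output(size, kernel, padding)
        in_ch = out_ch
    size = conv_output(size, 2, stride=2)        # 2 x 2 max-pool halves the map
    total += fc_params(in_ch * size * size, hidden)
    total += fc_params(hidden, n_classes)
    return total


MODELS = {"A": [16, 32], "B": [16, 32, 64], "C": [16, 32, 64, 128]}

if __name__ == "__main__":
    for name, channels in MODELS.items():
        for kernel, padding in ((3, 0), (5, 0), (3, 1), (5, 2)):
            counts = [count_parameters(channels, kernel, padding, n) for n in (30, 313)]
            print(f"Model {name}, {kernel}x{kernel}, P={padding}: {counts[0]:,} / {counts[1]:,}")
```

For instance, `count_parameters(MODELS["C"], 3, 1, 313)` gives 3,348,921, the Model C entry in Table 2. The sketch also makes visible why padding inflates the parameter count: padding keeps the map at 28 x 28 before pooling, so the first linear layer sees a larger flattened input.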

4.2 Machine learning algorithm using SVM

Machine learning algorithms use hand-crafted features, whereas in deep learning features are extracted by learned filters. To compare the deep learning model with a machine learning technique, Hu’s invariant moments are extracted from the dataset of main aksharas and vattu aksharas, both printed and handwritten, and classified using a Support Vector Machine (SVM). Hu [10] introduced the use of moment invariants as features for pattern recognition.

The general form of a regular moment function $m_{pq}$ of order (p + q) of an image intensity function f(x, y) is defined in Equation (4). For a digital image, the central moments, which are invariant to translation, are defined in Equation (5), where $x_{c} = \frac{m_{10}}{m_{00}}$ and $y_{c} = \frac{m_{01}}{m_{00}}$ are the coordinates of the centroid. The seven nonlinear functions [10] defined on these moments, which are invariant to rotation, scaling and translation, are calculated for each segmented character described in Section 3 and used as the feature vector for recognition by the SVM.

$m_{pq} = \iint x^{p} y^{q} f(x, y)\, dx\, dy$ (4)

$\mu_{pq} = \sum_{x} \sum_{y} (x - x_{c})^{p} (y - y_{c})^{q} f(x, y)$ (5)
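Equations (4) and (5) and Hu's seven invariants can be computed directly for a small character image. The pure-Python sketch below is illustrative rather than the authors' code: it takes f(x, y) as `img[y][x]`, uses the discrete sum in place of the integral in Equation (4), normalises the central moments as η_pq = μ_pq / μ_00^(1 + (p + q)/2), and then applies the standard seven Hu invariants. A production pipeline would more likely call an image-processing library.

```python
def raw_moment(img, p, q):
    """Discrete form of Eq. (4): m_pq = sum_x sum_y x^p y^q f(x, y), f(x, y) = img[y][x]."""
    return sum(x ** p * y ** q * v for y, row in enumerate(img) for x, v in enumerate(row))


def central_moment(img, p, q):
    """Eq. (5): moment about the centroid (x_c, y_c) = (m10/m00, m01/m00)."""
    m00 = raw_moment(img, 0, 0)
    xc, yc = raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00
    return sum((x - xc) ** p * (y - yc) ** q * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_moments(img):
    """Hu's seven invariants computed from scale-normalised central moments."""
    mu00 = central_moment(img, 0, 0)

    def eta(p, q):
        return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03          # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring differences
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

Shifting a character inside a larger canvas or rotating it by 90 degrees leaves the seven values unchanged up to floating-point error, which is the property that makes them usable as a fixed-length feature vector for the SVM.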

The SVM classifier [11] is a two-class classifier based on discriminant functions; a discriminant function represents a surface that separates the patterns into two classes. In this paper, a number of two-class classifiers are trained, each distinguishing one class from all the others (one-versus-rest). Each class label has an associated SVM, and a test example is assigned the label of the class whose SVM gives the largest positive output. The example is rejected if no SVM gives a positive output.
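The one-versus-rest decision rule with rejection described above is sketched below, together with one common RBF kernel form. The decision values are assumed to come from already-trained binary SVMs, the class labels in the example are hypothetical, and the parameterisation exp(-‖x − z‖² / (2σ²)) is an assumption: the kernel scale σ reported in Section 5.4 may be defined differently by the toolbox the authors used.

```python
import math


def rbf_kernel(x, z, sigma=2.0):
    """One common RBF form, exp(-||x - z||^2 / (2 sigma^2)); sigma = 2 mirrors
    the best kernel scale reported in the text, but the exact definition is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def one_vs_rest_predict(scores):
    """Return the label whose binary SVM output is largest and positive,
    or None to reject the example when no output is positive.
    `scores` maps class label -> decision value of that class's SVM."""
    label, best = max(scores.items(), key=lambda item: item[1])
    return label if best > 0 else None


if __name__ == "__main__":
    print(one_vs_rest_predict({"ka": -0.4, "kha": 0.9, "ga": 0.2}))  # kha
    print(one_vs_rest_predict({"ka": -0.4, "kha": -0.1}))            # None (rejected)
```

With 313 main akshara classes this scheme trains 313 binary SVMs; the rejection branch turns an ambiguous character into an explicit "unknown" rather than a forced guess.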

5 Simulation results

5.1 Recognition accuracy of CNN model

The database of printed and handwritten Kannada characters has 31,300 samples of main aksharas and 3000 samples of vattu aksharas. Of these, 70% (21,910 main akshara samples and 2100 vattu akshara samples) are used to train each model, 10% (3130 main akshara samples and 300 vattu akshara samples) are used for validation, and 20% (6260 main akshara samples and 600 vattu akshara samples) are used for testing.

An important hyperparameter is the batch size, the number of training examples used in one iteration. Every dataset has different properties: some require smaller batch sizes while others require larger ones. The batch size controls the accuracy of the estimate of the error gradient during training. Of the three gradient-descent modes (batch, stochastic and mini-batch), mini-batch mode is selected, in which the batch size is greater than one but less than the total dataset size. In this paper, the model is trained and validated with batch sizes of 8, 16, 32, 64, 128 and 256 using the Adam optimizer with a learning rate of 0.001, in order to find the batch size that produces the best result for the dataset.
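The sketch below illustrates mini-batch iteration and how the number of weight updates per epoch shrinks as the batch size grows, using the 21,910 main akshara training samples. The helper names and the shuffling seed are hypothetical.

```python
import math
import random


def minibatches(n_samples, batch_size, seed=None):
    """Yield lists of shuffled sample indices of size batch_size;
    the final batch holds the remainder."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates (iterations) in one pass over the data."""
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    for b in (8, 16, 32, 64, 128, 256):
        print(b, updates_per_epoch(21910, b))
```

For batch sizes of 8, 32 and 256 this gives 2,739, 685 and 86 updates per epoch, respectively: larger batches make each epoch cheaper but take fewer optimisation steps per epoch, which is consistent with the observation in Section 5.2 that they need more epochs to converge.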

Overfitting is the most common problem encountered while training neural networks, and there are different techniques to prevent it. In this paper, early stopping is used: the validation loss is monitored at every epoch during training, and training is stopped when the validation loss starts increasing. The model is saved at this point and then evaluated on the test set. Table 3 shows the simulation results for batch sizes of 8, 16, 32, 64, 128 and 256 for printed main aksharas; for each configuration it lists the recognition accuracy on the test set and the validation loss at which the model was saved. From Table 3, with the 3 x 3 filter and padding of 1, the test accuracy peaks at a batch size of 32 and is lower for larger batch sizes; the highest accuracies, obtained with a batch size of 32, are 97.54% for Model A, 98.72% for Model B and 98.83% for Model C.
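Early stopping as described above can be sketched as follows. The `patience` parameter is an assumption added for generality: `patience=1` stops at the first epoch whose validation loss fails to improve on the best so far, which approximates stopping when the validation loss starts increasing. The two callables are hypothetical stand-ins for a real training loop.

```python
import math


def train_with_early_stopping(train_epoch, validate, max_epochs=100, patience=1):
    """Call train_epoch() once per epoch and monitor validate() (validation loss).
    Stop after `patience` consecutive epochs without improvement and return
    (best_epoch, best_loss): the checkpoint that would be saved and then tested."""
    best_loss, best_epoch, bad_epochs = math.inf, None, 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        loss = validate()
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
            # a real training loop would save the model weights here
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch, best_loss


if __name__ == "__main__":
    losses = iter([0.90, 0.50, 0.30, 0.35, 0.20])  # made-up validation losses
    print(train_with_early_stopping(lambda: None, lambda: next(losses)))
```

With `patience=1`, the made-up loss sequence stops at epoch 4 and keeps the epoch-3 checkpoint; a larger patience would tolerate the temporary rise and continue to the lower loss at epoch 5.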

Table 3
Recognition accuracy and validation loss of different models with different batch sizes for printed main aksharas

Model A
Batch size	3 x 3, Padding = 0		3 x 3, Padding = 1		5 x 5, Padding = 0		5 x 5, Padding = 2
	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss
8	93.20	0.1912	94.86	0.1539	91.60	0.4514	93.94	0.2289
16	96.38	0.1868	95.57	0.1475	93.40	0.2863	93.80	0.2867
32	96.81	0.1669	97.54	0.1074	94.50	0.2592	97.04	0.2218
64	95.60	0.2518	95.71	0.2515	94.32	0.2793	95.25	0.2551
128	95.97	0.2091	95.81	0.1371	94.47	0.3131	96.57	0.2070
256	95.76	0.2541	95.94	0.1125	91.26	0.4477	95.97	0.2942
Model B
Batch size	3 x 3, Padding = 0		3 x 3, Padding = 1		5 x 5, Padding = 0		5 x 5, Padding = 2
	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss
8	95.75	0.1403	98.16	0.1198	89.90	0.4175	94.15	0.1945
16	98.14	0.0703	97.80	0.0644	93.60	0.2392	98.37	0.1196
32	98.55	0.0350	98.72	0.0536	98.45	0.1345	98.61	0.0504
64	98.04	0.8245	98.43	0.1492	97.23	0.2034	98.31	0.0606
128	97.35	0.7750	97.65	0.1532	96.35	0.2893	98.82	0.0288
256	97.22	0.6619	97.53	0.1687	95.18	0.2915	98.56	0.0401
Model C
Batch size	3 x 3, Padding = 0		3 x 3, Padding = 1		5 x 5, Padding = 0		5 x 5, Padding = 2
	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss	Test accuracy (%)	Validation loss
8	94.63	0.2456	95.43	0.2019	87.15	0.5474	94.22	0.2664
16	96.10	0.2181	96.78	0.2068	89.46	0.3800	94.66	0.2564
32	98.46	0.1668	98.83	0.0194	94.21	0.1922	98.01	0.0826
64	96.42	0.1704	97.91	0.1497	95.52	0.1372	97.91	0.1534
128	96.29	0.1795	96.99	0.1552	97.26	0.1373	97.68	0.1571
256	95.59	0.1892	97.52	0.1678	97.13	0.0733	97.62	0.1742

Simulations were also performed on the datasets of printed vattu aksharas and handwritten main and vattu aksharas with different batch sizes, and a batch size of 32 gave the highest test accuracy in all the models. Table 4 shows the recognition accuracy of Model A, Model B and Model C for a batch size of 32.

Table 4

Recognition Accuracy of Model A, Model B and Model C with batch size of 32

Recognition accuracy (%)
	Main aksharas (printed)				Vattu aksharas (printed)
Model / Filter	3 x 3, P = 0	3 x 3, P = 1	5 x 5, P = 0	5 x 5, P = 2	3 x 3, P = 0	3 x 3, P = 1	5 x 5, P = 0	5 x 5, P = 2
Model A	96.81	97.54	94.50	97.04	96.87	98.95	97.91	98.40
Model B	98.55	98.72	98.45	98.61	98.28	99.03	98.07	98.43
Model C	98.46	98.83	94.21	98.01	98.45	99.29	98.27	98.89
	Main aksharas (handwritten)				Vattu aksharas (handwritten)
Model / Filter	3 x 3, P = 0	3 x 3, P = 1	5 x 5, P = 0	5 x 5, P = 2	3 x 3, P = 0	3 x 3, P = 1	5 x 5, P = 0	5 x 5, P = 2
Model A	70.53	71.09	68.03	74.21	68.72	70.54	69.56	74.20
Model B	72.41	78.12	71.42	76.33	70.22	76.54	70.83	74.34
Model C	75.50	82.50	75.89	81.09	71.87	80.72	71.87	78.64

From Table 4, Model C with a 3 x 3 filter and padding p = 1 has the highest accuracy for main aksharas, which have 313 classes: 98.83% for printed and 82.50% for handwritten characters. For vattu aksharas, which have 30 classes, Model C with a 3 x 3 filter and padding = 1 also has the highest accuracy: 99.29% for printed characters and 80.72% for handwritten characters.

5.2 Analysis of Model C with 3 x 3 filter and padding

The performance of Model C with a 3 x 3 filter and padding = 1 is analyzed with respect to batch size. Fig. 7(a) shows the effect of batch size on validation loss over the epochs. From Fig. 7(a), increasing the batch size results in a higher minimum validation loss; larger batches also take less time to train per epoch but need more epochs to converge to the minimum validation loss. A batch size of 32 thus gave the best result, suggesting that small-batch training outperforms large-batch training at a constant learning rate.

Fig. 7

Performance of Model C with 3 x 3 filter and padding = 1: (a) Effect of batch size on validation loss. Training and validation loss for different epochs on (b) Printed main akshara (c) Printed vattu akshara (d) Handwritten main akshara (e) Handwritten vattu akshara.

To diagnose the behavior of Model C with a 3 x 3 filter and padding = 1, the shape and dynamics of its learning curves were studied. Plots of training and validation loss for the datasets of printed and handwritten main and vattu aksharas are shown in Fig. 7(b) to (e). In each case, the training and validation losses decrease to a point of stability with a minimal gap between the two final loss values, indicating that the model is a good fit for the dataset used for training and validation.

5.3 Visualization of feature maps

Model C with a 3 x 3 filter and padding p = 1 is the best model for recognition of printed and handwritten Kannada characters, because more features are extracted in this model; the visualization of its feature maps is shown in Fig. 8(a). Fig. 8(a) shows that almost all details of the character are extracted in the fourth layer, which has 128 feature maps. To show that a fifth layer is not required for recognition, the feature maps of a fifth layer with 256 filters were simulated; most of the details of the pattern became invisible, and hence Model C with four layers was judged sufficient for recognition. To examine the effect of kernel size, the feature maps of Model C with a 5 x 5 filter and padding p = 2 are shown in Fig. 8(b). The 128 feature maps of the fourth layer in Fig. 8(b) are blurred, whereas those in Fig. 8(a) show sharp edges, which is consistent with the higher accuracy obtained for Model C with a 3 x 3 kernel and padding p = 1.

Fig. 8

Visualization of Feature Maps with 128 Filters using (a) 3 x 3 Kernel Size and Padding = 1 (b) 5 x 5 Kernel Size and Padding = 2

5.4 Comparison of deep learning algorithm with machine learning algorithm

To compare the deep learning algorithm with a machine learning algorithm, Hu’s invariant moments are extracted from 80% of the samples of main aksharas (25,040) and vattu aksharas (2400) for printed and handwritten characters. The recognition accuracy is calculated on the remaining 20% of samples, used as the test set, for main aksharas (6260) and vattu aksharas (600) using an SVM classifier with an RBF kernel. Simulations were performed by varying the kernel scale σ from 0.5 to 2.5 in steps of 0.5. The RBF kernel with a kernel scale of 2 gave relatively higher accuracy, and the results are tabulated in Table 5. Kernel scales greater than 2 produced many misclassifications, which resulted in underfitting of the data. Table 5 compares the recognition accuracy of main aksharas and vattu aksharas with the SVM classifier and with the CNN model with four convolutional layers described in Section 4. From Table 5, the CNN with four layers performs better than the SVM algorithm.

Table 5
Comparison of SVM classifier with CNN for printed and handwritten Kannada characters (recognition accuracy, %)

Classifier	Main aksharas, printed	Main aksharas, handwritten	Vattu aksharas, printed	Vattu aksharas, handwritten
SVM (RBF, σ = 2)	86.11	71.12	88.23	72.82
CNN (Model C, 3 x 3 filter, padding = 1)	98.83	82.50	99.29	80.72

5.5 Comparison of proposed technique with existing techniques

The proposed model is compared with existing techniques for printed and handwritten Kannada characters, and the results are listed in Table 6. In [12], the authors used 50 classes of handwritten Kannada characters and reported an accuracy of 86.92% with a three-layer CNN and 93.06% with a pretrained VGG16 model using transfer learning. Our model achieves 82.50% on 313 classes of main aksharas and 80.72% on 30 classes of vattu aksharas; the main akshara accuracy is lower than that reported in [12], but it is obtained on more than six times as many classes. In [13], the authors reported 99% for 657 classes and 96% for 50 classes with a two-layer CNN. However, it is not necessary to train every consonant and vattu combination as a separate class: base characters and vattu characters can be trained separately and the results combined. Moreover, the dataset used in [13] consists of characters segmented from natural scene images rather than handwritten characters, which explains its higher accuracy compared with handwritten characters. In [14], the authors worked with only 188 classes and achieved an accuracy of 73.5% using VGG19 with transfer learning. In this paper, base characters and vattu characters are considered separately and trained with two different four-layer models, achieving accuracies of 82.50% and 80.72% respectively, whereas other authors have worked only with vowels and consonants.

For printed Kannada characters, the authors of [8] worked with documents containing a minimum of 100 words and used four CNNs, each with a single convolutional layer, for different character sizes selected by aspect ratio, achieving an accuracy of 82.01%. In [13], the authors claimed an accuracy of 99% for characters extracted from natural scenes. In this paper, the four-layer CNN achieves an accuracy of 98.83% for 313 classes of main aksharas and 99.29% for 30 classes of vattu aksharas.
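The argument that base characters and vattu characters need not be trained as joint classes can be made concrete with a small decoding step. In the hypothetical sketch below, each segmented zone is classified by its own model, and the two predictions are combined afterwards. The function names, the plain probability lists and the independence assumption behind multiplying the two scores are illustrative choices, not the paper's implementation.

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def combine(base_probs, vattu_probs=None):
    """Combine a main-akshara prediction with an optional vattu prediction.

    Returns (base_index, vattu_index or None, joint_score). The joint score
    multiplies the two probabilities, i.e. treats the classifiers as independent.
    """
    b = argmax(base_probs)
    if vattu_probs is None:  # no vattu zone was segmented below this character
        return b, None, base_probs[b]
    v = argmax(vattu_probs)
    return b, v, base_probs[b] * vattu_probs[v]


# Output units: two separate softmax layers versus one joint layer in which every
# (main akshara, vattu) pair is its own class. The product is only an upper bound,
# assuming every main akshara could take every vattu.
separate_outputs = 313 + 30
joint_outputs_upper_bound = 313 * 30
print(separate_outputs, joint_outputs_upper_bound)  # 343 9390
print(combine([0.1, 0.7, 0.2], [0.6, 0.4]))
```

Factoring the problem this way keeps each classifier small and lets vattu shapes be learned once rather than once per consonant.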

Table 6
Comparison of existing techniques with the proposed model for handwritten and printed Kannada characters

Name of the author	Data set	Model	Total parameters	Accuracy (%)	Loss
Handwritten data set
Parikshith [12]	Vowels (16), consonants (34), handwritten (total 50 classes)	CNN (3 layers)	11,814,066	86.92	0.4492
		VGG16 (transfer learning)	138,423,208	93.06	0.1591
Asha K. [13]	Char74K (657 classes)	CNN (2 layers)	–	99	–
	Vowels (16), consonants (34)	CNN (2 layers)	–	96	–
N. Shobha Rani [14]	188 classes of handwritten Kannada characters	VGG19 (transfer learning)	–	73.5	0.1601
Proposed	313 classes of main aksharas	CNN (4 layers; 16, 32, 64, 128 feature maps; 3 x 3 kernel; padding = 1; stride = 1)	3,348,921	82.50	0.5104
	30 classes of vattu aksharas	CNN (4 layers; same configuration)	3,312,414	80.72	0.9964
Printed data set
Pradyumna Mukunda [8]	33 samples of printed Kannada documents, each with a minimum of 100 words; samples include main aksharas and vattu aksharas	Four CNNs, each with one 5 x 5 convolution layer with 64 channels, with input sizes 15 x 20, 25 x 20, 30 x 20 and 40 x 20 selected by the aspect ratio of the character	–	82.01	–
Proposed	Printed main aksharas (313)	CNN (4 layers; 16, 32, 64, 128 feature maps; 3 x 3 kernel; padding = 1; stride = 1)	3,348,921	98.83	0.0194
	Printed vattu aksharas (30)	CNN (4 layers; same configuration)	3,312,414	99.29	0.0784
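The parameter counts in Table 6 can be partly checked with standard counting rules: a k x k convolution with C_in input and C_out output channels has (k·k·C_in + 1)·C_out parameters, and a dense layer with n inputs and m outputs has (n + 1)·m. The sketch below assumes a single-channel (grayscale) input, which this section does not state, and counts only the convolutional part, because the input size and dense layers are not given here. It also shows that the gap between the two reported totals, 3,348,921 − 3,312,414 = 36,507, equals (128 + 1) × (313 − 30), which is consistent with (though not proof of) the two networks differing only in an output layer fed by 128 units.

```python
def conv_params(in_ch, out_ch, k):
    """Weights (k * k * in_ch per filter) plus one bias per filter."""
    return (k * k * in_ch + 1) * out_ch


def conv_stack_params(feature_maps, k, in_ch=1):
    """Parameters of stacked k x k convolutions; in_ch=1 assumes a grayscale input."""
    total = 0
    for out_ch in feature_maps:
        total += conv_params(in_ch, out_ch, k)
        in_ch = out_ch
    return total


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out


FEATURE_MAPS = [16, 32, 64, 128]  # Model C feature maps, as listed in Table 6
print(conv_stack_params(FEATURE_MAPS, 3))  # 97152
print(conv_stack_params(FEATURE_MAPS, 5))  # 269440

reported_gap = 3_348_921 - 3_312_414  # main-akshara model minus vattu model
output_gap = dense_params(128, 313) - dense_params(128, 30)
print(reported_gap, output_gap)  # 36507 36507
```

Padding does not change these counts; only the kernel size does, and the 5 x 5 variant carries about 2.8 times as many convolutional weights as the 3 x 3 one.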

5.6 Performance of the model on another dataset

An open-source vattu akshara dataset is available, consisting of 34 classes with 1000 samples per class (34,000 samples in total). Samples of four of its classes are shown in Fig. 9. The proposed models were trained on 70% of the samples (23,800) and validated on 10% (3400), with batch sizes of 8, 16, 32, 64, 128, 256 and 512. For each run, the model with the minimum validation loss was saved and then tested on the remaining 20% of the samples (6800). Model C with a 3 x 3 filter and padding = 1 gave the best result, at a batch size of 256. Table 7 shows the recognition accuracy of Model C for different batch sizes. For every configuration in Table 7, the test accuracy increases with batch size up to 256 and drops at 512. The best accuracy at a batch size of 256 is attributed to the larger number of samples per class in this dataset, which gives a better representation of each class.
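The training protocol described above (a 70/10/20 split, and keeping the weights with the lowest validation loss) can be sketched independently of any framework. In the Python sketch below, the callbacks `train_epoch`, `val_loss` and `get_state` are hypothetical stand-ins for the deep learning framework actually used, and the shuffling seed is an assumption. Repeating `fit_with_checkpoint` once per batch size gives one saved model per row of Table 7.

```python
import random


def split_indices(n, train=0.7, val=0.1, seed=0):
    """Shuffle sample indices and cut them into train / validation / test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = round(n * train), round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_with_checkpoint(train_epoch, val_loss, get_state, epochs):
    """Train for `epochs` epochs and keep the state with the lowest validation loss.

    train_epoch, val_loss and get_state are hypothetical callbacks; with a real
    framework get_state should return a copy of the weights, for example
    copy.deepcopy(model.state_dict()) in PyTorch.
    """
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        train_epoch()
        loss = val_loss()
        if loss < best_loss:
            best_loss, best_state = loss, get_state()
    return best_loss, best_state


train_idx, val_idx, test_idx = split_indices(34_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 23800 3400 6800
```

Only the checkpoint selected on validation loss is evaluated on the test indices, so the test split never influences model selection.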

Fig. 9

Samples of vattu aksharas.

Table 7
Recognition accuracy of Model C for the vattu akshara dataset

Batch size	3 x 3, padding = 0: test accuracy (%)	3 x 3, padding = 0: validation loss	3 x 3, padding = 1: test accuracy (%)	3 x 3, padding = 1: validation loss	5 x 5, padding = 0: test accuracy (%)	5 x 5, padding = 0: validation loss	5 x 5, padding = 2: test accuracy (%)	5 x 5, padding = 2: validation loss
8	91.83	0.2706	93.15	0.2265	76.47	0.7676	89.63	0.3958
16	94.12	0.1844	93.75	0.1835	85.73	0.4783	91.00	0.3255
32	94.82	0.1871	94.99	0.1498	88.10	0.3950	92.82	0.2069
64	95.25	0.1371	95.65	0.1242	92.67	0.2548	93.89	0.1923
128	96.35	0.1279	96.25	0.1198	94.66	0.1398	96.80	0.1135
256	96.71	0.1123	97.20	0.1151	95.73	0.1289	97.11	0.1035
512	96.30	0.1228	96.19	0.1144	94.32	0.1676	96.26	0.1240

6 Conclusion and future work

Recognition of Kannada characters is a complex task because the Kannada script has far more character classes than many other scripts. In this paper, to cover all combinations of Kannada characters, only the base characters are treated as main aksharas (313 classes), and the conjunct characters written below the consonants are treated as vattu aksharas (30 classes). Three models with different numbers of layers, different numbers of feature maps and kernel sizes of 3 and 5 were trained separately on main aksharas and vattu aksharas with different batch sizes. Simulation results revealed that a batch size of 32 gave the highest test accuracy with minimum validation loss in all the models, and the best result was obtained for Model C with a 3 x 3 filter and padding = 1. This indicates that, at a constant learning rate, small-batch training outperformed large-batch training on this dataset. The learning curves of this model show a good fit. Visualization of the features learned by each convolutional layer showed that almost all details of a character are extracted by the fourth layer, which has 128 feature maps. The model was also evaluated on an open-source vattu akshara dataset and achieved better accuracy there, as that dataset has more samples per class. In future, the dataset will be enlarged through augmentation and extended with numerals and special characters, so that any scanned handwritten or printed document can be segmented, recognized with the proposed model and converted into machine-readable form. Other segmentation methods will also be implemented to extract characters from scanned documents, and other deep neural network architectures, such as attention-based models and transformers, will be applied to the dataset created.

References

1. Naveena and Manjunath Aradhya V.N., Handwritten character segmentation for Kannada scripts, World Congress on Information and Communication Technologies, IEEE, 2012.

2. Savita Ahlawat, Amit Choudhary, Anand Nayyar, Saurabh Singh and Byungun Yoon, Improved handwritten digit recognition using convolutional neural networks (CNN), Sensors 20(12) (2020), 3344.

3. Chaiathra and Indira, Handwritten Online Character Recognition for Single Stroke Kannada Characters, 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology, 2017.

4. Netravati Belagali and Shanmukhappa A. Angadi, OCR for Handwritten Kannada Language Script, International Journal of Recent Trends in Engineering and Research, ISSN (online) 2455-1457, 2016.

5. Abhishek S. Rao, S. Sandhya, K. Anusha, Arpitha, Chandana Nayak, Meghana and Sneha Nayak, Exploring Deep Learning Techniques for Kannada Handwritten Character Recognition: A Boon for Digitization, International Journal of Advanced Science and Technology 29(5) (2020), 11078–11093.

6. Patil S.B. and Subbareddy N.V., Neural network based system for script identification in Indian documents, Sadhana 27(1) (2002), 83–97.

7. Akam Shahariar Azad Rabby, Sadeka Haque, Sanzidul Islam, Sheik Abjur and Syed Akhter Hossain, BornoNet: Bangla handwritten characters recognition using convolutional neural network, Procedia Comput. Sci. 143 (2018), 528–535.

8. Pradyumna Mukunda, Niraj S. Prasad, D.M. Santhosh and H.R. Mamatha, KanOCR: Conversion of Printed Kannada Documents to Editable Form using Convolutional Neural Networks, International Journal of Computer Applications 177(37) (2020), 51–58.

9. Ramesh, Ganesh N. Sharma, J. Manoj Balaji and H.N. Champa, Offline Kannada Handwritten Character Recognition using Convolutional Neural Networks, 5th IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), 2019.

10. Hu M.K., Visual pattern recognition by moment invariants, IRE Trans. Information Theory IT-8 (1962), 179–187.

11. Burges C.J.C., A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (1998), 121–167.

12. Parikshith, Naga Rajath S.M., Shwetha, Sindhu C.M. and Ravi, Handwritten character recognition of Kannada language using convolutional neural networks and transfer learning, IOP Conf. Ser. Mater. Sci. Eng. (2021).

13. Asha K. and Krishnappa H.K., Kannada Handwritten Document Recognition using Convolutional Neural Network, 3rd IEEE International Conference on Computational Systems and Information Technology for Sustainable Solutions, 2018.

14. Shobha Rani N., Subramani A.C., Akshay Kumar and Pushpa B.R., Deep Learning Network Architecture based Kannada Handwritten Character Recognition, Proceedings of the Second International Conference on Inventive Research in Computing Applications (ICIRCA-2020).
