Multi-task learning network for handwritten numeral recognition

Abstract

Handwritten numeral recognition is a challenging problem in the character recognition field because writing styles vary widely between persons and the contours of different numerals can be highly similar. To address this problem, an effective multi-task learning network (MTLN) for handwritten numeral recognition is presented in this paper. Based on the observation that the writing style can play an effective complementary role to the features learned from the numerals, the proposed MTLN simultaneously trains a handwritten numeral learning module and a writing style learning module. Consequently, the scratchy/non-scratchy decision made in the writing style learning module helps the handwritten numeral learning module to obtain more robust and distinguishable features, which improves the recognition performance. Extensive experiments on multiple existing handwritten numeral datasets demonstrate that the proposed MTLN effectively improves the recognition accuracy and outperforms multiple state-of-the-art methods.

Keywords

Handwritten numeral recognition, multi-task learning network, convolutional neural network

1. Introduction

Character recognition is a vital branch of pattern recognition and has been widely applied in various fields, such as information processing, traffic systems, and machine translation. Various intelligent recognition systems have been developed for different types of characters, including vehicle license plate recognition [1, 2], English character recognition [3, 4], numeral character recognition [5, 6], and Chinese character recognition [7, 8], to name a few.

With regard to numeral character recognition, handwritten numeral recognition has great practical value, e.g., in large-scale data statistics, tax administration, financial affairs, and letter sorting. A straightforward solution is the traditional character classification framework called optical character recognition (OCR), which mainly consists of four stages: image acquisition, image preprocessing, feature extraction, and classification. Although OCR has achieved good performance in printed character recognition, it fails to yield high accuracy in handwritten numeral recognition.

This is because, compared with printed numerals, handwritten numerals usually have irregular shapes and arbitrary strokes, since different persons have different handwriting styles. Many factors, e.g., the thickness of the strokes, the size of the font, and the tilt of the font, directly influence handwritten numeral recognition. Refer to Fig. 1 for some handwritten numeral examples. In addition, the application fields of handwritten numeral recognition usually require very high accuracy, ideally with no error in the recognition result. Therefore, how to develop an effective handwritten numeral recognition method is a challenging yet significant task in the computer vision field.

To study handwritten numeral recognition, LeCun et al. [9] developed the MNIST database, which serves as the benchmark for evaluating the performance of various handwritten numeral recognition methods. Based on the MNIST database and limited computational resources, the LeNet-5 convolutional network was also presented to obtain relatively accurate numeral classifiers [9]. Simard et al. [10] used a plain multilayer perceptron (MLP) with only one hidden layer for handwritten numeral recognition. Kasun et al. [11] applied the extreme learning machine (ELM) to handwritten numeral recognition.

Fig. 1.

Some samples of handwritten numerals from various datasets.

In order to further improve the accuracy, some hybrid handwritten numeral recognition algorithms have been presented that combine different strategies. For example, Luo et al. [12] proposed a method combining the ELM and sparse representation based classification (SRC), named ELM-SRC, which outperforms the ELM. Niu et al. [13] jointly used a convolutional neural network (CNN) and a support vector machine (SVM), called CNN-SVM. Guo et al. [14] proposed a hybrid learning model with CNN and ELM.

With the rapid growth of data and the development of graphics processing units (GPUs) and cloud computing in recent years, deeper and wider CNNs have been widely applied in various areas of image classification [15], e.g., face recognition [16] and pedestrian re-identification [17–20]. Regarding handwritten numeral recognition, Ciresan et al. [21] exploited a committee of 35 convolutional deep neural networks (DNNs), similar to the 6-layer architecture in [22], to further decrease the error rate. Tabik et al. [23] provided some data-preprocessing techniques, such as centering, translation, and rotation, to augment the training data so as to effectively improve the recognition accuracy.

In this paper, an effective multi-task learning network for handwritten numeral recognition is presented. Based on the observation that the scratchy/non-scratchy property of the writing style can play an effective complementary role to the features learned from the numerals themselves, the proposed method designs a multi-task learning network (MTLN) that simultaneously trains a handwritten numeral learning module and a writing style learning module. Such a network is able to share the relevant and irrelevant information between these two tasks, enlarging the inter-class difference and reducing the intra-class variation. Extensive experiments show that the proposed MTLN method can effectively recognize handwritten numerals.

The rest of this paper is organized as follows. Section 2 introduces the proposed handwritten numeral recognition method in detail. Experimental results and comparisons are given in Section 3. Finally, Section 4 outlines the conclusions.

2. Proposed multi-task learning network for handwritten numeral recognition

2.1. Network architecture

Figure 2 shows the architecture of the proposed MTLN approach for handwritten numeral recognition. As shown in Fig. 2, the input images have a fixed size of 28 × 28, and the MTLN contains eight layers: 4 convolutional layers (i.e., C1, C2, C4, and C5), 2 maximum-pooling layers (i.e., M3 and M6), and 2 fully-connected layers (i.e., F7 and F8).

Fig. 2.

The architecture of the proposed MTLN for handwritten numeral recognition.

In the convolutional layers, the convolution operation is calculated as

$y^{(l)} = w^{(l)} \otimes x^{(l - 1)} + b^{(l)}$ (1)

where l indexes the layer, and x^(l-1) and y^(l) denote the input and the output feature map of the l-th layer, respectively. w^(l) is the weight vector of the convolution kernel, which is the same for all neurons (i.e., weight sharing). ⊗ and b^(l) denote the convolution operation and the corresponding trainable bias, respectively.

After the convolution operation, a nonlinear activation, the rectified linear unit (ReLU) [24] function, is applied to obtain the nonlinear mapping:

$ReLU (y_{i}^{(l)}) = \max (0, y_{i}^{(l)})$ (2)
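Equations (1) and (2) can be illustrated with a small pure-Python sketch for a single-channel feature map. This is only an illustration under stated assumptions, not the Caffe implementation used in the paper: the helper names `conv2d_valid` and `relu`, the 4 × 4 input, and the kernel values are hypothetical, and the kernel is applied without flipping (cross-correlation), as is common in deep learning frameworks.

```python
def conv2d_valid(x, w, b=0.0):
    """Stride-1 convolution without padding on a 2-D map (list of rows).

    The kernel is slid over the input without flipping (cross-correlation).
    """
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(w[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw)) + b
         for c in range(out_w)]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise max(0, y), as in Eq. (2)."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Hypothetical 4 x 4 input and 3 x 3 kernel, chosen only for illustration.
x = [[1, 2, 3, 0],
     [0, 1, 2, 3],
     [3, 0, 1, 2],
     [2, 3, 0, 1]]
w = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]

y = relu(conv2d_valid(x, w, b=0.5))  # a 3 x 3 kernel shrinks 4 x 4 to 2 x 2
```

Negative pre-activations are clipped to zero, so only the one positive response survives in `y`.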

Since the ReLU activation function can speed up the network training process [24], it is employed in all convolutional layers (i.e., C1, C2, C4, and C5) and the first fully-connected layer F7 of the proposed MTLN. Moreover, to avoid the vanishing gradient problem, Batch Normalization (BN) [25] is further utilized in the first two convolutional layers C1 and C2. The steps of the BN operation are listed in Table 1.

Table 1

Algorithm steps of BN operation

Input: Values of y^l over a mini-batch: $κ = \{y_{1}^{l}, \dots, y_{m}^{l}\}$ ; m denotes the batch size. Parameters γ and β are to be learned.
Output: $b_{i} = {BN}_{γ, β} (y_{i}^{l})$
$μ_{κ} \leftarrow \frac{1}{m} \sum_{i = 1}^{m} y_{i}^{l}$	//mini-batch mean
$σ_{κ}^{2} \leftarrow \frac{1}{m} \sum_{i = 1}^{m} {(y_{i}^{l} - μ_{κ})}^{2}$	//mini-batch variance
${\hat{y}}_{i}^{l} \leftarrow \frac{y_{i}^{l} - μ_{κ}}{\sqrt{σ_{κ}^{2} + ɛ}}$	//normalization
$b_{i} \leftarrow γ {\hat{y}}_{i}^{l} + β \equiv {BN}_{γ, β} (y_{i}^{l})$	//scale and shift
Parameter ɛ is a constant that guarantees numerical stability.
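
The four BN steps can be sketched in plain Python for a mini-batch of scalar activations. This is a minimal illustration rather than the layer actually used in the paper: the function name `batch_norm`, the default `eps = 1e-5`, and the sample values are assumptions.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of scalar activations, then scale and shift."""
    m = len(batch)
    mean = sum(batch) / m                                        # mini-batch mean
    var = sum((y - mean) ** 2 for y in batch) / m                # mini-batch variance
    y_hat = [(y - mean) / math.sqrt(var + eps) for y in batch]   # normalization
    return [gamma * v + beta for v in y_hat]                     # scale and shift


# Hypothetical activations of one neuron over a mini-batch of size 4.
out = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With γ = 1 and β = 0 the output has zero mean and a variance very close to one; during training γ and β would be learned together with the other network parameters.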

In order to reduce the dimension of the feature maps obtained from the convolutional layers and the number of model parameters, sub-sampling operations are applied after the convolutional layers, which can be expressed as

$y^{(l)} = f (w^{(l)} \, \mathrm{down} (y^{(l - 1)}) + b^{(l)})$ (3)

where down (•) denotes a down sampling function. In the proposed MTLN method, the sub-sampling function is max pooling. Specifically, two maximum pooling layers (i.e., M3 and M6) are employed after convolution layers (i.e., C2 and C5), respectively.
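
As an illustration of the down-sampling step, the sketch below applies max pooling with a 2 × 2 window and stride 2 (the setting listed for M3 and M6 in Table 2) to a single feature map. The function name `max_pool_2x2` and the sample values are hypothetical.

```python
def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 on a 2-D feature map (list of rows)."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 4 x 4 feature map; each non-overlapping 2 x 2 block keeps its maximum.
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
pooled = max_pool_2x2(fmap)  # [[4, 2], [2, 8]]
```

Each spatial dimension is halved, which is why the 24 × 24 maps after C2 become 12 × 12 after M3.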

In the proposed MTLN, the number of filters in the first two convolutional layers (i.e., C1 and C2) is set to 32, and the number of filters in the last two convolutional layers (i.e., C4 and C5) is set to 64. In addition, small filters of size 3 × 3 are employed in all convolutional layers, since a stack of two 3 × 3 convolutional layers with no spatial pooling in between has the same effective receptive field as a single 5 × 5 layer while using fewer parameters and adding an extra non-linearity [26]. For the two maximum-pooling layers, the 2 × 2 max pooling operation is applied. The full parameter configuration of the proposed MTLN is listed in Table 2.
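
The comparison between two stacked 3 × 3 layers and a single 5 × 5 layer can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the choice of 32 input and 32 output channels is an assumption made to mirror layer C2.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions with no pooling in between."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def conv_weights(kernel, channels_in, channels_out):
    """Number of weights (biases excluded) in one convolutional layer."""
    return kernel * kernel * channels_in * channels_out


rf_stack = stacked_receptive_field([3, 3])   # 5, i.e. a 5 x 5 window
rf_single = stacked_receptive_field([5])     # 5
w_stack = 2 * conv_weights(3, 32, 32)        # 18432 weights for two 3 x 3 layers
w_single = conv_weights(5, 32, 32)           # 25600 weights for one 5 x 5 layer
```

Under these assumptions the stacked design covers the same 5 × 5 window with 28% fewer weights.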

Table 2

The parameter details of the proposed MTLN method

Layer	Type	Filter size	Filter number	Stride	Output size
C1	Input	–	–	–	3 × 28 × 28
	Convolution	3 × 3	32	1	32 × 26 × 26
	BN	–	–	–	32 × 26 × 26
	ReLU	–	–	–	32 × 26 × 26
C2	Convolution	3 × 3	32	1	32 × 24 × 24
	BN	–	–	–	32 × 24 × 24
	ReLU	–	–	–	32 × 24 × 24
M3	Max-pooling	2 × 2	32	2	32 × 12 × 12
C4	Convolution	3 × 3	64	1	64 × 10 × 10
	ReLU	–	–	–	64 × 10 × 10
C5	Convolution	3 × 3	64	1	64 × 8 × 8
	ReLU	–	–	–	64 × 8 × 8
M6	Max-pooling	2 × 2	64	2	64 × 4 × 4
F7	FC	1 × 1	1024|1024	–	(1024|1024) × 1 × 1
	Dropout	–	–	–	(1024|1024) × 1 × 1
	ReLU	–	–	–	(1024|1024) × 1 × 1
F8	FC	1 × 1	10|2	–	(10|2) × 1 × 1
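
The output sizes in Table 2 follow from the usual size formula for unpadded convolution and pooling, (size − kernel) / stride + 1. The sketch below traces the spatial size through the shared layers; the helper names and the `layers` list are hypothetical, and zero padding is assumed to be absent, which is what the listed sizes imply.

```python
def conv_out(size, kernel=3, stride=1):
    """Output size of an unpadded convolution along one spatial dimension."""
    return (size - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of an unpadded pooling window along one spatial dimension."""
    return (size - kernel) // stride + 1


# (layer name, layer kind, number of output feature maps), following Table 2.
layers = [("C1", "conv", 32), ("C2", "conv", 32), ("M3", "pool", 32),
          ("C4", "conv", 64), ("C5", "conv", 64), ("M6", "pool", 64)]

size = 28  # input images are 28 x 28
shapes = {}
for name, kind, channels in layers:
    size = conv_out(size) if kind == "conv" else pool_out(size)
    shapes[name] = (channels, size, size)

# Number of values fed into each F7 branch after flattening the M6 output.
flattened = shapes["M6"][0] * shapes["M6"][1] * shapes["M6"][2]
```

The trace gives 26, 24, 12, 10, 8, and 4 for the successive spatial sizes, so 64 × 4 × 4 = 1024 values enter the fully-connected layers.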

2.2. Multi-task learning

Multi-task learning is considered as an effective solution to simultaneously solve several different problems by learning different related tasks in parallel while sharing the information obtained in different tasks [27]. It can improve the generalization ability by exploiting the information contained in the training signals of related tasks as an inductive bias [28]. Consequently, multi-task learning method can effectively improve the performance relative to learning each task independently.

Based on this motivation, the proposed method designs a dedicated learning network to simultaneously perform two learning tasks: handwritten numeral learning and writing style learning. Such a network can enlarge the inter-class difference and reduce the intra-class variation by sharing the relevant and irrelevant information between these two tasks, so as to improve the recognition accuracy.

Handwritten numeral learning: This task is to learn how to distinguish the ten numeral categories (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), as shown in the upper part of Fig. 2. It contains two additional fully-connected layers: 1024 neuron units in F7 and 10 neuron units in F8. The 10 units in F8 correspond to the 10 classes, respectively. In the first fully-connected layer, the output of each neuron is set to zero with a probability of 0.5 to improve the generalization ability and avoid over-fitting, as suggested for Dropout [24]. Then, a 10-way Softmax loss function is employed, and the cross-entropy loss of handwritten numeral learning, L₁, is minimized:

$L_{1} = - \sum_{i = 1}^{10} t_{i} \log y_{i}$ (4)

where (t₁, t₂, t₃, …, t₁₀) = (1, 0, 0, …, 0) denotes the numeral 0, (t₁, t₂, t₃, …, t₁₀) = (0, 1, 0, …, 0) denotes the numeral 1, (t₁, t₂, t₃, …, t₁₀) = (0, 0, 1, …, 0) denotes the numeral 2, and so on, while (y₁, y₂, y₃, …, y₁₀) is the posterior probability output of Softmax layer.
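
The 10-way Softmax output and the cross-entropy loss of Eq. (4) can be sketched as follows. The function names and the F8 outputs in `logits` are hypothetical values chosen for illustration; subtracting the maximum before exponentiation is a standard numerical-stability step and does not change the result.

```python
import math


def softmax(logits):
    """Convert raw layer outputs into posterior probabilities."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross-entropy between a one-hot target t and predicted probabilities y."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical F8 outputs for one image whose true label is the numeral 2.
logits = [0.1, 0.2, 3.0, 0.0, -1.0, 0.3, 0.1, 0.0, 0.2, -0.5]
target = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

probs = softmax(logits)
loss = cross_entropy(target, probs)
```

Because the target is one-hot, the sum reduces to the negative log-probability assigned to the correct numeral.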

Writing style learning: This task is to learn whether the writing style is scratchy or non-scratchy, as shown in the lower part of Fig. 2. It also contains two fully-connected layers: 1024 neuron units in F7 and 2 neuron units in F8. The 2 units in F8 correspond to the 2 classes (i.e., scratchy and non-scratchy), respectively. Similarly, in the first fully-connected layer, the output of each neuron is set to zero with a probability of 0.5 to improve the generalization ability and avoid over-fitting, as suggested for Dropout [24]. Then, a 2-way Softmax loss function is used, and the cross-entropy loss of writing style learning, L₂, is minimized:

$L_{2} = - \sum_{i = 1}^{2} t_{i} \log y_{i}$ (5)

where (t₁, t₂) = (1, 0) means the scratchy style, and (t₁, t₂) = (0, 1) represents the non-scratchy style.

Table 3

Training and testing images from the MNIST database

Numeral	Training (scratchy + non-scratchy)	Testing (scratchy + non-scratchy)
0	5923 = (808+5115)	980 = (100+880)
1	6742 = (1034+5708)	1135 = (180+955)
2	5958 = (2511+3447)	1032 = (369+663)
3	6131 = (1112+5019)	1010 = (198+812)
4	5842 = (1552+4290)	982 = (310+672)
5	5421 = (1254+4167)	892 = (107+785)
6	5918 = (1390+4528)	958 = (214+744)
7	6265 = (1000+5265)	1028 = (183+845)
8	5851 = (1223+4628)	974 = (183+791)
9	5949 = (1056+4893)	1009 = (172+837)
Total	60000	10000

The proposed MTLN approach jointly performs the above-mentioned two tasks. Consequently, the overall loss minimized by the proposed MTLN is

$L = \sum_{i = 1}^{2} λ_{i} L_{i}$ (6)

where λ_i is the weight of the i-th task, while L₁ and L₂ are computed by Equations (4) and (5), respectively. By simply treating these two tasks as equally important, λ₁ = λ₂ = 0.5 is set in our work.
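
A minimal sketch of the joint objective in Eq. (6) for a single training sample is given below. The function names and the Softmax outputs of the two branches are hypothetical; only the weights λ₁ = λ₂ = 0.5 and the label encodings come from the text.

```python
import math


def cross_entropy(target, probs):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def mtln_loss(loss_numeral, loss_style, lambdas=(0.5, 0.5)):
    """Weighted sum of the two task losses, as in Eq. (6)."""
    return lambdas[0] * loss_numeral + lambdas[1] * loss_style


# Hypothetical Softmax outputs of the two branches for one training image.
numeral_probs = [0.02, 0.02, 0.80, 0.02, 0.02, 0.02, 0.02, 0.02, 0.03, 0.03]
style_probs = [0.7, 0.3]

l1 = cross_entropy([0, 0, 1, 0, 0, 0, 0, 0, 0, 0], numeral_probs)  # true numeral: 2
l2 = cross_entropy([1, 0], style_probs)                            # true style: scratchy
total = mtln_loss(l1, l2)
```

Since both branches share the convolutional layers, gradients from both terms of `total` update the shared features, which is how the style task can influence the numeral features.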

2.3. Implementation details

Some implementation details of the proposed MTLN method are as follows. Firstly, the experiments are implemented on a workstation with a Xeon E5-2650 2.60 GHz processor and an Nvidia GeForce GTX TITAN X GPU, based on the deep learning framework Caffe [29]. Secondly, the proposed MTLN is trained using stochastic gradient descent (SGD) [30] with a batch size of 60 samples, a weight decay of 0.0005, and a momentum of 0.9. The learning rate is initialized as lr_base = 0.01 and updated by

${lr}_{iter} = {lr}_{base} \times (1 + gamma \times iter)^{(- power)}$ (7)

where iter and lr_iter denote the current iteration number and the learning rate at that iteration, respectively, and gamma and power are set to 0.0001 and 0.75, respectively. During the training process, the weights are initialized from a normal distribution N (0, 0.01) and the biases are initialized as 0. In addition, the corresponding mean value is subtracted from each training image, and all the training data are randomly shuffled before being fed to the proposed MTLN.
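
The learning-rate schedule of Eq. (7), which has the same form as Caffe's `inv` learning-rate policy, can be evaluated directly. In the sketch below the function name `inv_learning_rate` and the sampled iteration numbers are assumptions; the parameter values are those stated in the text.

```python
def inv_learning_rate(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate at a given iteration, following Eq. (7)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


# Learning rate at a few hypothetical iteration counts.
schedule = {it: inv_learning_rate(it) for it in (0, 10000, 100000)}
```

The rate starts at 0.01 and decays smoothly: after 10,000 iterations it has dropped to roughly 0.0059, and after 100,000 iterations to roughly 0.0017.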

3. Experimental results and discussions

3.1. Datasets and evaluation protocol

In this section, the performance of the proposed MTLN method and multiple state-of-the-art methods is investigated on the benchmark MNIST dataset [9]. This dataset consists of 60,000 training images and 10,000 testing images in ten classes corresponding to “0” to “9”, and all the images have a fixed size of 28 × 28. Some handwritten numeral samples from the MNIST dataset can be found in Fig. 1. It can be clearly observed that the handwritten numerals vary largely in shape due to different writing styles. Some are neat and easy to recognize, while others are scratchy and hard to distinguish. Table 3 shows the corresponding training and testing images for handwritten numeral learning. For the writing style learning, we further divide the corresponding training images into two parts: scratchy images and non-scratchy images. Moreover, the test error rate is used as the performance index to evaluate the various handwritten numeral recognition methods; it is calculated as the ratio of the number of falsely recognized images to the total number of test images.
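
The test error rate defined above can be computed as in the following sketch. The function name `error_rate_percent` and the ten sample labels are hypothetical and serve only to show the calculation.

```python
def error_rate_percent(predicted, actual):
    """Percentage of test images whose predicted label differs from the true label."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


# Hypothetical labels for ten test images; exactly one prediction is wrong.
actual = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]
predicted = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]

rate = error_rate_percent(predicted, actual)  # 10.0
```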

3.2. Performance comparison

Table 4 shows the performance of the proposed MTLN method and multiple state-of-the-art methods, including Hinton et al. [31], Salakhutdinov et al. [32], LeNet-A [21], LeNet-B [21], LeNet-C [21], Network3 [33], Guo et al. [14], Labusch et al. [34], and Lauer et al. [35]. To analyze how much the writing style learning contributes, the performance of the handwritten numeral learning (HNL) alone is also investigated and shown in Table 4. Note that the HNL network is exactly the upper part of Fig. 2.

Table 4
Performance comparison of different methods

Methods	Test error rate (%)
Hinton et al. [31]	1.13
Salakhutdinov et al. [32]	0.95
LeNet-A [21]	0.95
LeNet-B [21]	0.90
LeNet-C [21]	0.81
Network3 [33]	0.75
Guo et al. [14]	0.67
Labusch et al. [34]	0.59
Lauer et al. [35]	0.54
Proposed HNL	0.53
Proposed MTLN	0.40

One can see from Table 4 that the proposed MTLN method achieves the lowest test error rate (i.e., 0.40%) and consistently outperforms the state-of-the-art methods under comparison. In addition, the proposed HNL alone achieves relatively good performance (0.53%), and the proposed MTLN, which jointly explores HNL and writing style learning, achieves better performance than the proposed HNL. This indicates that the writing style learning plays an effective complementary role to the HNL and further decreases the test error rate.

3.3. Cross-dataset evaluation

To demonstrate the generalization ability, the proposed MTLN method and some state-of-the-art methods are tested on two completely unseen datasets, USPS and Binary Alpha-digits [36]. The methods compared in this experiment are LeNet-A [21], LeNet-B [21], LeNet-C [21], and Network3 [33]. The numbers of testing handwritten numeral images used in this cross-dataset evaluation are shown in Table 5. Note that 1) the Binary Alpha-digits dataset consists of not only 10 handwritten numerals but also 26 handwritten capital letters; in our experiments, only the handwritten numerals are selected as testing images. 2) The sizes of the handwritten numeral images in the USPS and Binary Alpha-digits datasets are 16 × 16 and 20 × 16, respectively; in our experiments, all these testing images are resized to 28 × 28 to fit the proposed network. Some examples of handwritten numerals from these two datasets are shown in Fig. 1. As before, the test error rate is used to measure the performance.

Table 5
Testing images for cross-dataset evaluation

Datasets	Testing size
USPS [36]	11000 (Each numeral has 1100 examples)
Binary Alpha-digits [36]	390 (Each numeral has 39 examples)
Total	11390

The results of the cross-dataset evaluation in terms of test error rate are summarized in Table 6. It can be clearly observed that the test error rates of all the methods increase in the cross-dataset evaluation, compared with those in Table 4. The reason behind this observation is that the testing images from the USPS and Binary Alpha-digits datasets are completely unseen by these methods in this cross-dataset evaluation. Furthermore, the proposed MTLN again obtains the lowest test error rate and consistently outperforms the state-of-the-art methods under comparison. This indicates that the proposed MTLN has good generalization ability.

Table 6

Performance comparison in cross-dataset evaluation

Methods	Test error rate (%)
LeNet-A [21]	16.34
LeNet-B [21]	15.16
LeNet-C [21]	22.26
Network3 [33]	15.68
Proposed HNL	9.20
Proposed MTLN	6.19

4. Conclusion

In this paper, a novel multi-task learning network (MTLN) is proposed for handwritten numeral recognition. The success of the proposed MTLN stems from its design, which simultaneously performs two tasks: handwritten numeral learning and writing style learning. Since the writing style learning is, to a certain extent, complementary to the handwritten numeral learning, the proposed MTLN can exploit their correlation to obtain a more robust and distinguishable classifier with a higher handwritten numeral recognition rate. Experiments on multiple handwritten numeral datasets show that the proposed MTLN effectively improves the accuracy of handwritten numeral recognition and has good generalization ability. Moreover, the proposed MTLN method is shown to be consistently superior to multiple state-of-the-art handwritten numeral recognition methods.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under the Grants 61871434, 61602191 and 61802136, in part by the Natural Science Foundation of Fujian Province under the Grants 2016J01308 and 2017J05103, in part by the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University under the Grants ZQN-YX403 and ZQN-PY418, in part by the High-Level Talent Project Foundation of Huaqiao University under the Grants 14BS201, 14BS204, and 16BS108, and in part by the Subsidized Project for Postgraduates Innovative Fund in Scientific Research of Huaqiao University.

References

1. Li and Shen, Reading car license plates using deep convolutional neural networks and LSTMs, arXiv preprint arXiv:1601.05610, 2016.
2. Menotti, Chiachia, A.X. Falcao and V.J.O. Neto, Vehicle license plate recognition with random convolutional networks, Graphics, Patterns and Images, 2014, pp. 298–303.
3. Pal and Singh, Handwritten English character recognition using neural network, International Journal of Computer Science & Communication 3 (2010), 1328–1331.
4. Yuan, Bai, Jiao and Liu, Offline handwritten English character recognition based on convolutional neural network, IAPR International Workshop on Document Analysis Systems (2012), 125–129.
5. Shokoohi, A.M. Hormat, Mahmoudi and Badal-babadi, Persian handwritten numeral recognition using complex neural network and non-linear feature extraction, Pattern Recognition and Image Analysis (2013), 1–5.
6. A.M.S. Chowdhury and M.S. Rahman, Towards optimal convolutional neural network parameters for Bengali handwritten numerals recognition, International Conference on Computer and Information Technology, 2017, pp. 431–436.
7. Wang, Li, Liu, Ding and Chen, An MQDF-CNN hybrid model for offline handwritten Chinese character recognition, International Conference on Frontiers in Handwriting Recognition, 2014, pp. 246–249.
8. He, Zhang, Mao and Jin, Recognition confidence analysis of handwritten Chinese character with CNN, International Conference on Document Analysis and Recognition, 2015, pp. 61–65.
9. LeCun, Bottou, Bengio and Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11) (1998), 2278–2324.
10. P.Y. Simard, Steinkraus and J.C. Platt, Best practices for convolutional neural networks applied to visual document analysis, International Conference on Document Analysis and Recognition, 2003, pp. 958–963.
11. L.L.C. Kasun, Zhou, Huang and C.M. Vong, Representational learning with extreme learning machine for big data, IEEE Intelligent Systems 28(6) (2013), 31–34.
12. Luo and Zhang, A hybrid approach combining extreme learning machine and sparse representation for image classification, Engineering Applications of Artificial Intelligence 27(C) (2014), 228–235.
13. X.X. Niu and C.Y. Suen, A novel hybrid CNN-SVM classifier for recognizing handwritten digits, Pattern Recognition 45(4) (2012), 1318–1325.
14. Guo and Ding, A hybrid deep learning CNN-ELM model and its application in handwritten numeral recognition, Journal of Computational Information Systems 11(7) (2015), 2673–2680.
15. Lemley, Bazrafkan and Corcoran, Deep learning for consumer devices and services: Pushing the limits for machine learning, artificial intelligence, and computer vision, IEEE Consumer Electronics Magazine 6(2) (2017), 48–56.
16. Feng, Yuan, J.S. Pan, J.F. Yang, Y.T. Chou, Zhou and Li, Superimposed sparse parameter classifiers for face recognition, IEEE Transactions on Cybernetics 47(2) (2017), 378–390.
17. Xiao, Li, Ouyang and Wang, Learning deep feature representations with domain guided dropout for person re-identification, IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1249–1258.
18. Zhu, Zeng, Liao, Lei, Cai and L.X. Zheng, Deep hybrid similarity learning for person re-identification, IEEE Transactions on Circuits and Systems for Video Technology, 2017, pp. 1–1.
19. Cai, Zhu, Zeng, Chen, Cai and K.K. Ma, HOG-assisted deep feature learning for pedestrian gender recognition, Journal of the Franklin Institute (2017). doi: 10.1016/j.jfranklin.2017.09.003
20. Cai, Zhu, Zeng, Chen and Cai, Deep-learned and hand-crafted features fusion network for pedestrian gender recognition, Proceedings of ELM-2016 9 (2018), 207–215.
21. Ciresan, Meier and Schmidhuber, Multi-column deep neural networks for image classification, Computer Vision and Pattern Recognition 157(10) (2012), 1036–1041.
22. V.V. Romanuke, Training data expansion and boosting of convolutional neural networks for reducing the MNIST dataset error rate, Scientific news of NTUU "Kyiv Polytechnic Institute", 2016, pp. 29–34.
23. Tabik, Peralla, Herrera-Poyatos and Herrera, A snapshot of image pre-processing for convolutional neural networks: Case study of MNIST, International Journal of Computational Intelligence Systems 10(1) (2017), 555–568.
24. Krizhevsky, Sutskever and G.E. Hinton, ImageNet classification with deep convolutional neural networks, Neural Information Processing Systems 25(2) (2012), 1097–1105.
25. Ioffe and Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv:1502.03167, 2015.
26. Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556, 2014.
27. Scholkopf, Platt and Hofmann, Multi-task feature learning, Conference on Advances in Neural Information Processing Systems, 2007, pp. 41–48.
28. Caruana, Multitask learning, Machine Learning 28(1) (1997), 41–75.
29. Jia, Shelhamer, Donahue, Karayev and Long, Caffe: Convolutional architecture for fast feature embedding, ACM International Conference on Multimedia, 2014, pp. 675–678.
30. Bottou, Stochastic Gradient Descent Tricks, Springer Berlin Heidelberg, 2012, pp. 421–436.
31. G.E. Hinton and R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science (New York, N.Y.) 313(5786) (2006), 504.
32. Salakhutdinov and Larochelle, Efficient learning of deep Boltzmann machines, International Conference on Artificial Intelligence and Statistics, 2010, pp. 693–700.
33. A.N. Michael, Neural Networks and Deep Learning, 2015, http://neuralnetworksanddeeplearning.com
34. Labusch, Barth and Martinetz, Simple method for high-performance digit recognition based on sparse coding, IEEE Transactions on Neural Networks 19(11) (2008), 1985–1989.
35. Lauer and C.Y. Suen, A trainable feature extractor for handwritten digit recognition, Pattern Recognition 40(6) (2007), 1816–1824.
36. Wang, He, Bu, Chen, Chen and Guan, Image representation using Laplacian regularized nonnegative tensor factorization, Pattern Recognition 44(10-11) (2011), 2516–2526.
