Decoding of graphically encoded numerical digits using deep learning and edge detection techniques

Abstract

Encoding a message is the creation of the message; decoding is how people comprehend and decipher it, that is, the interpretation of coded data into a comprehensible form. In this paper, a self-created, explicitly defined function for encoding numerical digits into a graphical representation is proposed. The proposed system integrates deep learning methods, to obtain the probabilities of digit occurrence, with edge detection techniques, to decode the graphically encoded digits into numerical digits as text. The system's major objective is to take in an image with digits encoded in graphical format and output the decoded stream of digits corresponding to the graph. The system also employs pre-processing techniques to convert RGB images to greyscale and then to Canny edge images. Techniques such as multi-label classification of images and segmentation are used to obtain the probability of occurrence. A dataset of 1000 images was created by the authors and split into training and testing data in the proportion 9 : 1: the system was trained on 900 images and tested on 100 images, organized in 10 classes. The model achieved an accuracy of 89% for probability prediction.

Keywords

Image processing, deep learning, convolutional neural network, multi-label classification, image segmentation, edge detection, contours, graphical encoding

1 Introduction

Usually, when securing a message or code, the message, which is in text or numerical format, is encrypted into another message containing text and numerals so that intruders cannot read the original. This encryption can be done using existing algorithms, and it is the securing process used at present. This paper proposes securing the message or code using a graphical representation: the message to be encrypted is converted to a particular graph. The proposed system uses bar graphs to encrypt numerical codes with a maximum length of 10 digits. First, bar graphs for the numerical digits (0-9) were created, with each digit identified by the height of its bar. The height range of each digit is assigned by an explicitly defined function, which is used for encoding the digits. Using these height ranges, a dataset of images containing graphical representations was created for each numerical digit. For decoding the graphically represented images, the proposed model uses image segmentation, rectangular edge detection using contours, and height detection using contours. Deep learning techniques such as convolutional neural networks (CNN) and multi-label classification were used to predict the probabilities of occurrence of a digit in a larger number. This form of encoding can be used in real life. Nowadays phones are tapped and calls are easily recorded and misused because of improper encoding. Encoding numerical digits in graphical form can therefore reduce the risk of data misuse. It can also be used to secure ATM PINs and any other numerical sequences that require security while being exchanged or communicated.

The rest of the paper is organized as follows. Section 2 reviews existing research whose findings are aligned with our work. Section 3 explains the dataset features and the pre-processing procedures used. Section 4 describes the setup used for the comparative study and evaluation of the proposed methods. The results obtained from the deep learning model are presented in Section 5. Concluding remarks are presented in the last section.

2 Related works

Our proposed system mainly uses graphical representation for encoding and decoding. Since no research on graphically encoded systems has been reported in the literature, this section summarizes a few earlier works related to the encoding and decoding of text and numerical digits.

Chao Yuan et al. [1] proposed a method that applies neural networks to the encoding and decoding of polar codes. Polar codes in 5G communication, which meet the basic and ultra-low-latency requirements, can likewise be improved. After training the neural network, they obtain a simple mathematical formula to assess the reliability of the encoding in the polar codes, which achieves a similar effect to the Gaussian algorithm.

Ahmed Ibrahim [2] proposed a strategy to study the effect of applying a hybrid encoding/decoding algorithm to textual data. The proposed hybrid combines the Huffman and Run-Length algorithms. Results show that the data format and the sequence in which the algorithms are applied affect the output.

Linus Lagerhjelm [3] investigated different approaches to using deep neural networks to perform cryptanalysis, with the ultimate goal of having a deep neural network decrypt encrypted data. This work uses long short-term memory networks to attempt to decrypt encrypted text and a convolutional neural network to perform classification tasks on encrypted MNIST images.

F. M. Barbosa et al. [4] proposed machine learning for cryptographic algorithm identification, aiming to examine encrypted text files to identify their encryption algorithm. Plain texts are encrypted with distinct cryptographic algorithms, and afterwards some metadata are extracted from these encodings.

The DES algorithm was created under the predecessor organization of NIST and was widely adopted by industry. Kahate [5] states that this algorithm was the most widely used for two decades, even though its popularity diminished because of its vulnerabilities. Tanenbaum [6] says that the original algorithm is not very secure; however, a few modifications can make it useful. Pfleeger et al. [7] point out that its security may be achieved by applying successive rounds of substitution and transposition. The Blowfish algorithm was proposed as an alternative to DES, since DES was vulnerable to brute-force attacks and to other cryptanalysis approaches [8].

The work of Rezaul et al. [9] presents an efficient decoding technique for Huffman codes and a novel data structure for Huffman coding that involves sending symbols in the order of their appearance in the Huffman tree.

Min Wu et al. [10] studied conventional video coding, in which the complexity of an encoder is generally much higher than that of a decoder, since operations such as motion estimation consume significant computational resources. Such a codec architecture is suitable for the downlink transmission model of communication.

Various other encoding and decoding strategies have been proposed in the literature for different engineering applications; a few examples are [11, 12] and [13].

The proposed architecture involves basic supervised learning on a dataset of 1000 images. The objective is to classify the images using deep learning techniques, predict the probabilities of digits occurring in the number using multi-label classification, and then apply relevant techniques to obtain the decoded digits from the graphical input. The inputs were not of fixed size and were resized to 100 by 100. The architecture used two convolutional layers with two max-pooling layers and the ADAM optimizer. The experiment resulted in an accuracy of 89% for predicting the probability of occurrence of digits. Edge detection techniques were then employed to obtain the desired output, the numerical label of each bar in the bar graph.

3 Exploratory analysis

This section describes the encoding part, i.e., the conventional encoding methodology and the encryption function framed for this particular graphical encoding. The prediction of the probability of a digit in a number, the dataset, the hidden layers used and the layers of the convolutional neural network architecture are also detailed. The methodology comprises data gathering, minor pre-processing, configuring the Convolutional Neural Network and constructing the model.

3.1 Defined encoded functions

3.1.1 Generic graphical encoder

First, to encode a code consisting of 10 numerical digits, a specified height range was used for each digit. The digit can be identified as the floor of the range values. The height ranges and corresponding numerical digits are shown in Table 1.

Table 1
Values and height range for Generic Graphical Encoder

Height Range	0-1	1-2	2-3	3-4	4-5	5-6	6-7	7-8	8-9	9-10
Numerical Digit	0	1	2	3	4	5	6	7	8	9

In the above system, the encoding function is so simple that anyone can decode a digit by looking at the height of its bar in the graph. The output for the graph given in Fig. 1 is 9948155256. In Fig. 1, each of the 10 digits can be read off as the floor of the bar's height on the Y-axis, which results in poor security for the encoded code.

Fig. 1

Input image for Generic Graphical Encoder.
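The weakness of the generic encoder can be illustrated with a minimal Python sketch: decoding amounts to taking the floor of each bar height, as in Table 1. The function name `decode_generic` and the example heights are hypothetical; the heights were chosen only so that they fall inside the ranges that produce the number quoted for Fig. 1.

```python
import math

def decode_generic(heights):
    """Decode bar heights with the generic scheme of Table 1: digit = floor(height)."""
    digits = []
    for h in heights:
        if not 0 <= h < 10:
            raise ValueError(f"bar height {h} is outside the 0-10 scale")
        digits.append(str(math.floor(h)))
    return "".join(digits)

# Hypothetical bar heights (not measured from Fig. 1), one per digit position.
example_heights = [9.5, 9.2, 4.1, 8.7, 1.3, 5.5, 5.9, 2.2, 5.0, 6.8]
decoded = decode_generic(example_heights)
print(decoded)  # 9948155256
```

Because the mapping is just the floor of the height, an observer who can see the Y-axis needs no key to recover the code.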

Since the above model is not secure for encoding the 10-digit code, another function, which is highly secure for encoding the numerical digits in graphical representation, is proposed.

3.1.2 Explicitly defined encoded function

This is the defined function with which the 10 numerical digits are encoded in a highly secure way. The function is unpredictable since it uses three sub-functions, consisting of a parabola and straight-line functions, with specific domains. The definition of the function, using a graph with the proper domains, is shown in Fig. 2.

Fig. 2

Explicitly Defined Encoded Function Graph.

From Fig. 2, an equation for each segment of the graph was derived and combined into a single function for encoding the 10-digit code. This equation makes the encryption code highly secure. The proposed encryption function is shown in Equation (1).

$f(x) = \begin{cases} (\lfloor x \rfloor - 3)^{2} / 4, & 0 \le x < 3 \\ \lfloor \sqrt{3 (\lfloor x \rfloor - 3)} + 3 \rfloor, & 3 \le x < 6 \\ - \lfloor x \rfloor + 15, & 6 \le x < 10 \end{cases}$ (1)

Equation (1) defines the height ranges of each numerical digit. The height ranges and corresponding numerical digits are shown in Table 2.

Table 2

Values and height range for Explicitly Defined Encoded Function

Height Range	0-1	1-2	2-3	3-4	4-5	5-6	6-7	7-8	8-9	9-10
Numerical Digit	2	1	0	3	4	5	9	8	7	6
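The relationship between Equation (1) and Table 2 can be checked with a short Python sketch. It assumes that the height range assigned to a digit x is the unit-wide interval starting at the floor of f(x); this reading is an assumption, although it reproduces every column of Table 2. The names `encode_height`, `height_band`, `BAND_TO_DIGIT` and `decode_heights` are hypothetical, as are the example heights.

```python
import math

def encode_height(digit):
    """Evaluate Equation (1) for a digit x with 0 <= x < 10."""
    x = math.floor(digit)
    if 0 <= x < 3:
        return (x - 3) ** 2 / 4
    if 3 <= x < 6:
        return math.floor(math.sqrt(3 * (x - 3)) + 3)
    if 6 <= x < 10:
        return -x + 15
    raise ValueError("Equation (1) is defined only for 0 <= x < 10")

def height_band(digit):
    """Lower bound of the unit-wide height range that contains f(digit) (assumed reading)."""
    return math.floor(encode_height(digit))

# Inverse lookup used for decoding: lower bound of the height range -> digit.
BAND_TO_DIGIT = {height_band(d): d for d in range(10)}

def decode_heights(heights):
    """Map each bar height to its digit through the height range it falls in."""
    return "".join(str(BAND_TO_DIGIT[math.floor(h)]) for h in heights)

bands = [height_band(d) for d in range(10)]
print(bands)  # [2, 1, 0, 3, 4, 5, 9, 8, 7, 6]

# Hypothetical heights lying in the ranges that encode 9948155256.
print(decode_heights([6.5, 6.2, 4.4, 7.9, 1.1, 5.5, 5.0, 0.3, 5.7, 9.6]))  # 9948155256
```

The printed list gives, for digits 0 to 9 in order, the lower bound of the height range, which is Table 2 read from the digit row to the range row.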

As discussed, the output for Fig. 1 is 9948155256. The graphical representation of the same number using the explicitly defined encoded function is shown in Fig. 3.

Fig. 3

Input image using Explicitly Defined Encoded Function.

3.2 Input and training data

The training set consists of about 900 images selected randomly from the graphically encoded numerical digits dataset created for this experiment. Each image contains the graphical representation of encoded numerical digits, drawn as a bar graph. The height, width and position of each bar in the graph were determined from the characteristics of the bar. The training images cover all possible positions and heights of the bar in the graph for all the numerical digits, to ensure accurate prediction of numerical digits.

3.3 Data re-modelling

Conversion to grayscale – The input image was converted to grayscale. The grayscale version of the image in Fig. 3 is shown in Fig. 4.

Fig. 4

The input image converted to greyscale.

Removing axis to enhance the encoding – The axes of the bar graph were removed so that estimation by the naked eye is harder, as the Y-axis values cannot be seen.

Normalizing and Reshaping – Basic pre-processing was applied to the dataset to reduce computational complexity and achieve better efficiency. The images from the dataset were converted to grayscale. Since grayscale images contain just one channel, it is simpler for the CNN to learn from and process them. The images are then normalized by dividing the corresponding NumPy array by 255, so that the pixel values lie between 0 and 1, which simplifies computation for the prediction. Finally, the images were resized to the required size of 100 x 100 pixels and fed to the CNN for training. The input image after application of the above preprocessing steps is depicted in Fig. 5.

Fig. 5

Input image after resizing to 100x100 and axis removal.

3.4 Dataset

The dataset was classified into 10 classes (0, 1, 2, ..., 9). It contains 1000 images, of which 900 are training data and 100 are testing data. Each image is the graphical representation of a numerical digit: the graph contains a bar whose height lies within the range specified for the digit. The height range of each digit is divided into 10 parts, so that 10 bar graphs are created with 10 heights within the range. The bar is also placed at each of 10 positions on the X-axis, since the specified code length is 10 digits. This gives a total of 100 images for each digit.
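The dataset arithmetic (10 digits x 10 heights x 10 positions = 1000 images, split 900/100) can be illustrated with a small Python sketch that enumerates image specifications without rendering them. The helper `build_specs`, the choice of the midpoint of each tenth of the height range and the fixed random seed are assumptions; the source does not state how the 10 heights were spaced or how the split was drawn beyond its being random.

```python
import random

# Lower bound of each digit's height range, copied from Table 2.
DIGIT_TO_BAND = {0: 2, 1: 1, 2: 0, 3: 3, 4: 4, 5: 5, 6: 9, 7: 8, 8: 7, 9: 6}

def build_specs():
    """List one (digit, height, position) record per image in the dataset."""
    specs = []
    for digit, low in DIGIT_TO_BAND.items():
        for step in range(10):                  # 10 heights inside the range
            height = low + (step + 0.5) / 10    # assumed: midpoint of each tenth
            for position in range(10):          # 10 bar positions on the X-axis
                specs.append({"digit": digit, "height": height, "position": position})
    return specs

specs = build_specs()
rng = random.Random(0)          # fixed seed so the split is reproducible
rng.shuffle(specs)
train, test = specs[:900], specs[900:]
print(len(specs), len(train), len(test))  # 1000 900 100
```

Each record would then be rendered as one bar-graph image with the axes removed.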

4 Proposed architecture

4.1 Proposed architecture for predicting the probability of a digit to be in the overall number

The proposed model comprises two convolutional layers, with two max-pooling layers followed by a dropout and an optimization function. The optimization function used was ADAM optimizer which is shown in Fig. 2.

The first convolution layer has 32 filters with a kernel size of 5 x 5. The activation function used was the Rectified Linear Unit (ReLU), which gives better results than other activation functions for similar architectural models. Because this is the first layer, the size of the input image was specified. The input size was 100 x 100 x 1, i.e., a grey-scale image of size 100 x 100 was provided to the network; the third dimension of one indicates the single grey channel. The output of this layer is passed on to the following layer, which is the pooling layer.

The first convolution block of the proposed model has a max-pooling layer of 5 x 5 size, which extracts the maximum value from each 5 x 5 window. The spatial size of the representation decreases step by step, as the pooling layer keeps only the maximum values and discards the rest. This layer helps the network to understand the images better, as it selects the significant features. After the max-pooling layer, the output was passed to a dropout layer, where 25 per cent of the nodes were dropped. The next layer was again a convolution layer, now with 64 filters but the same kernel size of 5 x 5 and the default stride. The ReLU activation function was used again in this layer. This was followed by another max-pooling layer of size 2 x 2 and a dropout layer of 25 per cent. The output layer has 9 nodes and uses Sigmoid as the activation function, which gives the expected value for each class. The proposed model was compiled with the ADAM optimizer with a default learning rate of 0.001. Finally, accuracy was specified as the metric for the evaluation procedure. The skeleton of the proposed model is provided in Table 3.
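A Sigmoid output suits multi-label classification because each node is squashed independently, so several digits can receive a high probability at once. The sketch below illustrates this in plain Python. The function `present_digits`, the 0.5 threshold, the four example logits and their labels are all hypothetical; the source does not state how the probabilities are thresholded.

```python
import math

def sigmoid(z):
    """Logistic function applied independently to each output node."""
    return 1.0 / (1.0 + math.exp(-z))

def present_digits(logits, labels, threshold=0.5):
    """Return the labels whose probability reaches the (assumed) threshold."""
    probs = [sigmoid(z) for z in logits]
    chosen = [label for label, p in zip(labels, probs) if p >= threshold]
    return chosen, probs

# Hypothetical raw outputs for four output nodes.
chosen, probs = present_digits([2.0, -1.0, 0.4, -3.0], labels=[0, 1, 2, 3])
print(chosen)                        # [0, 2]
print([round(p, 3) for p in probs])  # [0.881, 0.269, 0.599, 0.047]
```

Unlike a softmax output, the probabilities here do not have to sum to one, which is what allows more than one digit to be reported as present in the number.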

Table 3
Summary of the proposed model

Model Configuration	Attributes
First Convolution Layer	32 filters, kernel size 5x5, ReLU, input size 100x100
First Pooling Layer	Max pooling, size 2x2
Activation Function	ReLU
Dropout layer	25 per cent
Second Convolution Layer	64 filters, kernel size 5x5, ReLU
Second Pooling Layer	Max pooling, size 2x2
Activation Function	ReLU
Dropout layer	25 per cent
Output Layer	9 nodes, Sigmoid activation function
Optimizer	ADAM Optimizer
Learning Rate	0.001
Evaluation Metrics	Accuracy
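The spatial size of the feature maps after each layer of Table 3 can be worked out with a few lines of Python. The sketch assumes convolutions without padding and with stride 1, pooling with a stride equal to the pool size, and a flattening step before the output layer; none of these is stated explicitly in the source. Because the running text mentions a 5 x 5 window for the first pooling layer while Table 3 lists 2 x 2, the first pool size is left as a parameter. All function names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a convolution without padding."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output width/height of max pooling with stride equal to the pool size."""
    return size // pool

def trace_shapes(input_size=100, first_pool=2):
    """Follow a square input through the layers listed in Table 3."""
    shapes = []
    s = conv_out(input_size, 5)
    shapes.append(("conv1", s, s, 32))
    s = pool_out(s, first_pool)
    shapes.append(("pool1", s, s, 32))
    s = conv_out(s, 5)
    shapes.append(("conv2", s, s, 64))
    s = pool_out(s, 2)
    shapes.append(("pool2", s, s, 64))
    shapes.append(("flatten", s * s * 64))  # assumed step before the output layer
    return shapes

for row in trace_shapes(first_pool=2):
    print(row)
# ('conv1', 96, 96, 32), ('pool1', 48, 48, 32), ('conv2', 44, 44, 64),
# ('pool2', 22, 22, 64), ('flatten', 30976)
```

With `first_pool=5` the same trace gives 19 x 19, 15 x 15 and 7 x 7 feature maps and a flattened vector of 3136 values, so the choice of the first pooling window has a large effect on the number of weights in the output layer.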

4.2 Proposed architecture for the rectangular shape detection in the graph, separation between bars and Contour definition

First, the test image was taken. The image is fully encoded because the axes have been removed, so the value of a digit cannot be estimated directly from the height of its bar; the encoding function makes the encoding stronger still. Then, the test image was converted from one colour space to another using the OpenCV library function cvtColor. Here, the image was converted from RGB space to grey space, because the system deals with boundary marking and edge detection of the bars, which is independent of colour. Next, the grayscale image goes through a Gaussian blur, a function that applies Gaussian smoothing to the source image. The Gaussian blur takes the source (greyscale) image and the kernel size (ksize), which is 3 x 3 here. The input image after Gaussian blur is shown in Fig. 6.

Fig. 6

The input image after Gaussian blur.

Then, the Gaussian-blurred image was passed as input to the OpenCV Canny edge detection function. Canny edge detection is used to detect the edges in an image. It accepts a grayscale image as input and uses a multi-stage algorithm. It can be performed on an image using the Canny() function of OpenCV's imgproc module. The image after application of Canny edge detection is shown in Fig. 7.

Fig. 7

Image after Canny Edge Detection.

The Canny image now contains only the detected edges, with no colour, as the process requires. All contours were then found using OpenCV's (cv2) contour-finding function, which returned the output image, the contours and the hierarchy.

A contour is one of the segmentation principles. It is a supervised segmentation process. Active contour segmentation, also called snakes, is initialized with a user-defined contour or line around the area of interest; this contour then slowly contracts and is attracted to or repelled from light and edges. As arguments, the function is given the grayscale blurred image and a contour retrieval mode, cv2.RETR_EXTERNAL, which retrieves only the extreme outer contours instead of all the contours. The next argument is the contour approximation method, cv2.CHAIN_APPROX_SIMPLE, which removes all redundant points and compresses the contour, thereby saving memory.

The proposed system iterates over the contours obtained and applies cv2.boundingRect() to each. This function computes an approximate (bounding) rectangle around a contour in the binary image; it was used mainly to highlight the region of interest, which is the rectangle around each bar in the graph. It also gives an estimated height of the contour on an irregular scale. The system converted this irregular scale to the standard scale, displayed the resulting height on top of each contour and drew a green boundary around each rectangle detected in the image.

As a result, all bars in the graph were detected and labelled with their heights. These heights were then decoded according to the proposed encoding function, and the resulting digit was written as text on each bar in the graph.
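The final decoding step can be sketched in plain Python: bounding-box heights in pixels are converted to the 0-10 scale and looked up in Table 2. The conversion factor `pixels_per_unit`, the function `decode_boxes` and the example boxes are hypothetical; the source does not describe how the irregular scale is mapped to the standard one.

```python
# Lower bound of the height range -> digit, copied from Table 2.
BAND_TO_DIGIT = {0: 2, 1: 1, 2: 0, 3: 3, 4: 4, 5: 5, 6: 9, 7: 8, 8: 7, 9: 6}

def decode_boxes(boxes, pixels_per_unit):
    """Decode bounding boxes (x, y, w, h) into a digit string, left to right."""
    digits = []
    for x, y, w, h in sorted(boxes, key=lambda b: b[0]):
        units = h / pixels_per_unit   # pixel height -> standard 0-10 scale
        band = min(int(units), 9)     # clamp a bar that touches the top of the scale
        digits.append(str(BAND_TO_DIGIT[band]))
    return "".join(digits)

# Hypothetical boxes, given out of order, for a graph drawn at 20 pixels per unit.
boxes = [(210, 110, 30, 70), (30, 10, 30, 170), (120, 114, 30, 66),
         (75, 164, 30, 16), (165, 150, 30, 30)]
print(decode_boxes(boxes, pixels_per_unit=20))  # 72313
```

Sorting by the x coordinate restores the digit order, since contours are not guaranteed to be returned left to right. A height that lies close to a range boundary can be decoded wrongly if the detected edge is off by a pixel or two.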

5 Results and discussion

5.1 Model result to predict probabilities

The performance metrics measured for the proposed system are presented in Table 4. The system achieved a training accuracy of 89%.

Table 4
Statistics of the proposed model

Metrics	Value
Training Accuracy	89%
Training Loss	0.35
Validation Accuracy	89%
Validation Loss	0.40

5.2 Getting the decoded number from the input graph

Fig. 8 shows the output obtained, in a separate window, for the input image shown in Fig. 5; it matches the encoding of the number. The proposed system also works for numbers of varying lengths. For example, for the input 72313, the input graph is depicted in Fig. 9 and the decoded output in Fig. 10. As another example, for the input 81893419, the input graph and the corresponding decoded graph are depicted in Figs. 11 and 12, respectively.

Fig. 8

The decoded/predicted number for the given input.

Fig. 9

The encoded input graph for the number ‘72313’.

Fig. 10

The decoded/predicted number for the input graph shown in Fig. 9.

Fig. 11

The encoded input graph for the number ‘81893419’.

Fig. 12

The decoded/predicted number for the input graph shown in Fig. 11.

The green line indicates the contour, and the text on top of each bar shows the corresponding number.

6 Conclusion

A system based on deep learning principles is proposed for secure message communication. The proposed system uses a self-created, explicitly defined function for encoding numerical digits into a graphical representation. The input to the system is a graph generated by the encoding function. This graph is pre-processed using image pre-processing techniques, sent to the model and decoded. The final output is the sequence of numbers, which is a 10-digit code.

This work can be extended to different graphical forms of encoding and to other encoding functions. The accuracy of the probability prediction can also be improved further. The work can also be extended to variable-length images.

References

1. Yuan, Wu, Cheng and Yang, Deep Learning in Encoding and Decoding of Polar Codes, Journal of Physics: Conference Series, 2nd International Conference on Data Mining, Vol 1060, 2018.

2. Ibrahim, Plain Text Encoding/Decoding Technique Using a Combination of Huffman and Run-Length Algorithms, Current Journal of Applied Science & Technology 16(2) (2016), 1–10.

3. Lagerhjelm, Extracting Information from Encrypted Data using Deep Neural Networks, Tech Report, Corpus ID: 204923723, 2019.

4. Barbosa F.M., Vidal and de Melloe F.L., Machine Learning for Cryptographic Algorithm Identification, Journal of Information Security and Cryptography 3(1) (2016).

5. Kahate, Cryptography and Network Security, 3rd Edition, Tata McGraw Hill Education, 2013.

6. Tanenbaum, Computer Network, 5th Edition, Boston, Pearson, 2011.

7. Pfleeger C.P., Pfleeger S.L., Margulies, Security in Computing, 5th Edition, Boston, Prentice Hall, 2006.

8. Schneier, Fast Software Encryption, 7th International Workshop, FSE New York, NY, USA, 2000.

9. Chowdhury R.A., Kaykobad and King, An efficient decoding technique for Huffman codes, Elsevier Science 81 (2012), 305–308.

10. Wu, Vetro, Yedidia J.S., Sun and Chen C.W., A study of encoding and decoding techniques for syndrome-based video coding, in IEEE Symposium on Circuits and Systems, pp. 3527–3530, 2005.

11. Vasudevan S.K., Janakiraman and Vasudevan, SDR and Error Correction using Convolution Encoding with Viterbi Decoding, Journal of Emerging Technologies in Web Intelligence 2(2) (2010), 122–130.

12. Gandhiraj, Radhakrishnan, S. S., Kurian and Alluri, Huffman Encoding and Decoding Using Android, in Proceedings of 5th IEEE International Conference on Communication and Signal Processing-ICCSP 16, 2016.

13. Roshini, Bavya and Jeyakumar, RoBaJe – A Simulated Computational Model for Human Memory to Illustrate Encoding and Decoding of Information, International Journal of Applied Engineering Research 9 (2014), 26957–26970.