Combining multi-features with a neural joint model for Android malware detection

Abstract

Combining natural language processing, image analysis and malware detection technologies, this paper proposes a novel Android malware detection method named BIHAD (an improved IndRNN and attention-treated DenseNet-based pipeline model). First, in order to describe the behavior of Android malware, multiple features are used to construct a more stable discriminant method. Second, embedding technology is introduced to map all behavior information into a vector space, which enables the extraction of joint embedded information from semantics and images. Third, an improved Independently Recurrent Neural Network (IndRNN) is used to extract valuable texture information from the original values of the gray image and to effectively utilize the long-distance information contained in the gray image. Finally, a Hierarchical Attention Dense Convolutional Network (HADenseNet) is used to maximize the information flow between layers in the network, improving the utilization of semantic distribution and spatial context information. In particular, hierarchical attention can enhance the representational ability for key features. A comparison of the BIHAD model with several existing malware detection methods indicates a significant improvement in F-score achieved by BIHAD.

Keywords

Dense convolutional network, Independently Recurrent Neural Network, hierarchical attention, neural joint model, Android malware detection

1 Introduction

With billions of people using smartphones, smart devices are used to store sensitive personal information more frequently than laptops and desktops. Mobile device security is an area of rising significance and growing need, but it remains comparatively weak in protecting users' data privacy [1, 2]. To put this into perspective, according to a recent report from Kaspersky Lab, up to $1 billion was stolen in roughly 2 years from financial institutions worldwide due to malware attacks. In addition, malware authors use techniques such as instruction virtualization, packing, polymorphism, emulation, and metamorphism to write and modify malicious code that can evade detection, which has culminated in a massive proliferation of new malware samples due to their wide availability [3].

The traditional static signature matching technology [4] obtains the bytecode data stream by reverse engineering and then matches it against the data segments of malicious signature samples to ascertain whether the inspected file is malicious. To a certain extent, this technology is ineffective against unknown malware, because detection accuracy depends on the completeness of the feature library. Static malware analysis techniques offer a fast and useful mechanism for extracting meaningful information from a suspicious application. However, different techniques, which fall within the scope of the so-called obfuscation techniques [5], are deliberately employed with the aim of reshaping the code into a different form. For example, a series of useless system calls can be introduced into a section of code that is guarded by a condition which always evaluates to false at run time. Therefore, static analysis is susceptible to obfuscation techniques and cannot effectively capture semantic information and sequential patterns in strings.

Word embedding technology is introduced to extract unknown behavior information for the malware to be detected, which provides more information for additional morphology parsing, reducing the impact of obfuscation techniques and the risk that virtual machines may be infected in dynamic analysis. Clearly, a better characterization of Android malware would achieve better accuracy for malware detection [6]. Word embedding is a technique in Natural Language Processing (NLP) that transforms the words in a vocabulary into dense vectors of real numbers in a continuous embedding space [7]. A typical word embedding method relies on the co-occurrences of a target word and its context, while traditional NLP systems represent words as indices in a vocabulary, which does not capture the semantic relationships between words. Word embeddings such as those learned by neural networks can explicitly encode distributional semantics in the learned word vectors. We propose a general framework to encode different types of behavioral information while using a neural joint model to detect Android malware. Compared with a set of neural network models, the proposed method has stronger generalization ability and can describe the behavior of Android malware more comprehensively.

2 Related work

The number and complexity of the newest malware make it necessary to explore new advanced techniques capable of analyzing and tackling the resulting problems. The huge amount of malware makes it impossible for human engineers to perform traditional software analysis on every malware sample. In this context, malware analysis methods can be divided into machine learning (ML) based methods and deep learning (DL) based methods. The ML-based detection methods identify and detect malware by using static or dynamic analysis to extract a set of features. Navarro et al. [8] proposed an ontology-based framework to model the relationships between application and system elements. Considering the details of similarities among malicious programs, Chen et al. [9] discussed the construction of malware samples under three different threat models and analyzed how ML-based classifiers are misled. Further, they proposed KuafuDet to address the adversarial environment, which can boost the detection accuracy by at least 15%.

Deep learning [10, 11], which simulates the mechanism of a human brain to interpret data, is a new area of the ML research, and it is increasingly applied to Android malware detection. Albasir et al. [12] proposed a novel framework based on deep learning. They utilized the deep learning ability to classify power consumption signals. Karbab et al. [13] and Yuan et al. [14] adopted the DL-based methods to detect malware automatically, both of which improved detection accuracy compared to the ML-based methods.

In this paper, we proposed an Android malware detection framework based on DL to improve the utilization of semantic distribution information and spatial context information. This framework is applied to learn a nonlinear embedding and to encode different types of semantic information (see the details in section 3). Specifically, the improved texture features are integrated into word and character embedding. This approach allows the proposed neural joint model to simultaneously capture semantic and image information and to deal with unseen words that might not be in the vocabulary, while also alleviating the degradation of detection performance due to the presence of too many new words. Experimental results show that the proposed BIHAD is consistently and significantly superior to other existing methods.

The main contributions of this paper are as follows:

To obtain the distribution semantics of words from a text corpus, we combined different types of behavior information of malware and implemented the extraction of joint semantic and image information. We also enhanced the quality of word vectors by capturing the semantic information of malware and their relationships.

This word embedding technology is used to address the problem of too many new words: we can obtain representations of these unseen words, which reduces the impact of obfuscation techniques and facilitates the generalization of the representations.

The proposed neural joint model (BIHAD) can automatically identify and learn the semantics and sequential patterns and does not rely on any other complex or expert features for the learning task.

3 Methodology

3.1 The overview of BIHAD model

We propose a malware detection algorithm that combines the improved IndRNN, Hierarchical Attention [15] and DenseNet [16] to form a neural joint model (BIHAD). First, we use BiIndRNN to encode pixel information from a gray-scale image into texture-level representations. However, texture features cannot exploit semantic distribution information, and character embeddings cannot exploit long-sequence information and ignore word boundaries, so it is necessary to also consider word embeddings. Therefore, texture representations, word embeddings and character embeddings are fed into a Hierarchical Attention (HA) Dense Convolutional Network (DenseNet) to learn the representations of the behavior information contained in malware. Specifically, the BiIndRNN and HADenseNet in the proposed framework are integrated to form a pipeline, i.e. the output vectors of the IndRNN [17] units in the BiIndRNN are used as the input vectors of HADenseNet. Moreover, the parameters of BiIndRNN are shared by both the BiIndRNN and HADenseNet networks, so they are jointly affected during training. Figure 1 provides an overview of our architecture. The related source code is released on our GitHub page².

Fig.1

An overview of BIHAD architecture.

The remaining content of this section is arranged as follows: Section 3.2 introduces the feature model, which maps the behavior information of malware into texts and then converts them to gray-scale images. The details of the proposed BIHAD model are given in Section 3.3.

3.2 Feature model

3.2.1 Feature extraction

Malicious activities are usually reflected in specific patterns and combinations of various features. There are distinct patterns or combinations of features extracted between malware and benign software. We try to bypass complex code analysis and use a lightweight static analysis to extract features to reflect this difference. We designed the feature model with three necessary elements, including permissions, components and suspicious API calls. This strategy allows us to obtain good performances while keeping the complexity as low as possible.

Permission features: Malicious behaviors of an application require certain permissions to achieve their goals. This means that the permissions declared by an application can hint at its latent malicious behaviors. In fact, some experienced anti-malware developers can roughly identify malware based on the list of requested permissions.

Component features: Application components are the essential building blocks of an Android application and include activities, services, content providers and broadcast receivers. The name of each component is extracted and gathered in a component feature set, as these features may be useful in identifying well-known components of Android malware.

Suspicious API call features: Certain API calls are frequently found in malware and allow it to access sensitive data. As such, these calls may imply the malicious behaviors of an application and the coding habits of its developers. The suspicious API calls introduced in [18] are extracted into the selected API feature set, which may be useful for distinguishing benign applications from malware.

These features were extracted and stored in a text database comprising all observable strings. Table 1 lists representative observable strings extracted from an application. All of this information is helpful for effectively identifying malware. As a result, we extract words, characters and texture features from a set of observable strings and embed them into a joint vector space to fully describe the behavioral characteristics of malware. This representation enables us to automatically identify the combinations and patterns of malware features.

Table 1
Representative observable strings extracted from an application

Permissions	INTERNET
	READ_PHONE_STATE
	RECEIVE_SMS
	ACCESS_NETWORK_STATE
	CHANGE_WIFI_STATE
Components	com.google.update.Receiver
	com.google.update.Dialog
	com.google.update.UpdateService
	com.google.ads.AdActivity
Suspicious API calls	sendTextMessage
	getNetworkInfo
	getDeviceId
	setWifiEnabled

3.2.2 Gray-scale image

To better describe malware behaviors, we explain the texture features in detail. Gray-scale images in traditional malware analysis methods allow researchers to understand the structure of malware files without disassembling them. Previous work [19, 20] has demonstrated the effectiveness of gray-scale images for neural network models. A typical approach is to reformat the malware binary file as a sequence of 8-bit strings, each of which is then read as an unsigned integer. In contrast to previous work, this paper maps a set of strings consisting of malware behavior features into a gray-scale image. Each character in the set of strings is reformatted as the decimal representation of its ASCII code, an integer in the range [0, 255]. The resulting vector is then wrapped at a fixed line width so that the entire file ends up as a two-dimensional array. Visualizing this two-dimensional array helps malware researchers intuitively understand the spatial information of malware and the behavioral characteristics of sequential patterns. Figure 2 illustrates the visualization process of extracting pixel information from a gray-scale image and encoding it into a pixel-level vector representation.

Fig.2

Android malware visualization process.
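The string-to-image mapping described in Section 3.2.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the released implementation: the function name `strings_to_gray_image`, the newline separator between strings, the zero padding of the last row and the example width are all assumptions made here.

```python
def strings_to_gray_image(strings, width=16, pad_value=0):
    """Map observable strings to a 2-D array of pixel values in [0, 255]."""
    # Join all strings into one text; the newline separator is an assumption.
    text = "\n".join(strings)
    # Each character becomes the integer value of its ASCII code.
    # Characters outside ASCII are replaced by "?" so every pixel stays valid.
    pixels = list(text.encode("ascii", errors="replace"))
    # Pad the tail (assumed: with zeros) so the vector splits into full rows.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([pad_value] * (width - remainder))
    # Wrap the vector at a fixed line width to obtain the two-dimensional array.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


features = ["INTERNET", "RECEIVE_SMS", "sendTextMessage", "getDeviceId"]
image = strings_to_gray_image(features, width=8)
for row in image:
    print(row)
```

Each row of `image` corresponds to one line of pixels in the gray-scale image; a different line width changes the shape of the image but not the underlying pixel sequence.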

3.3 The BIHAD-based approach

3.3.1 The improved Independently Recurrent Neural Network (IndRNN)

Li et al. [17] proposed a new type of recurrent neural network (RNN), referred to as the independently recurrent neural network (IndRNN), which has been demonstrated to be effective on multiple fundamental tasks. However, the standard IndRNN ignores future context information when processing sequences. One obvious solution is to also use future information when predicting the output, which gives access to both previous and future context. Therefore, we propose a new type of IndRNN, referred to as the bidirectional independently recurrent neural network (BiIndRNN), to improve the modeling ability by integrating future context.

3.3.2 The BiIndRNN for texture level representations

The texture is a description of the spatial distribution pattern, which includes important text structure and position distribution between characters. In order to improve the information representation ability of images, based on the texture similarity of the same type of malware, we used the BiIndRNN model to extract the texture features from a gray-scale image. The extraction process of the texture feature is shown in Fig. 3.

Fig.3

The BiIndRNN for texture-level representations.

Given a gray-scale image $g = (P_{1}, P_{2}, \ldots, P_{N})$, $P_{i}$ denotes its i-th pixel and $Emb (P_{i})$ denotes the embedding of this pixel. To use pixel information, the BiIndRNN layer takes the pixel embeddings as input. The hidden state $h_{t}$ of the BiIndRNN unit at time step t can be described as:

$h_{t} = \sigma (W x_{t} + u \odot h_{t - 1} + b)$ (1)

where $W$ is the input weight and $u$ is the recurrent weight, which is diagonalizable. The Hadamard product $\odot$ of $u$ and $h_{t - 1}$, which have the same dimensions, is their entry-wise product. For example, if

$u = \begin{bmatrix} u_{ii} & u_{ij} \\ u_{ji} & u_{jj} \end{bmatrix}$ and $h_{t - 1} = \begin{bmatrix} h_{ii} & h_{ij} \\ h_{ji} & h_{jj} \end{bmatrix}$,

then $u \odot h_{t - 1} = \begin{bmatrix} u_{ii} h_{ii} & u_{ij} h_{ij} \\ u_{ji} h_{ji} & u_{jj} h_{jj} \end{bmatrix}$.

In our implementation, the BiIndRNN contains the forward IndRNN $\overrightarrow{h}$, which reads the pixel sequence $g$ in its normal order from $P_{1}$ to $P_{N}$, and the backward IndRNN $\overleftarrow{h}$, which reads from $P_{N}$ to $P_{1}$. An IndRNN unit in the left-to-right direction associates each $Emb (P_{i})$ with a hidden state $\overrightarrow{h}_{t}$. For the n-th neuron in the left-to-right direction, the hidden state $\overrightarrow{h}_{n, t}$ can be obtained as:

$\overrightarrow{h}_{n, t} = \sigma (W_{n} x_{t} + u_{n} \overrightarrow{h}_{n, t - 1} + b_{n})$ (2)

where $W_{n}$ and $u_{n}$ are the n-th row of the input weight and recurrent weight, respectively. To capture the future information, we also add a corresponding hidden state $\overleftarrow{h}_{n, t}$ in the reverse direction. For the n-th neuron in the right-to-left direction, the hidden state $\overleftarrow{h}_{n, t}$ can be obtained as:

$\overleftarrow{h}_{n, t} = \sigma (W_{n} x_{t} + u_{n} \overleftarrow{h}_{n, t + 1} + b_{n})$ (3)

Each neuron in an IndRNN deals with one type of spatial-temporal pattern independently. At each time step t, the input $Emb (P_{i})$ is provided to the two IndRNNs running in opposite directions, and both directions jointly determine the output. Thus, we obtain two vectors for each pixel embedding: one from the normal sequence processed in the left-to-right direction, with hidden state $\overrightarrow{h}_{t}$, and the other from the reversed sequence processed in the right-to-left direction, with hidden state $\overleftarrow{h}_{t}$. We concatenate the two vectors to form the output of the BiIndRNN:

$h_{t} = [\overrightarrow{h}_{t}, \overleftarrow{h}_{t}]$ (4)

where “[]” is the concatenation operation. Each hidden state $h_{t}$ contains information about the whole input sequence, with a focus on the parts surrounding the t-th pixel of the input image. In this way, the hidden state $h_{t}$ contains the information of both the preceding and the following pixels, and yields the texture-level vector representation $T_{i}$ at time t.
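The recurrences in Eqs. (2) to (4) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the released model: ReLU is used for the activation $\sigma$ (the text does not name it), both directions share the same `W`, `u` and `b` as Eqs. (2) and (3) are written, the initial hidden state is zero, and the function names are hypothetical.

```python
def indrnn_pass(inputs, W, u, b):
    """Run a single-direction IndRNN and return the hidden state at each step."""
    hidden = [0.0] * len(u)  # assumed zero initial state
    states = []
    for x in inputs:
        # Eq. (2)/(3): neuron n combines the input with only its own previous
        # state, scaled by the scalar u[n]. ReLU stands in for sigma here.
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(W[n], x)) + u[n] * hidden[n] + b[n])
            for n in range(len(u))
        ]
        states.append(hidden)
    return states


def bi_indrnn(inputs, W, u, b):
    """Concatenate forward and backward IndRNN states per time step (Eq. 4)."""
    forward = indrnn_pass(inputs, W, u, b)
    # The backward pass reads the sequence in reverse; its states are flipped
    # back so that index t lines up with the forward state at the same pixel.
    backward = indrnn_pass(inputs[::-1], W, u, b)[::-1]
    return [f + r for f, r in zip(forward, backward)]


pixel_embeddings = [[1.0], [2.0], [3.0]]  # three pixels, embedding size 1
W = [[1.0], [0.5]]                        # two neurons
u = [0.5, 1.0]
b = [0.0, 0.0]
texture = bi_indrnn(pixel_embeddings, W, u, b)
for step in texture:
    print(step)
```

Because every neuron only multiplies its own previous state by `u[n]`, the recurrent term is an element-wise product rather than a full matrix product, which is what Eq. (1) expresses with $\odot$.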

3.3.3 The HADenseNet for the Android malware detection and attribution

Dense Convolutional Network (DenseNet): Given a sequence $W_{1}/C_{1}/T_{1}, W_{2}/C_{2}/T_{2}, \ldots, W_{N}/C_{N}/T_{N}$, where $W_{i}$ denotes the i-th word, $C_{i}$ denotes the i-th character of $W_{i}$ and $T_{i}$ denotes the i-th texture-level representation of a gray-scale image, an instance of malware comprises a set of contiguous components in sequential order, and each component can be represented by an embedding in the order of the sequence:

$V = [Emb (W_{i}), Emb (C_{i}), T_{i}]$ (5)

where “[]” denotes vector concatenation. The concatenated vectors are fed into an improved HADenseNet module, where a DenseNet is used to extract the context information and retain the correlation information; subsequently, an attention mechanism is used to enhance the probability weight of the target. Figure 4 illustrates this layout schematically.

Fig.4

The HADenseNet for Android malware detection and attribution.

The DenseNet comprises L layers, each of which implements a non-linear transformation $H_{l} (\cdot)$, where $H_{l} (\cdot)$ is a composite function of three consecutive operations: Batch Normalization (BN), rectified linear units (ReLU) and convolution. The l-th layer has l inputs, and its feature maps are passed on to all L-l subsequent layers. We denote the output of the l-th layer as

$x_{l} = H_{l} ([x_{0}, x_{1}, \ldots, x_{l - 1}])$ (6)

where $[x_{0}, x_{1}, \ldots, x_{l - 1}]$ refers to the concatenation of the feature maps produced in layers 0, ..., l-1. In addition, the transition layers between two adjacent blocks change feature-map sizes via convolution and pooling.
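The dense connectivity of Eq. (6) can be illustrated with a toy example. The sketch below only demonstrates the connectivity pattern: `make_toy_layer` is a hypothetical stand-in for the composite function $H_{l}$ (BN, ReLU and convolution in the paper), and feature maps are flattened to plain lists for brevity.

```python
def make_toy_layer(growth_rate):
    """Hypothetical stand-in for H_l: emits `growth_rate` new features."""
    def layer(features):
        mean = sum(features) / len(features)
        return [max(0.0, mean)] * growth_rate  # ReLU of the mean, repeated
    return layer


def dense_block(x0, layers):
    """Apply Eq. (6): layer l receives the concatenation [x_0, ..., x_{l-1}]."""
    outputs = [x0]
    input_widths = []
    for layer in layers:
        # Concatenate every feature map produced so far.
        concatenated = [value for feature_map in outputs for value in feature_map]
        input_widths.append(len(concatenated))
        outputs.append(layer(concatenated))
    return outputs, input_widths


outputs, widths = dense_block([1.0, 3.0], [make_toy_layer(2) for _ in range(3)])
print(widths)   # input width grows by the growth rate at every layer
print(outputs)
```

The growing input width shows why each layer can reuse all earlier features directly, which is the property the text refers to as maximizing information flow between layers.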

Hierarchical Attention (HA): The contribution of each word to the semantic expression of malware is different. As the attention mechanism can enhance the representational ability of key features, we stacked the hierarchical attention mechanism on the top layer of DenseNet. The attention mechanism is introduced to extract the words that are important to the meaning of the semantic expression, and the representations of those informative words are aggregated into a vector. Specifically, we obtain a hidden representation $u_{i}$ of each word $x_{l, i}$ from a one-layer MLP [21–24], and use the vector $u_{s}$ to measure the importance of words:

$u_{i} = \tanh (W_{s} x_{l, i} + b_{s})$ (7)

The weight $a_{i}$ of each word $x_{l, i}$ and the resulting vector $v$ are computed by

$a_{i} = \frac{\exp (u_{i}^{T} u_{s})}{\sum_{j} \exp (u_{j}^{T} u_{s})}$ (8)

$v = \sum_{i} a_{i} x_{l, i}$ (9)

After the above calculation, we get the normalized importance weight $a_{i}$ for each word. The vector $v$ depends on the sequence of words $(x_{l, 1}, \ldots, x_{l, T})$ in the l-th layer and is computed as a weighted sum of these words. Lastly, the vector $v$ is used as the feature vector for binary or multiclass malware classification.
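The attention pooling of Eqs. (7) to (9) can be sketched in plain Python. This is an illustrative sketch only: the function name `attention_pool` and the toy parameter values are assumptions, and the maximum score is subtracted before exponentiation purely for numerical stability (it does not change the weights).

```python
import math


def attention_pool(words, W_s, b_s, u_s):
    """Return attention weights a_i and the pooled vector v (Eqs. 7-9)."""
    # Eq. (7): hidden representation u_i = tanh(W_s x_i + b_s).
    hidden = [
        [math.tanh(sum(w * x for w, x in zip(row, word)) + bias)
         for row, bias in zip(W_s, b_s)]
        for word in words
    ]
    # Eq. (8): softmax over the similarity between u_i and the vector u_s.
    scores = [sum(h * c for h, c in zip(u_i, u_s)) for u_i in hidden]
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Eq. (9): v is the weighted sum of the word vectors.
    dim = len(words[0])
    v = [sum(a * word[d] for a, word in zip(weights, words)) for d in range(dim)]
    return weights, v


words = [[1.0, 0.0], [0.0, 1.0]]      # two toy word vectors
W_s = [[1.0, 0.0], [0.0, 1.0]]        # identity projection
b_s = [0.0, 0.0]
weights, v = attention_pool(words, W_s, b_s, u_s=[1.0, 0.0])
print(weights, v)
```

With `u_s = [1.0, 0.0]` the first word receives the larger weight, so `v` is pulled toward it; with an all-zero `u_s` the weights become uniform and `v` reduces to the mean of the word vectors.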

Decision-making: There are two tasks in the decision making module: the detection task for binary classification and the attribution task for multiclass classification.

For the detection task, the Sigmoid function is used as the output layer to model the binary probabilities.

$y_{binary} = Sigmoid (v) = \frac{1}{1 + \exp (- v)}$ (10)

The probability is then converted into a class label by thresholding:

$\hat{y}_{binary} = \begin{cases} 0, & y_{binary} \leq threshold \\ 1, & y_{binary} > threshold \end{cases}$ (11)

For the attribution task, the Softmax function is used as the output layer to model the multiclass probabilities, i.e. $y_{multiclass} \in [0, 1]^{K}$.

$y_{multiclass} = Softmax (Wv)$ (12)

The Softmax function gives a probability distribution over the K classes by exponentiating and normalizing:

$P (y | v) = \frac{\exp (W_{y}^{T} v)}{\sum_{k = 1}^{K} \exp (W_{k}^{T} v)}$ (13)

Having the same architecture in both the detection and attribution subtasks makes the development and the evaluation of a given design simpler.
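The two output layers of the decision-making module can be sketched as below. This is a hedged illustration, not the authors' code: the projection vector `w` that turns the feature vector $v$ into a scalar before the Sigmoid is an assumption (Eq. (10) applies the Sigmoid to $v$ directly), the default threshold of 0.5 is also assumed, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def detect(v, w, threshold=0.5):
    """Binary detection: probability (Eq. 10) and thresholded label (Eq. 11)."""
    logit = sum(wi * vi for wi, vi in zip(w, v))  # assumed linear projection
    probability = sigmoid(logit)
    return probability, int(probability > threshold)


def attribute(v, W):
    """Family attribution: softmax over K class scores (Eqs. 12-13)."""
    logits = [sum(wk * vi for wk, vi in zip(row, v)) for row in W]
    top = max(logits)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


v = [2.0, 0.0]
probability, label = detect(v, w=[1.0, 0.0])
family_probs = attribute(v, W=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(probability, label)
print(family_probs)
```

Both heads consume the same vector `v`, which mirrors the point above that detection and attribution share one architecture and differ only in the output layer.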

4 Evaluation

To show that the proposed algorithm has good generalization ability, we carried out experiments on the following aspects: (1) With the variable parameters of the different models kept as consistent as possible, we demonstrated the superiority of the proposed algorithm (BIHAD) in terms of performance. (2) We evaluated the impact of different attention mechanisms on the performance of the neural joint model. (3) We evaluated the efficiency of BIHAD. (4) To highlight the contribution of this study, we compared BIHAD with previous similar studies. (5) We demonstrated that the proposed BIHAD can be effectively applied to large-scale detection. (6) We conducted an additional study to evaluate the resiliency of BIHAD against sophisticated obfuscation schemes. (7) To verify that the proposed BIHAD can reliably detect different malware families, the performance of BIHAD in malware family identification was evaluated.

4.1 Datasets

A key challenge in Android malware detection research is the availability of representative data. Drebin [18] is an Android dataset with 5,560 malware samples collected from August 2010 to October 2012. All malware samples are labeled with one of 179 malware families. AMD [25] is the newest dataset, with 24,553 malware files collected from 2010 to 2016. All malware samples are categorized into 135 varieties among 71 malware families. Along with these malware datasets, we also collected a large number of real-world Android applications (including 4,664 benign apps) from various sources.

Here we compare the Drebin and AMD datasets for the further experiments. Table 2 shows the size of the overall feature sets. Note that even though there is a fixed number of system permissions, developers can still declare self-defined permissions, so the vocabulary grows. As described in the previous section, we use the word embedding to count the frequency of occurrence of words in all samples, focusing on words with higher frequency. As a result, low-frequency words are almost ignored during the feature extraction phase.

Table 2
Size of extracted feature sets on Drebin and AMD datasets

Feature set	Drebin	AMD
Pixels	78	64
Words	22998	31440
Characters	64	63
Texture representations	200	200

Additionally, AndroZoo (AZ) [26] is an online Android app collection that archives both benign and malicious apps. In order to demonstrate the performance of the proposed method in large-scale detection, we used 110,000 samples from the AZ dataset (55,000 benign apps and 55,000 malware) for further experiments. The Praguard dataset (PG) [27] is an obfuscation benchmark suite, which contains 1,497 obfuscated malware. We used the Praguard dataset to evaluate the resiliency of the proposed method for sophisticated obfuscation schemes.

4.2 Comparison with different algorithms

Comparison with neural networks for Drebin dataset: we compared the BIHAD with other neural joint models which include LSTM+CNN, LSTM+IndRNN, and LSTM+DenseNet. We used 5-fold cross-validation to evaluate the performance against rival methods. All baselines have achieved good performance. As presented in Table 3, the baseline with its accuracy and PRF closest to our proposed method is the LSTM+HADenseNet, which fully demonstrates the powerful generalization ability of LSTM. However, the detection time of LSTM is as much as 2 times slower than the proposed method as shown in Fig. 6. This is because the neurons in the same layer of BiIndRNN are independent of each other and connected across layers, and the gradient can be effectively propagated at different time steps, resulting in faster processing compared to LSTM.

Table 3
Experimental results of neural joint networks comparisons on the two benchmark datasets

Model	ID	Drebin ACC	Drebin P	Drebin R	Drebin F	AMD ACC	AMD P	AMD R	AMD F
LSTM+CNN	A	98.16	98.58	97.78	98.18	97.85	97.68	97.89	97.79
LSTM+IndRNN	B	97.54	98.38	96.81	97.59	98.26	98.26	98.04	98.15
LSTM+BiIndRNN	C	97.65	96.84	98.29	97.56	98.56	98.71	98.29	98.5
LSTM+DenseNet	D	97.87	97.19	98.51	97.84	98.97	99.16	98.75	98.96
LSTM+HADenseNet	E	99.08	99.22	98.84	99.03	98.67	98.29	98.92	98.6
BiLSTM+CNN	F	97.75	96.38	97.32	96.84	98.87	98.95	98.75	98.85
BiLSTM+DenseNet	G	97.56	97.56	98.03	97.55	98.26	98.29	98.08	98.18
BiLSTM+HADenseNet	H	98.72	98.55	98.91	98.73	98.56	99.15	97.92	98.53
IndRNN+CNN	I	96.89	96.13	97.66	96.89	97.95	97.16	98.45	97.8
IndRNN+DenseNet	J	97.75	98.2	97.38	97.79	98.05	97.94	97.72	97.83
IndRNN+HADenseNet	K	98.72	98.69	98.81	98.75	98.16	97.9	98.31	98.1
BiIndRNN+CNN	L	96.71	96.31	97.03	96.72	98.36	98.53	98.12	98.32
BiIndRNN+DenseNet	M	97.56	97.11	98.06	97.58	97.75	96.2	99.38	97.76
BIHAD	N	99.51	99.57	99.47	99.52	99.59	99.37	99.79	99.58
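The ACC, P, R and F columns are not defined explicitly in this section; the sketch below assumes the standard definitions of accuracy, precision, recall and F-score (the harmonic mean of precision and recall). The function names and the confusion counts in the example are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-score from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, precision, recall, f_score


def f_score_from(precision, recall):
    """F-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, for illustration only.
print(classification_metrics(tp=90, fp=10, fn=5, tn=95))

# Under this assumed definition, the reported P and R of BIHAD on Drebin
# reproduce the reported F-score.
print(round(f_score_from(99.57, 99.47), 2))  # 99.52
```

Under this definition the F-score always lies between precision and recall, which gives a quick sanity check when reading the table.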

In the baseline experiment, we used a CNN to extract the semantic distribution and spatial information and achieved good results. However, these results are not yet comparable with those given by DenseNet. When we used DenseNet, the detection rate (recall) improved, as for LSTM+DenseNet and IndRNN+DenseNet. This shows the advantage of DenseNet in dealing with high-dimensional feature vectors: it ensures that the information flow between layers is maximized and enhances the propagation and reuse of features, resulting in better performance than the other baselines. Moreover, when we stacked hierarchical attention on top of DenseNet, the detection rates of all baselines improved. Hierarchical attention is used to integrate information flows and to avoid the loss of semantic information between layers.

Experimental results demonstrated that the attention mechanism is important for revealing malicious behavior patterns. As presented in Table 3, the proposed BIHAD produces the best performance under all metrics, which demonstrates the superiority of the proposed method. Additionally, Fig. 5 shows the training and validation losses over 50 epochs. The validation losses are similar to the training losses.

Fig.5

Training and validation loss for the two benchmark datasets.

Comparison with neural networks for AMD dataset: Furthermore, we compared the results produced by different neural joint models on the AMD dataset. As seen from Table 3, the results are similar to those on the Drebin dataset, and the proposed BIHAD achieved the best performance for precision with a score of 99.37%. The joint models BiLSTM+CNN, LSTM+DenseNet and BiLSTM+HADenseNet are closest to our method in precision, which is similar to what we observed on the Drebin dataset. However, they still provide a lower detection rate, resulting in worse overall performance. The experimental results demonstrated the complementary modeling power of HADenseNet and BiIndRNN.

In particular, we evaluated the efficiency of BIHAD against rival methods and recorded the overall runtime of BIHAD for classifying 1,865 samples from the Drebin test set and 1,728 samples from the AMD test set. Figure 6 shows the overall runtimes of different neural joint models for classification on the two benchmark datasets. For the BIHAD model, the overall runtimes for classifying the Drebin test samples and the AMD test samples are only 6.69 seconds and 4.05 seconds, respectively.

Fig.6

The overall runtimes of different neural joint models for classification are compared on the two benchmark datasets.

4.3 Comparison with different attention mechanisms

In this study, we compared three attention mechanisms to verify the effect of different attention mechanisms on performance. The details are given in Table 4. As we can see, self-attention [28] is closest to our results on the Drebin dataset, while it performs poorly on the AMD dataset. The performance of the multi-headed attention [29] on both datasets was stable, but it is still not comparable with the result given by hierarchical attention. This may be explained by the complexity of Self- and Multi-headed attention, which makes the classifier learn too much noise so that the performance is reduced.

Table 4
Experimental results of different attention mechanisms comparisons

Attention	P	R	F
Drebin dataset
BiIndRNN+Multi-DenseNet	97.53	98.5	98.01
BiIndRNN+Self-DenseNet	98.99	98.99	98.99
Ours	99.57	99.47	99.52
AMD dataset
BiIndRNN+Multi-DenseNet	98.45	98.24	98.34
BiIndRNN+Self-DenseNet	98.12	97.72	97.92
Ours	99.37	99.79	99.58

4.4 Processing time evaluation

In this paper, we evaluated the efficiency of BIHAD in detecting arbitrary malware. On the desktop computer (GeForce GTX 1060 with 8GB RAM) we achieved a remarkable analysis performance. As shown in the previous sections, our system consists of two parts: the feature extraction and the classification. We focused on evaluating the processing time for feature extraction and then giving an average runtime for processing one sample.

In general, the final classification takes much less time than the feature extraction. The results are shown in Fig. 7. For different applications, the classification typically requires a fixed processing time due to the fixed feature space size. The two figures in the first column show the relationship between the apk file size and the runtime of the feature extraction. The two figures in the second column show the relationship between the apk file size and the runtime of classification. The two figures in the third column show the relationship between the apk file size and the overall runtime. As can be seen from the third column, most of the apk files are smaller than 10MB, and the overall runtime is less than 0.14 seconds. The main contributor to the overall runtime of our system is the feature extraction phase. Larger apk files require more processing time, and the relationship is almost linear. In contrast, the classification phase takes very little time, and for most samples the classification time is less than 0.014 seconds. This is explained by the fact that no matter how large the file is, it is ultimately mapped to a fixed-length vector. Therefore, all samples take similar times in the classification phase.

Fig. 7

The overall runtime for processing one sample from two benchmark datasets.

On average, BIHAD analyzes a given application in 0.1279 seconds on the Drebin dataset and 0.1128 seconds on the AMD dataset. Compared with the state-of-the-art method FM [30] (4.7 ms for encoding and 0.021 ms for prediction), our system does not appear to have an advantage in processing time. Two points qualify this comparison. First, the time our system needs to predict one sample is almost negligible. Second, the feature sets used in our system are simpler and smaller than those used in DREBIN [18] and FM [30], so under the same conditions our system should take less processing time than these two techniques.
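The per-phase measurement described in this section can be sketched as follows. This is a minimal illustration, not the authors' benchmarking code: `time_phases`, `extract_features` and `classify` are hypothetical names, and the two phases are passed in as callables so that any pipeline with the same two-step structure can be timed.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def time_phases(
    samples: Iterable[object],
    extract_features: Callable[[object], object],
    classify: Callable[[object], object],
) -> Dict[str, float]:
    """Return the mean per-sample runtime (in seconds) of each phase."""
    extract_times: List[float] = []
    classify_times: List[float] = []
    for sample in samples:
        t0 = time.perf_counter()
        features = extract_features(sample)  # hypothetical phase 1
        t1 = time.perf_counter()
        classify(features)                   # hypothetical phase 2
        t2 = time.perf_counter()
        extract_times.append(t1 - t0)
        classify_times.append(t2 - t1)
    if not extract_times:
        raise ValueError("no samples to time")
    avg_extract = mean(extract_times)
    avg_classify = mean(classify_times)
    return {
        "extract": avg_extract,
        "classify": avg_classify,
        "overall": avg_extract + avg_classify,
    }


if __name__ == "__main__":
    # Stand-in phases (assumptions): extraction maps raw bytes to a
    # fixed-length vector, classification thresholds that vector.
    report = time_phases(
        samples=[b"a" * 1000, b"b" * 2000],
        extract_features=lambda apk: [len(apk)] * 8,
        classify=lambda vec: sum(vec) > 0,
    )
    print(report)
```

Timing the two phases separately is what allows the file-size dependence of feature extraction to be distinguished from the roughly constant cost of classifying a fixed-length vector.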

4.5 Comparison with other state-of-the-art methods

To put these results in context, we compared BIHAD with previous work on the two benchmark datasets. Table 5 shows the results alongside those of [18, 31–33] and [30]. On the Drebin dataset, BIHAD achieves a detection rate (recall) of 99.47%, higher than the 99.01% reported for the state-of-the-art FM [30]. It also achieves a higher detection rate on the AMD dataset (99.79% versus 99.20%). These results show that the proposed method is comparable to other state-of-the-art approaches.

Table 5
Results of our architecture compared to previous findings

Related Works	ACC	P	R	F
Drebin dataset
DREBIN [18]	93.9	–	–	–
Yusof [31]	–	93.2	99.4	–
DySign [32]	–	94	78	85
apk2vec [33]	–	96.59	97.52	96.93
FM [30]	–	99.91	99.01	99.46
Ours	99.51	99.57	99.47	99.52
AMD dataset
FM [30]	–	99.35	99.20	99.28
Ours	99.59	99.37	99.79	99.58
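As a quick consistency check on the F column, the sketch below recomputes the F-score from P and R. It assumes the standard definition of the F-score as the harmonic mean of precision and recall, which this section does not restate; `f_score` is an illustrative helper name.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # P and R taken from the "Ours" rows of Table 5.
    for dataset, p, r in [("Drebin", 99.57, 99.47), ("AMD", 99.37, 99.79)]:
        print(dataset, round(f_score(p, r), 2))
```

Under this assumption the two BIHAD rows give 99.52 and 99.58, matching the reported F values.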

4.6 Detection performance evaluation in large-scale datasets

To confirm that our algorithm is suitable for large-scale malware detection, we evaluated BIHAD on the AndroZoo dataset (110,000 samples). As shown in Table 6, the model fits the data distribution better as the number of samples increases. When the number of samples is too small, the model under-fits and can fall into a local optimum, leaving the value of the objective function far from the desired target. We consider the resulting fluctuation reasonable. The small differences in the performance metrics across dataset sizes are evidence of robustness. Overall, BIHAD maintains a high F-score (ranging from 98.66% to 99.46%), even in large-scale detection.

Table 6
The effect of dataset size on experimental results

Number	ACC	P	R	F
10,000	98.7	98.15	99.17	98.66
20,000	99.15	99.44	98.9	99.17
40,000	99.13	99.59	98.68	99.13
80,000	99.33	99.45	99.27	99.36
110,000	99.42	99.37	99.54	99.46

4.7 Obfuscated application detection

We evaluated the effectiveness of our model on obfuscated datasets. As shown in Table 7, our datasets (OBF0, OBF25, OBF50 and OBF100) consist of samples from two sources, the AndroZoo dataset and the Praguard dataset. Table 7 also lists the percentage of applications that are obfuscated (Obf%). The obfuscated malware was obtained from the Praguard dataset.

Table 7
The obfuscated datasets used in the study

Dataset	Benign source	Benign #Apps	Malware source	Malware #Apps	Obf%
OBF0	AZ	2,000	AZ	2,000	0%
OBF25	AZ	2,000	AZ,PG	2,000	25%
OBF50	AZ	2,000	AZ,PG	2,000	50%
OBF100	AZ	1,492	PG	1,492	100%
AZ: AndroZoo dataset; PG: Praguard dataset.
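One way to assemble a malware set with a target obfuscation percentage, as in the OBF25 and OBF50 rows, is sketched below. This is an illustration under stated assumptions, not the authors' procedure: `mix_malware` is a hypothetical helper, the sample identifiers are placeholders, and uniform random sampling without replacement is assumed.

```python
import random
from typing import List, Sequence, Tuple


def mix_malware(
    plain: Sequence[str],
    obfuscated: Sequence[str],
    total: int,
    obf_percent: float,
    seed: int = 0,
) -> List[Tuple[str, bool]]:
    """Draw `total` malware samples, `obf_percent` percent of them obfuscated.

    Returns (sample_id, is_obfuscated) pairs in shuffled order.
    """
    if not 0 <= obf_percent <= 100:
        raise ValueError("obf_percent must be between 0 and 100")
    n_obf = round(total * obf_percent / 100)
    n_plain = total - n_obf
    if n_obf > len(obfuscated) or n_plain > len(plain):
        raise ValueError("not enough samples in one of the pools")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    picked = [(s, True) for s in rng.sample(list(obfuscated), n_obf)]
    picked += [(s, False) for s in rng.sample(list(plain), n_plain)]
    rng.shuffle(picked)
    return picked


if __name__ == "__main__":
    # Placeholder identifiers standing in for AZ and PG malware samples.
    az = [f"az_{i}" for i in range(3000)]
    pg = [f"pg_{i}" for i in range(1500)]
    obf25 = mix_malware(az, pg, total=2000, obf_percent=25)
    print(len(obf25), sum(flag for _, flag in obf25))
```

With a total of 2,000 and 25%, the helper draws 500 samples from the obfuscated pool and 1,500 from the non-obfuscated pool.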

As shown in Table 8, we obtained an F-score of 99.48% on the OBF0 dataset (the non-obfuscated samples). Repeating the experiment on the completely obfuscated samples (OBF100) gave a similar F-score (99.47%), which shows that the model remains stable even in the presence of complicated obfuscation. We also evaluated the model on the mixed datasets (OBF25 and OBF50), as would be encountered in the real world. In this case, the F-score was 98.68% (25% obfuscated samples) and 98.73% (50% obfuscated samples), respectively. These results show that obfuscation techniques make it more difficult to distinguish benign from malicious samples. Nevertheless, the proposed model maintains a high detection rate (recall), although the overall performance is slightly reduced. We also observed that malware uses more obfuscation than benign applications, which could make an obfuscated sample easier to identify as malware.

Table 8

Evaluation of classification on the obfuscated datasets

Dataset	ACC	P	R	F
OBF0	99.5	99.22	99.74	99.48
OBF25	98.62	97.64	99.75	98.68
OBF50	98.75	98.23	99.23	98.73
OBF100	99.49	99.3	99.65	99.47

4.8 Malware family detection

We further evaluated BIHAD on malware family identification, an important task for malware attribution. For this task, BIHAD is set up to identify a given malware family among samples from other families. We selected 20 malware families from the Drebin dataset (4,664 samples) and 10 malware families from the AMD dataset (2,085 samples). Table 9 and Table 10 list the name, number of samples, and detection performance of each family.

Table 9
Detection performance of per malware family on the Drebin dataset

Family	#Num	P	R	F	[34]
Adrd	91	0.94	1.00	0.97	0.58
BaseBridge	330	0.96	0.91	0.93	0.95
DroidDream	81	0.50	0.86	0.63	0.44
DroidKungFu	667	0.96	0.95	0.96	0.83
ExploitLinuxLotoor	70	0.86	0.63	0.73	0.62
FakeDoc	132	1.00	0.94	0.97	0.97
FakeInstaller	925	0.98	0.98	0.98	0.86
FakeRun	61	1.00	0.94	0.97	0.97
Gappusin	58	1.00	1.00	1.00	0.43
Geinimi	92	0.89	0.89	0.89	0.73
GinMaster	339	0.90	0.89	0.89	0.97
Glodream	69	0.93	0.90	0.90	0.56
Iconosys	152	1.00	1.00	1.00	0.83
Imlog	43	1.00	1.00	1.00	0.30
Kmin	147	0.82	0.88	0.88	0.82
MobileTx	69	1.00	1.00	1.00	1.00
Opfake	613	0.99	0.98	0.98	0.94
Plankton	625	0.99	0.99	0.99	0.71
SMSreg	41	0.86	0.80	0.80	0.10
SendPay	59	1.00	1.00	1.00	1.00
Average	––	0.96	0.95	0.95	0.73

Table 10

Detection performance of per malware family on the AMD dataset

Family	#Num	P	R	F
Boqx	214	1.00	0.98	0.99
DroidKungFu	533	0.99	1.00	0.99
GoldDream	52	0.89	0.94	0.91
Gumen	145	1.00	1.00	1.00
Lotoor	305	0.94	1.00	0.97
Minimob	200	1.00	1.00	1.00
Mseg	234	1.00	1.00	1.00
Mtk	64	1.00	0.86	0.92
SlemBunk	174	1.00	1.00	1.00
SmsKey	164	1.00	0.86	0.93
Average	––	0.98	0.98	0.98

Ideally, a multi-family malware corpus would follow a uniform distribution, i.e. each malware family would contribute the same number of samples. However, such a distribution differs significantly from reality, which makes it difficult to gauge whether experiments on it provide statistically reliable results that generalize. To confirm that the proposed BIHAD-based detection scheme is applicable to malware family identification, we therefore evaluated the detection performance of each malware family separately. The experimental results show that BIHAD reliably detects the families, with an average F-score of 95% on the Drebin dataset and 98% on the AMD dataset. Seven families on the Drebin dataset and seven families on the AMD dataset are identified with a precision of 1.00. Martín et al. [34] proposed a method that combines dynamic analysis and Markov chains for Android malware family attribution. Compared with [34], we obtained better or equal results for most of the malware families on the Drebin dataset (Table 9). It is also worth mentioning that, although the families considered differ, our results are similar to those reported in [35] and [36]. This indicates that our approach is comparable to the state-of-the-art approaches.
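The per-family evaluation can be illustrated with a short one-vs-rest computation: each family in turn is treated as the positive class and all other families as negative. This is a minimal sketch assuming the standard precision, recall and F-score definitions; `per_family_scores` and the toy labels are illustrative and not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def per_family_scores(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return {family: (precision, recall, f_score)} computed one-vs-rest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores: Dict[str, Tuple[float, float, float]] = {}
    for family in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == family and p == family for t, p in pairs)
        fp = sum(t != family and p == family for t, p in pairs)
        fn = sum(t == family and p != family for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f = 2 * precision * recall / denom if denom else 0.0
        scores[family] = (precision, recall, f)
    return scores


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["Adrd", "Adrd", "Kmin", "Kmin", "Kmin"]
    preds = ["Adrd", "Kmin", "Kmin", "Kmin", "Adrd"]
    for family, (p, r, f) in per_family_scores(truth, preds).items():
        print(f"{family}\t{p:.2f}\t{r:.2f}\t{f:.2f}")
```

Reporting the scores per family rather than only in aggregate keeps small families visible, which matters when the family sizes are as uneven as in Table 9 and Table 10.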

5 Discussions

We proposed BIHAD, a novel Android malware detection solution based on a neural joint model that combines the knowledge representations of words, characters and texture features. It uses different types of features to construct a stable decision method. First, image analysis techniques convert the behavior information of malware into texture features. The texture features reflect the similarity between image blocks and contain rich spatial and sequential pattern information. Second, on top of the word-level embedding, which expresses the semantic information of the malicious code, character-level embedding provides information gain for the word-level features. To make full use of the behavior information of malware, we extract joint semantic and image information. Finally, the neural joint model extracts robust features that improve classification accuracy. The experimental results demonstrate that the proposed algorithm has certain advantages in detection performance and runtime.

This study describes a feature model based on static analysis. In future work, we will investigate the effectiveness of features generated by dynamic analysis. Such dynamic features can be easily incorporated into the system so that the learning algorithm can take advantage of both static and dynamic features. Another important consideration is the modularity of the system: we also plan to explore new network architectures, for example by replacing components of the network with newer and better-performing versions.

Acknowledgments

The authors would like to thank the Editor-in-Chief, the Associate Editor, and the reviewers for their insightful comments and suggestions. This work was supported by the Research Innovation Project of Graduate Student in Xinjiang Uygur Autonomous Region (XJ2019G065), the Cernet Next Generation Internet Technology Innovation Project (NGII20170420) and the Xinjiang Uygur Autonomous Region Cyber Security and Informatization Project (XJWX-1-Z-2019-1021).

References

1. Mohd R.Z.A., Zuhairi M.F., Shadil A.Z.A., et al., Anomaly-based NIDS: A review of machine learning methods on malware detection[C], International Conference on Information and Communication Technology, IEEE (2017), 266–270.

2. Khan, Abbas and Al-Muhtadi, Survey on Mobile User’s Data Privacy Threats and Defense Mechanisms[J], Procedia Computer Science 56(1) (2015), 376–383.

3. Iyengar S.S., Iyengar S.S., Iyengar S.S., et al., A Survey on Malware Detection Using Data Mining Techniques[J], ACM Computing Surveys 50(3) (2017), 41.

4. Canfora, Lorenzo A.D., Medvet, et al., Effectiveness of Opcode ngrams for Detection of Multi Family Android Malware[C], International Conference on Availability, Reliability and Security, IEEE (2015), 333–340.

5. Suarez-Tangil, Dash S.K., Ahmadi, et al., DroidSieve: Fast and Accurate Classification of Obfuscated Android Malware[C], ACM on Conference on Data and Application Security and Privacy, ACM (2017), 309–320.

6. Steen M.V., Bos, Pohlmann, et al., Prudent Practices for Designing Malware Experiments: Status Quo and Outlook[C], Security and Privacy, IEEE (2012), 65–79.

7. Hou, Saas, Chen, et al., Deep4MalDroid: A Deep Learning Framework for Android Malware Detection Based on Linux Kernel System Call Graphs[C], IEEE/WIC/ACM International Conference on Web Intelligence Workshops, IEEE (2017), 104–111.

8. Navarro L.C., Navarro A.K.W., Grégio, et al., Leveraging ontologies and machine-learning techniques for malware analysis into Android permissions ecosystems[J], Computers & Security 78 (2018), 429–453.

9. Chen, Xue, Fan, et al., Automated poisoning attacks and defenses in malware detection systems: An adversarial machine learning approach[J], Computers & Security 73 (2018), 326–344.

10. […], Zhang, Li, et al., A Deep Learning Approach to Android Malware Feature Learning and Detection[C], Trustcom/BigDataSE/ISPA, IEEE (2017), 244–251.

11. Kalash, Rochan, Mohammed, et al., Malware Classification with Deep Convolutional Neural Networks[C], IFIP International Conference on New Technologies, Mobility and Security, IEEE (2018), 1–5.

12. Albasir, James R.S.R., Naik, et al., Using Deep Learning to Classify Power Consumption Signals of Wireless Devices: An Application to Cybersecurity[C], 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2018), 2032–2036.

13. Karbab E.M.B., Debbabi, Derhab, et al., MalDozer: Automatic framework for android malware detection using deep learning[J], Digital Investigation 24 (2018), S48–S59.

14. Yuan, Lu and Xue, Droiddetector: Android malware characterization and detection using deep learning[J], Tsinghua Science and Technology 21(1) (2016), 114–123.

15. Yang, Yang, Dyer, et al., Hierarchical attention networks for document classification[C], Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2016), 1480–1489.

16. Huang, Liu, Maaten L.V.D., et al., Densely Connected Convolutional Networks[C], IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society (2017), 2261–2269.

17. […], Li, Cook, Zhu and Gao, Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (1) (2018), 5457–5466. DOI: 10.1109/CVPR.2018.00572.

18. Arp, Spreitzenbarth, Hübner, Gascon and Rieck, Drebin: Effective and Explainable Detection of Android Malware in Your Pocket, Proceedings 2014 Network and Distributed System Security Symposium 45(02) (2014), 45-0765-45-0765. DOI: 10.14722/ndss.2014.23247.

19. […], Vasconcellos V.D., Prasad, et al., Lightweight classification of IoT malware based on image recognition[C], 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), IEEE 2 (2018), 664–669.

20. Yang and Wen, Detecting android malware by applying classification techniques on images patterns[C], IEEE International Conference on Cloud Computing and Big Data Analysis, IEEE (2017), 344–347.

21. Rubio J.J., Ricardo Cruz, Elias, et al., ANFIS system for classification of brain signals[J], Journal of Intelligent & Fuzzy Systems 2019. DOI: 10.3233/JIFS-190207.

22. de Jesús Rubio, SOFMLS: online self-organizing fuzzy modified least-squares network[J], IEEE Transactions on Fuzzy Systems 17(6) (2009), 1296–1309.

23. Giap C.N., Son L.H. and Chiclana, Dynamic structural neural network[J], Journal of Intelligent & Fuzzy Systems 34(4) (2018), 2479–2490.

24. de Jesús Rubio, Error convergence analysis of the SUFIN and CSUFIN[J], Applied Soft Computing 72 (2018), 587–595.

25. Wei, Li, Roy, Ou and Zhou, Deep Ground Truth Analysis of Current Android Malware. In Detection of Intrusions and Malware, and Vulnerability Assessment, Vol. 1 (2017), 252–276. DOI: 10.1007/978-3-319-60876-1_12.

26. Allix, Bissyandé T.F., Klein, et al., Androzoo: Collecting millions of android apps for the research community[C], 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), IEEE (2016), 468–471.

27. Maiorca, Ariu, Corona, Aresu and Giacinto, Stealth attacks: An extended insight into the obfuscation effects on Android malware, Computers and Security 51 (March 2014) (2015), 16–31. DOI: 10.1016/j.cose.2015.02.007.

28. Lin, Feng, Santos C.N., et al., A structured self-attentive sentence embedding[J], arXiv preprint arXiv:1703.03130, 2017.

29. Vaswani, Shazeer, Parmar, et al., Attention is all you need[C], Advances in Neural Information Processing Systems (2017), 5998–6008.

30. […], Zhu, Niu, Mills, Zhang and Kinawi, Android Malware Detection based on Factorization Machine (May 2018). Retrieved from http://arxiv.org/abs/1805.11843

31. Yusof, Saudi M.M. and Ridzuan, A new mobile botnet classification based on permission and API calls[C], Emerging Security Technologies (EST), 2017 Seventh International Conference on, IEEE (2017), 122–127.

32. Karbab E.M.B., Debbabi, Alrabaee, et al., DySign: Dynamic fingerprinting for the automatic detection of Android malware[C], Malicious and Unwanted Software (MALWARE), 2016 11th International Conference on, IEEE (2016), 1–8.

33. Narayanan, Soh, Chen, et al., apk2vec: Semisupervised multi-view representation learning for profiling Android applications[C], 2018 IEEE International Conference on Data Mining (ICDM), IEEE (2018), 357–366.

34. Martín, Rodríguez-Fernández and Camacho, CANDYMAN: Classifying Android malware families by modelling dynamic traces with Markov chains[J], Engineering Applications of Artificial Intelligence 74 (2018), 121–13.

35. Massarelli, Aniello, Ciccotelli, Querzoni, Ucci and Baldoni, Android malware family classification based on resource consumption over time, Proceedings of the 2017 12th International Conference on Malicious and Unwanted Software, MALWARE 2017 (2018), 2018–January, 31–38.

36. Pektaş and Acarman, Learning to detect Android malware via opcode sequences[J], Neurocomputing 2019.
