Abstract
The rapid growth in smartphone usage has led to an increase in malicious attacks, particularly against Android mobile devices. Android is a primary target for malware that exploits security vulnerabilities because it hosts critical applications, such as banking applications. Several machine learning-based models for mobile malware detection have been developed recently, but significant research is still needed to achieve optimal efficiency and performance. The proliferation of Android devices and the growing threat of mobile malware make effective methods for detecting malicious apps imperative. This study proposes a robust hybrid deep learning-based approach for detecting and predicting Android malware that integrates Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). It also presents a machine learning-based strategy for dealing with imbalanced datasets, which can mislead the training algorithm during classification. This strategy helps to improve performance and to mitigate over- and under-fitting. The proposed model extracts both temporal and spatial features from the dataset. The well-known Drebin dataset was used to train and evaluate all proposed frameworks in terms of accuracy, sensitivity, MAE, RMSE, and AUC. The empirical results show that the proposed hybrid ConvLSTM model achieved an accuracy of 0.99, a sensitivity of 0.99, and an AUC of 0.99. The proposed model outperforms standard machine learning-based algorithms in detecting malicious apps and provides a promising framework for real-time Android malware detection.
Introduction
Nowadays, the most common way to access the internet is via a mobile device. Mobile internet services include web browsing, online banking, online shopping, social media and online learning [1]. As a result, mobile devices serve an essential purpose and have become an integral part of people’s lives. Since 2012, the Android operating system (OS) has dominated the mobile OS industry [2]. Google’s Android operating system maintained its position as the world’s leading mobile operating system, with more than 2.5 billion active Android devices, including wearables, mobile phones and car implementations. The most widely used operating system, Android, has over 80% of the market shares [3]. Because of simple development platforms and open-source access, about 4 million applications (apps) are accessible for download from the Google Play store [4]. This Android-based embedded OS platform is designed for usage in the Internet of Things (IoT) gadgets that are low on power and have limited storage space.
It is also important to note that cybercriminal activity increased by 72% and mobile device vulnerabilities increased by 50% during the COVID-19 pandemic [5]. Cybercriminals are drawn to activities that provide high returns on relatively little investment of time and resources. Thus, Android's open and flexible platform is more exposed to attackers. Oh et al. [6] describe the top three factors that make Android one of the most vulnerable platforms: first, the Android OS is open source; second, the Google Play store performs only minimal review of apps; and third, the OS is compatible with apps from third-party vendors. Li et al. [7] report that 97% of all malware currently targets Android apps. Mobile malware can harm a user's OS through various means, including Trojans, viruses, and others. The analysis of mobile malware falls into two distinct categories: static and dynamic analysis [8]. In the static analysis of an Android application package (APK), the APK is decompiled using tools like dex2jar, and the code's control flow or data flow is then examined. Dynamic analysis, on the other hand, involves running the code and identifying the malicious aspects of the application by evaluating multiple execution paths. As most mobile users run the Android OS, malware poses a significant threat to the Android ecosystem. Cybersecurity experts specializing in Android invest considerable effort in defending it against malicious attacks.
However, its popularity makes it a more common target for attackers [9]. Extensive research addresses these issues with both static and dynamic analysis and applies various methods to examine Android malware. One static analysis method, known as "Risk Ranker" and presented in [10], evaluates the behavior of Android apps. However, this method tends to have a high rate of false positives, as it predicts behavior rather than detecting actual presence. Another line of research detects malware in apps through dynamic analysis to enhance accuracy. This analysis requires a significant amount of execution path data and code. The risk posed by malicious programs may be mitigated by dynamically analyzing the permissions they use. The literature recommends precise descriptions of the permissions in Android apps and offers machine learning (ML)-based methods for determining which permissions Android apps need [7, 11]. Recently, the use of ML and deep learning (DL)-based techniques for detecting Android malware has increased. DL techniques have been integrated into Android malware detection methods to decrease the need for human experts and improve the ability to identify malware. DL-based algorithms can help with Android malware detection because they can effectively analyze patterns and identify anomalies in large amounts of data. They can also adapt to new data and improve their detection accuracy over time, making them well suited to detecting new and evolving forms of malware. Additionally, DL-based algorithms can identify subtle relationships between different aspects of malware that might not be easily detectable using traditional methods, improving the overall accuracy of malware detection systems.
This research proposes a novel hybrid DL-based approach that integrates Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to identify relationships within Android apps. The hybrid model, known as ConvLSTM, extracts spatial features using CNNs and captures temporal dependencies with LSTMs. By integrating these architectures, the model detects Android malware by considering both the structural characteristics and the behavior patterns exhibited by the applications. The model is trained and evaluated on the well-known and widely used Drebin dataset of real-world malicious and benign Android apps. The results demonstrate the effectiveness of our proposed model in comparison with state-of-the-art methods for Android malware detection and classification. Our approach does not rely on generating graph embeddings, which makes it more computationally efficient and less complex to implement. With this work, we aim to contribute to the field of DL for Android malware detection by proposing a novel model that leverages the strengths of both CNNs and LSTMs and by applying it to the widely used Drebin dataset. The main contributions of this study are as follows: (1) a novel DL-based model called ConvLSTM was developed to extract both temporal and spatial features from the Drebin dataset for Android malware detection; (2) semi-supervised learning was used as the basis for this model, and the accuracy was enhanced by incorporating the self-attention mechanism; (3) experiments were conducted to determine the optimal parameter settings for the Android malware detection model using ConvLSTM; and (4) the model was compared to other DL-based and traditional malware detection methods and was found to have superior performance. This novel approach is believed to provide an improved mechanism for accurately detecting Android malware apps, addressing the limitations of previous works and providing a direction for future research in this field.
The remainder of the paper is structured as follows: Section 2 provides an overview of related work. Section 3 provides the background on the cutting-edge algorithms used in this study. Section 4 explains the research design, including the methods used for data collection and analysis. Section 5 highlights the findings of the study. Finally, Section 6 concludes the study by summarizing its key findings, highlighting its contributions, and offering recommendations for future work.
Related work
This section presents a comprehensive review of previous research for detecting Android malware using ML and DL-based techniques. Static and dynamic analysis approaches are used for android malware detection. Both strategies are used to train models using feature extraction and have drawbacks. The static analysis technique is less time-consuming and occupies fewer resources than dynamic analysis but is prone to obfuscation [12]. Dynamic analysis consumes more time and resources but is resistant to obfuscation [13].
Static analysis using cutting-edge algorithms
Several ML-based studies used static characteristics to identify Android malware. Arp et al. [14] suggested a support vector machine (SVM) for lightweight on-device detection based on permissions, API requests, network access, and others. Parker et al. [15] used the baksmali disassembler to break the program down into Dalvik bytecode before removing everything except the opcodes, which are kept for study purposes. Yerima et al. [16, 17] suggested a random forest (RF) ensemble learning model and a method for eigenspace analysis. The designed ML-based detection methods relied on API calls, intents, permissions, and embedded instructions. Wang et al. [18] used unique used permissions (UUP) and unique requested permissions (URP) to compare benign and malicious apps. Varsha et al. [19] used the rotation forest algorithm and the RF method to identify malware from static features derived from program executable files and manifests on three separate datasets. Arianne et al. [20] discuss the use of control flow analysis for Android malware detection, including CFG generators that create control flow graphs of Android apps. The authors discuss the challenges of applying control flow analysis to Android malware and review various techniques that have been proposed to address these challenges in k-nearest neighbor (K-NN) and naive Bayes (NB) detection systems. Wang et al. [18] used decision trees (DT), SVM, and RF to assess the utility of risky permissions for malware detection. Alazab et al. [21] also suggested using API calls and permissions. Their models comprised K-NN, random tree (RT), RF, J48, and NB. RF gave the best result, with an accuracy of 94.30% using 10-fold cross-validation. DAPASA [22] built five features that portray invocation patterns from sensitive sub-graphs to identify malware that piggybacked on legitimate programs.
The features are fed to ML-based algorithms such as decision trees, RF, and K-NN, with RF producing the best detection performance. Wang et al. [23] used decision trees, RF, linear SVM, and logistic regression with static analysis to detect malicious apps. To train the ML algorithms, they used static characteristics relevant to the apps and the platform. Lee et al. [24] incorporated permissions and API calls as input features and employed a genetic algorithm for feature selection. The selected features were then passed to various ML-based algorithms, namely J48, DT, RF, and NB. The experiment was conducted using a total of 6,000 Android samples and achieved an accuracy of approximately 97%. The study by Yumlembam et al. [25] focuses on the specific category of malware known as ransomware. The researchers employed a swarm optimization algorithm to fine-tune the hyperparameters of the classification algorithm and proposed a method based on the SVM algorithm. To address class imbalance, the collected samples were categorized into ransomware and benign categories, and different oversampling algorithms were applied to achieve a balanced dataset. The results demonstrated that their method achieved excellent performance in the detection of ransomware. Ibrahim et al. [26] proposed a new method for automatic Android malware detection based on static analysis and a functional API DL-based model. The model was trained on a dataset of 14,079 Android apps, including 4,039 malware and 10,040 benign apps. The model achieved a malware detection accuracy of 99.5% and a malware classification accuracy of 97%. Bhat et al. [27] proposed a new method for Android malware detection based on ensemble ML and system calls. The method first extracts system call features from Android apps.
These features are then passed to a homogeneous ensemble of classifiers, which are trained on a dataset of known malware and benign apps. The homogeneous ensemble achieves a malware detection accuracy of 98.8%.
Dynamic and hybrid analysis using cutting-edge algorithms
Recent studies on Android malware detection have applied ML to dynamic and hybrid (static + dynamic) characteristics. Xie et al. [28] focus on extracting system calls from the CICANDMAL2017 dataset. These extracted system calls are then used as inputs for various ML-based algorithms. The results reveal that the K-NN and DT algorithms exhibited the highest malware detection rates, as evaluated using the F1-score metric. Specifically, the K-NN algorithm achieved a detection rate of 85%, while the DT algorithm achieved a rate of 72%. Nguyen et al. [29] introduce a novel dynamic technique for Android malware detection and classification. The technique uses a parallel ML-based model that incorporates various classifiers, including J48, K-NN, SVM, and RF. The primary focus is on leveraging dynamic features to enhance the accuracy of the detection and classification process. Zhao et al. [30] developed the AntiMalDroid malware detection framework, which uses recorded behavior sequences as features in an SVM. This framework uses dynamic analysis to identify potential threats. DroidDolphin [31] also uses an SVM with dynamically obtained features. Afonso et al. [32] examined NB, RF, BayesNet TAN, BayesNet K2, IBk (an instance-based classifier), J48, and SVM using dynamic API calls and system call logs. Xiao et al. [33] trained an LSTM model using the series of system calls discovered via dynamic analysis. Applications were run on a genuine mobile device, and 2000 random events were simulated using Monkey Runner. Alzaylaee et al. [34] compared the performance of simple logistic, multilayer perceptron (MLP), J48, RF, PART, NB, and SVM using dynamically acquired features on actual phones against emulators. Cai et al. [35] utilized a well-balanced dataset, which formed the basis for their feature set.
The features were divided into security, inter-component communication (ICC) intentions, and structure. The foundation of the feature set was method calls and ICC intentions. Ni et al. [36] developed a system for detecting malicious activity in real time that captures user actions, API requests, and permission usage. NB and SVM algorithms were used in their research to detect these run-time features. Mahindru et al. [37] used 11,000 Android apps to extract 123 dynamic permissions, which were then fed into several individual ML classifiers such as k-star, simple logistic, RF, DT, and NB. In their results, simple logistic performs better than the others, but the malware classification accuracies of simple logistic, J48, DT, and RF were almost identical. Thangaveloo et al. [38] proposed the DATDroid framework. During the dynamic investigation, the authors monitored system calls, network traffic, CPU, and memory consumption. The "proc/meminfo" directory in the emulator was used to record information about CPU and memory consumption. MARVIN [39] uses an ML strategy that combines static and dynamic feature-based approaches (an L2-regularized linear classifier and an SVM). MARVIN uses a malice score from 0 to 10 to evaluate the risk posed by unknown Android apps. Su et al. [40] applied a hybrid of dynamic and static features in experiments on 1200 samples (300 malicious and 900 benign). The study evaluated multiple ML-based algorithms, including SVM, J48, K-NN, NB, and Bayes net. Among these, SVM achieved the highest accuracy of 91.01%.
Manzil et al. [41] proposed a novel feature vector-based ML model for Android malware category detection. The model uses a combination of static and dynamic features to train an ML-based classifier. The static features are extracted from the Android app's bytecode, while the dynamic features are extracted from the app's behavior at runtime. The model was evaluated on a dataset of 10,000 Android apps, including 5,000 malware and 5,000 benign apps, and achieved a malware detection accuracy of 98.7%. Farnood et al. [42] propose AIM, a novel Android malware detection system that combines static and dynamic analysis. AIM extracts features through static analysis and uses ML for classification. Dynamic analysis provides additional runtime behavior data, enhancing accuracy. AIM achieved 98.9% malware detection accuracy, surpassing traditional static methods. Its interpretability enables understanding of classification decisions, benefiting future system designs.
Android malware analysis using cutting-edge algorithms
Several recent research studies have explored deep learning techniques for detecting Android malware [43]. One such study proposes a DL-based approach for detecting Android malware using static analysis of the APK files. The approach involves extracting various features from the APK files and feeding them into a deep neural network for classification. The authors report high accuracy and low false-positive rates in Android malware detection using DL and static analysis [44]. Another study also proposes a DL-based approach for detecting Android malware using static analysis of APK files. The approach involves extracting a set of features from the APK files, including permissions, APIs, and bytecode sequences, and feeding them into a CNN for classification [45]. A further study, published in 2021, used a DL-based model called a deep belief network (DBN) to detect Android malware. The study found that the DBN achieved high accuracy in detecting Android malware and could detect malware that was not present in the training dataset [46]. These are just a few examples of the many research studies that have explored the use of DL techniques for Android malware detection. It is an active area of research, and new approaches and techniques are being developed and evaluated continuously. Dunmore et al. [47] proposed using sequences of permissions, actions, and APIs from Android apps as a feature vector for training a GAN. The study introduces image-based classification and GAN-based attack techniques, where API system calls serve as features for generating RGB images. The pix2pix adversarial network is used to generate adversarial examples. Şahin et al. [48] introduced an ML-based system for detecting malware that relies on deep neural networks.
The system utilizes features extracted from app permissions, and feature selection is performed using a linear regression-based method, resulting in the identification of 27 effective features for malware detection. The system achieved an impressive F1 score of 0.961. However, it should be noted that due to the high computational requirements of neural networks compared to other ML-based classifiers, the use of a multi-layer perceptron as the classifier type renders the proposed system less suitable as a lightweight solution.
Background of deep learning-based algorithms
This section outlines the deep learning algorithms employed in the study. A DL-based model is an ML model that leverages multiple layers of artificial neural networks to uncover intricate patterns within data. DL models have proven to be particularly effective in tasks that necessitate extracting knowledge from vast amounts of data, including image classification, natural language processing, speech recognition, and malware detection. These models have consistently delivered cutting-edge results in various domains and are widely adopted in industry and research.
Convolutional neural network (CNN) algorithm
CNN is a specialized DL algorithm engineered to handle data arranged in a grid format, such as images. CNNs are comprised of multiple layers of interconnected nodes that extract features from input data. The layers are organized hierarchically, with the lower layers focused on identifying basic features such as edges and corners. In comparison, the higher layers combine these features to recognize more sophisticated patterns and objects. CNNs have gained widespread popularity for various applications, such as computer vision, natural language processing, medical image analysis, and malware detection. They have also been successful in many competitions and benchmarks, including the ImageNet Large Scale Visual Recognition Challenge. To train a CNN, a labeled dataset is used to adjust the weights and biases of the network so that it can accurately recognize patterns in new unseen data. This process is typically done using an optimization algorithm such as stochastic gradient descent and requires significant computing resources. Figure 1 shows the CNN model’s primary architecture for detecting Android malicious apps.

Architecture of a CNN model.
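To make the feature-extraction step concrete, the following is a minimal pure-Python sketch of a single 1D convolution filter followed by a ReLU activation and global max-pooling. The function names, the toy binary feature vector, and the kernel weights are hypothetical and chosen only for illustration; in a trained CNN the kernel weights are learned from the labeled data.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide one filter over the signal (no padding, stride 1) and apply ReLU."""
    k = len(kernel)
    feature_map = []
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        activation = sum(w * x for w, x in zip(kernel, window)) + bias
        feature_map.append(max(0.0, activation))  # ReLU
    return feature_map


def global_max_pool(feature_map):
    """Keep only the strongest response of the filter."""
    return max(feature_map)


# Hypothetical binary feature vector (e.g. presence/absence of eight features).
features = [0, 1, 1, 0, 1, 0, 0, 1]
# Hypothetical kernel of size 4; real weights would be learned.
kernel = [1.0, 1.0, -1.0, 1.0]

feature_map = conv1d(features, kernel)
strongest = global_max_pool(feature_map)
```

A convolutional layer with 32 filters simply repeats this operation with 32 different kernels, producing 32 feature maps.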
LSTM is a type of Recurrent Neural Network (RNN) specially engineered to handle sequential data with long-term dependencies. RNNs process sequential data by continually updating a hidden state according to the current input and the previous hidden state. This allows the network to capture dependencies between elements in the sequence and make predictions based on those dependencies. LSTMs are particularly well suited to this task because they have a dedicated memory cell that can store information for an extended period. This allows the network to retain information from earlier time steps and use it to make predictions later in the sequence. LSTMs are widely used in natural language processing tasks, such as language translation, language modeling, and text classification. They are also used in other domains, such as speech recognition, music generation, and malware detection. The architecture of the LSTM model is presented in Fig. 2.

Architecture of an LSTM model.
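The memory-cell mechanism can be illustrated with a small sketch of one LSTM time step. This is the standard LSTM formulation reduced to a single unit with scalar input, not the configuration used in this study; the parameter names (`wf`, `uf`, `bf`, and so on) are hypothetical labels for the input weight, recurrent weight, and bias of each gate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with scalar input x."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # memory cell: keep part of the old, add part of the new
    h = o * math.tanh(c)     # hidden state exposed to the next layer
    return h, c


def run_lstm(sequence, p):
    """Process a whole sequence, starting from a zero hidden state and cell."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return h, c


# Hypothetical parameters: a forget gate that is almost fully open and an
# input gate that is almost closed, so the cell retains what it already stores.
remember = {"wf": 0.0, "uf": 0.0, "bf": 10.0,
            "wi": 0.0, "ui": 0.0, "bi": -10.0,
            "wo": 0.0, "uo": 0.0, "bo": 0.0,
            "wg": 0.0, "ug": 0.0, "bg": 0.0}
h, c = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, p=remember)
```

With these parameters the cell value stays close to 1.0 after the step, which is the "store information for an extended period" behavior described above.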
Dataset description
This study was conducted on the Drebin dataset, which consists of malicious and benign Android apps and was created by researchers at the University of Göttingen in Germany [14]. The Drebin dataset consists of 5,560 malware apps and 5,370 benign apps. Each app in the dataset is represented by features that characterize its behavior and properties. These features include information about the app's permissions, API calls, network traffic, and certain code patterns commonly found in malware.
The Drebin dataset has been widely used in research on Android malware detection and has been the subject of numerous studies and evaluations. It is one of the most widely used and well-known datasets for this purpose. The dataset was randomly split into 80% for training and 20% for testing. The training portion was used to train models for Android malware detection, and the testing portion was used to evaluate the models’ accuracy using unseen data. The volume of the dataset is presented in Table 1.
The volume of Dataset
Preprocessing is crucial for handling Android datasets since they have various forms and properties.
We balanced the highly imbalanced Drebin dataset of Android malware and benign applications using undersampling and oversampling techniques. We randomly undersampled the majority class and used the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the minority class. We implemented these techniques using the
Furthermore, this study used min-max normalization, which scales the values of a dataset by shifting and rescaling them. Min-max normalization, also called min-max scaling or min-max transformation, is a method for scaling numerical data to a specific range, such as 0 to 1 or -1 to 1. It is a common preprocessing step in ML and data analysis. The method transforms the data to a new range such that the original data's minimum and maximum values become the new range's minimum and maximum values, respectively. The formula for min-max normalization is depicted in Equation 1.
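As an illustration, min-max normalization in its standard form maps each value x to new_min + (x - min) * (new_max - new_min) / (max - min). The sketch below implements that standard formula in plain Python; the function name, the handling of constant columns, and the sample values are assumptions made for the example.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to new_min (assumption).
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


raw = [2, 4, 6, 10]                           # hypothetical feature column
unit_range = min_max_scale(raw)               # scaled to [0, 1]
symmetric = min_max_scale(raw, -1.0, 1.0)     # scaled to [-1, 1]
```

In practice the minimum and maximum are computed on the training portion only and then reused for the test portion, so that no information from the test data leaks into preprocessing.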
The proposed ConvLSTM model was trained and evaluated using the well-known Drebin dataset, which is considered a de facto standard dataset in this field. This dataset was chosen because it is officially maintained by its contributors, is widely used in the literature, and offers a large number of samples and variations. We perform experiments using two deep learning-based models, CNN and ConvLSTM, under the same conditions and with the same dataset.
The models were trained and tested using the Keras library, a high-level neural network API written in Python. The functional API from Keras was utilized for this purpose [49]. It is noteworthy that Keras can operate on top of either TensorFlow or Theano.
Detection of malware using the proposed architecture
A combination of CNNs and LSTM networks results in a powerful ML-based model capable of detecting Android malware. We followed a comprehensive methodology that leveraged the strengths of both networks. The process began with feature extraction, where we used CNN to extract features from an Android app’s raw bytecode and its static structure, such as the manifest file and resource files. These features were then used as input to the LSTM, which learned the temporal dependencies between the features. This allowed the model to capture patterns in the data indicative of malware. The ConvLSTM model was then trained on a labeled dataset of benign and malicious apps, allowing it to learn the relationship between the features and the presence of malware. Finally, the trained model was used to classify new, unseen apps as either benign or malicious based on the extracted features and learned dependencies. This methodology provides a comprehensive solution for Android malware detection, allowing the model to extract both spatial and temporal features from the data and use these features to make predictions about the presence of malware. Figure 3 illustrates the proposed ConvLSTM model [50]. The training dataset was used to train the model, and the hyperparameters were optimized using the Adam optimizer and the validation dataset. We use the python deep learning library Keras [49]. The ConvLSTM deep learning models are implemented to show that ConvLSTM can enhance detection accuracy. Experiments using the traditional DL-based methods of CNN are also conducted for comparison. Different CNN models with varying parameters were applied and analyzed to study the impact of various CNN structures on malware detection. Additionally, the training time is compared between ConvLSTM and CNN models to verify that using ConvLSTM can reduce the training time.

The Architecture of the ConvLSTM model.
The ConvLSTM model uses two 1D convolution layers with 32 filters and a kernel size of 4, followed by two dense layers with 256 hidden neurons. The last layer of the model applies the softmax activation function, a nonlinear function for multi-class classification. A dropout layer is included to deactivate some neurons in the ConvLSTM network, and the global max-pooling layer helps prevent over-fitting by retaining the highest value of the learned features. The cross-entropy loss is minimized, and the weights are updated using the Adam optimizer. The parameters of the ConvLSTM model are displayed in Table 2.
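A minimal sketch of how such a network could be assembled with the Keras functional API is given below. It is not the authors' exact implementation: the layer order, the number of LSTM units (`lstm_units=64`), the ReLU activations, the dropout rate, and the input shape are assumptions, since the text specifies only the convolution layers (32 filters, kernel size 4), the two 256-neuron dense layers, the softmax output, dropout, global max-pooling, cross-entropy loss, and the Adam optimizer.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_convlstm(n_features, n_classes=2, lstm_units=64, dropout_rate=0.5):
    """CNN + LSTM classifier sketch; lstm_units and dropout_rate are assumed values."""
    # Each app is treated as a sequence of n_features values with one channel.
    inputs = keras.Input(shape=(n_features, 1))

    # Two 1D convolution layers with 32 filters and kernel size 4 (from the text).
    x = layers.Conv1D(32, 4, activation="relu")(inputs)
    x = layers.Conv1D(32, 4, activation="relu")(x)

    # LSTM over the convolved sequence; returns the full sequence for pooling.
    x = layers.LSTM(lstm_units, return_sequences=True)(x)

    # Global max-pooling keeps the highest value of each learned feature.
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dropout(dropout_rate)(x)

    # Two dense layers with 256 hidden neurons, then a softmax output.
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs, name="convlstm_sketch")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",  # integer labels: 0 benign, 1 malware
        metrics=["accuracy"],
    )
    return model


model = build_convlstm(n_features=64)  # 64 is a placeholder feature count
```

Calling `model.summary()` prints the resulting layer stack, which can be compared against the parameter table of the actual model.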
Parameters of the ConvLSTM model
The proposed algorithms for detecting Android malware were evaluated for their effectiveness using statistical analysis and calculating key metrics such as Pearson’s Correlation Coefficient (R), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The equations for these metrics are provided in the study.
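For reference, the standard textbook definitions of these three metrics are given below in the y_{i,exp} / y_{i,pred} notation of this section. The sample count n and the bar notation for the mean are symbols introduced here for the sketch and may differ from the notation in the original equations.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i,\mathrm{exp}} - y_{i,\mathrm{pred}} \right|

\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_{i,\mathrm{exp}} - y_{i,\mathrm{pred}} \right)^{2} }

R = \frac{ \sum_{i=1}^{n} \left( y_{i,\mathrm{exp}} - \bar{y}_{\mathrm{exp}} \right) \left( y_{i,\mathrm{pred}} - \bar{y}_{\mathrm{pred}} \right) }
         { \sqrt{ \sum_{i=1}^{n} \left( y_{i,\mathrm{exp}} - \bar{y}_{\mathrm{exp}} \right)^{2} \; \sum_{i=1}^{n} \left( y_{i,\mathrm{pred}} - \bar{y}_{\mathrm{pred}} \right)^{2} } }
```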
In these equations, yi,exp represents the expected or actual value of the target variable, while yi,pred represents the predicted value of the target variable. In the context of Android malware detection, yi,exp refers to the actual class label of an Android app (malware or benign), and yi,pred refers to the class label predicted by the model. The difference between the actual and predicted values is used to calculate the various performance metrics, such as MSE, RMSE, R, and R2. These metrics provide insights into the model's accuracy in detecting Android malware. True positive (TP) is the number of malicious samples correctly classified as malicious. False positive (FP) is the number of benign samples incorrectly classified as malicious. True negative (TN) is the number of benign samples correctly classified as benign. False negative (FN) is the number of malicious samples incorrectly classified as benign.
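The following sketch shows how these quantities can be computed from actual and predicted labels, using the standard definitions of accuracy, sensitivity (recall), precision, F1-score, MAE, and RMSE. The function names and the eight-sample label vectors are hypothetical; label 1 denotes malware and label 0 denotes benign.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` (malware) as the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


def mae(y_true, y_pred):
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical labels for eight apps.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
```

For binary 0/1 labels, MAE equals the misclassification rate (1 minus accuracy), and RMSE is its square root.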
Experimentation setup
The CNN-based and ConvLSTM models were developed using Python and the Keras package together with TensorFlow [51] on an Intel(R) Core(TM) i7 2.2 GHz CPU. Additionally, the tests were conducted on a 4 GB NVIDIA GTX 1080Ti GPU with 32 GB of memory. The malware detection algorithms were evaluated using a standard dataset of mobile malware. The dataset contains 10,525 applications from 179 different malware families, as represented in Table 1. This study divided the dataset into two parts, with 80% used for training and 20% used for testing. The training and testing data were separated using a random function. The Android malware dataset was used to train the models in the fitting phase. The test phase aimed to confirm the accuracy of the proposed models using new data. This research uses the best-performing settings, namely a batch size of 32, a learning rate of 0.0001, and the optimization function hyperparameters described in Table 2. The exceptional performance of the DL-based models ensures the reliable detection of Android malware.
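A random 80/20 split of this kind can be sketched in a few lines of plain Python. The function name, the fixed seed, and the ten placeholder samples are assumptions for the example; the study does not state which random function or seed was used.

```python
import random


def train_test_split(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle indices with a seeded generator and hold out test_fraction for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumption)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Ten placeholder apps with alternating labels (1 = malware, 0 = benign).
apps = [f"app_{i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, x_test, y_train, y_test = train_test_split(apps, labels)
```

A purely random split does not guarantee the same malware-to-benign ratio in both portions; a stratified split is a common alternative when the classes are imbalanced.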
Performance evaluation of proposed hybrid deep learning-based model
Consequently, the ConvLSTM DL-based model performs well on the Drebin dataset for android malware detection. The model can generalize well and predict whether an Android application is malicious. The model’s remarkable accuracy, F1 score, precision, recall, and sensitivity, as well as its low error rates, indicate that it performs exceptionally well on this dataset.
Every experiment was conducted to examine the efficacy of several configurations of the proposed techniques for detecting malware and to compare them to well-known DCNN standards. Each architecture's parameters and activation functions are adjusted throughout training to create a helpful framework. The suggested design is manually modified by fine-tuning its hyper-parameter values to provide accurate predictions. Before model training, the training parameters are defined to constrain the classification model. Each proposed model was evaluated in an environment similar to that used for training and assessment. The outputs of the proposed hybrid model are compared, using a mixed collection of features, with those of other standard methods. Experiments are conducted to identify the best topology for CNN-based standards that perform extraordinarily well on the training and validation datasets. All suggested models are trained with the Adam optimizer using a 0.0001 learning rate and a 0.5 momentum constant. Despite overfitting, the Adam optimizer with a training learning rate of 0.001 exhibits remarkable accuracy and small loss values. The Adam optimizer produces a 0.4% difference between training and testing accuracy, and 0.03 is the smallest generalization difference between testing and validation losses in the model's design. The algorithm was trained for 100 iterations with 50 steps per epoch, using a batch size of 32 and incorporating L2 regularization. At the end of the training period, dropout with probability p = 0.5 was applied to the activations.
Figure 4a–b presents the accuracy and cross-entropy loss of the proposed model for each epoch, along with the percentage of samples that were correctly classified. It provides a quantitative evaluation of the model’s performance. This study evaluates the proposed hybrid DL-based model on CNN and LSTM algorithms. After 100 iterations, the suggested model achieved a training accuracy of 0.996 and a testing accuracy of 0.995. Figure 4a indicating that it can generalize effectively and accurately predict whether an Android app is benign or malicious. The hybrid model performed optimally, achieving the highest accuracy and the lowest cross-entropy loss. The generalization difference between learning and validation (accuracy and loss) should be modest to prevent model over-fitting. Numerous data have shown that integrating feature maps results in outstanding outcomes and outperforms CNN-based methods. The empirical results indicate that the fused ConvLSTM model performs well in android malware detection using the Drebin dataset.

The ConvLSTM architecture for android malware detection was evaluated using two metrics: (a) accuracy and (b) cross-entropy loss. The hybrid model performed optimally, achieving the highest accuracy and the lowest cross-entropy loss.
The training and validation error rate of the proposed hybrid model with a softmax layer on the Drebin dataset is shown in Fig. 4b. After the 23rd iteration, the training loss decreases continuously to 0.079 and the testing loss to 0.152, while the training accuracy remains consistent compared to other deep CNN models. The ConvLSTM architecture yielded more accurate training and validation results than the CNN architecture. Hence, the empirical findings indicate that the model is well suited to malware detection on Android OS.
In addition to accuracy, the model’s efficacy is evaluated using other standard metrics. The overall accuracy, cross-entropy loss, precision, sensitivity, and F1-score for each case of the ConvLSTM algorithm are summarized in Table 3 and shown graphically in Fig. 8a.
The proposed ConvLSTM was evaluated based on its training and validation accuracy, precision, recall, F1-Score, and the AUC score, to analyze its performance

(a) The ConvLSTM model obtained an F1-score of 0.99, precision of 0.99, recall of 0.99, MAE of 0.0197, MSE of 0.0143, and RMSE of 0.119, indicating that the model has a low error rate and can produce accurate predictions. (b) The CNN-based model achieved an F1-score, precision, recall, MAE, MSE, and RMSE of 95.8%, 94.8%, 95.8%, 0.0703, 0.0299, and 0.1729, respectively.
The results indicate that the ConvLSTM model obtained an F1-score of 0.994, a precision of 0.996, and a recall of 0.992. The F1-score measures how well a model balances precision and recall; a value near one indicates that the model is highly effective at recognizing true positives while minimizing false positives and false negatives. For the malicious class, the proposed model achieved 0.99 precision, 0.96 sensitivity, and 0.98 F1-score. For the normal class, it achieved 0.99 precision, 0.99 sensitivity, and 0.99 F1-score. The normal instances exhibited the highest precision, sensitivity, and F1-score, while the malicious cases exhibited the lowest sensitivity. The model’s overall sensitivity of 0.994 indicates that it can detect malicious apps. This is a very high sensitivity value, meaning the model has a very low false negative rate. Table 3 and Fig. 8a illustrate the performance metrics of the individual categories of the designed ConvLSTM algorithm. The malware category was classified with sensitivity, specificity, and F1-score of 0.99, 0.99, and 0.98, respectively. The model was assessed for its TP rate and FP rate through the examination of TP (1061), TN (1897), FP (29), and FN (20). It demonstrated a high true positive count and a low false positive count, as depicted in Fig. 5a. These results indicate that the model performed well in terms of both precision and recall, which are necessary measures for evaluating the performance of a malware detection system. The ROC curve in Fig. 5b plots the TP rate against the FP rate to show the overall trade-off between them. The area under the ROC curve (AUC) for the ConvLSTM network was 0.995, showing that the proposed network attains a greater value than the state-of-the-art algorithms.
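To make the ROC and AUC computation concrete, the sketch below builds ROC points by sweeping a decision threshold over predicted scores and integrates the curve with the trapezoidal rule. It is an illustrative example only: the function names and the six toy labels and scores are assumptions, not data from this study.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: 1 = malicious, 0 = benign; scores are hypothetical model outputs.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]
toy_auc = auc(roc_points(toy_labels, toy_scores))
print(f"AUC = {toy_auc:.4f}")
```

An AUC of 1.0 corresponds to a ranking that places every malicious sample above every benign one, while 0.5 corresponds to a random ranking.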
The proposed model has achieved an MAE of 0.019, MSE of 0.014, and RMSE of 0.119, indicating that the model has a low error rate and can produce accurate predictions.

(a) The model has a high TP rate and a low FP rate when classifying malware. The figure shows that the model achieved TP (1061), TN (1897), FP (29), and FN (20). (b) ROC analysis of the designed approach shows that the proposed network attains a greater value than the state-of-the-art algorithms.
This study also uses the Drebin dataset to evaluate a CNN DL-based model for Android malware detection. The results of our experiments indicate that the proposed CNN-based model is an effective solution for detecting malware on Android devices. A graphical depiction of the accuracy and cross-entropy loss of the CNN classifier throughout the training and validation stages is presented in Fig. 6a–b. The model achieved a training accuracy of 0.965 and a testing accuracy of 0.954. Fig. 6a depicts the accuracy of the CNN model by showing the proportion of correctly recognized samples, providing a quantitative assessment of the model’s performance. The CNN-based framework yields training and validation losses of 0.050 and 0.070, respectively, as depicted in Fig. 6b. The validation loss of the model is 0.140, indicating that the model’s predicted outputs deviate from the actual outputs on the validation set by an average of 0.140, which provides a quantitative measure of the model’s performance on unseen data during the training process. Figure 7a displays the confusion matrix of the CNN method.

Evaluation metrics of malware detection based on the standard CNN algorithm. (a) Accuracy. (b) Cross-entropy loss.

(a) The model has a high TP rate and a low FP rate during malware detection and classification. The figure shows that the model’s TP, TN, FP, and FN values were 1037, 1833, 73, and 64, respectively. (b) ROC analysis of the designed approach shows that the proposed network attains a greater value than the state-of-the-art algorithms.
Figure 7b demonstrates the performance of the malware detection system, depicting the TP, TN, FP, and FN results and visually portraying the effectiveness of the system in identifying malware while minimizing false alarms. The model’s TP, TN, FP, and FN values were 1037, 1833, 73, and 64, respectively. This indicates that the model can correctly identify malware while maintaining a low number of false positives. The model’s specificity is calculated as TN/(TN+FP) = 1833/(1833+73) = 96.17%.
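The specificity formula above, and the other metrics that follow from the same four counts, can be reproduced with a short sketch. The function name `confusion_metrics` is an assumption for illustration, and malware is treated as the positive class; values derived this way from raw counts need not coincide with per-class or averaged figures reported elsewhere.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and predicted).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity or TP rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # TN rate, i.e. 1 - FP rate
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts reported for the CNN model in the text.
cnn = confusion_metrics(tp=1037, tn=1833, fp=73, fn=64)
for name, value in cnn.items():
    print(f"{name}: {value:.4f}")
```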
The CNN-based model achieved an F1-score, precision, and recall of 0.950, 0.948, and 0.958, respectively, as illustrated in Fig. 8b. These results indicate that the model performed well in terms of both precision and recall, which are necessary measures for evaluating the performance of a malware detection system. The model obtained an MAE of 0.070, MSE of 0.029, and RMSE of 0.172, indicating that the model’s predictions agreed with the actual values. The low values of MAE, MSE, and RMSE imply that the model’s prediction errors were small. Figure 8b shows the model evaluation metrics MSE, MAE, and RMSE, with the x-axis showing model configurations and the y-axis showing error levels. Comparing predicted and actual values shows how well the models predict. The model’s sensitivity is 0.989, indicating a high percentage of correctly identified positive cases.
The results of this study reveal that the proposed CNN deep learning model is a potential approach for detecting malware on Android devices and may be utilized to enhance the security of Android devices, as illustrated in Table 4. To further confirm the efficacy of the proposed model, it should be evaluated on a more extensive and varied dataset. In addition, future research might investigate the feasibility of incorporating the suggested model into existing Android security systems. The CNN reliably identified the majority of harmful apps in the dataset, but some benign applications (73 in the test set) were incorrectly categorized as malicious. It may be helpful to consider ways to improve the model’s ability to correctly classify normal applications and thereby reduce false positives. Overall, given the low rate of false positives, it can be concluded that the CNN method performs well.
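For reference, the three error measures are defined over $n$ samples with targets $y_i$ and predictions $\hat{y}_i$ as MAE $= \frac{1}{n}\sum_i |y_i - \hat{y}_i|$, MSE $= \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, and RMSE $= \sqrt{\text{MSE}}$. A minimal sketch of the computation follows; the function name and the five toy labels and scores are hypothetical and are not taken from the study's data.

```python
import math


def error_metrics(y_true, y_pred):
    """Return MAE, MSE, and RMSE for paired targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Toy labels (1 = malicious, 0 = benign) and hypothetical predicted scores.
toy = error_metrics([1, 0, 1, 1, 0], [0.9, 0.2, 0.8, 0.6, 0.1])
print({name: round(value, 4) for name, value in toy.items()})

# RMSE is the square root of MSE, so the two figures can be cross-checked:
# the CNN values in the Fig. 8 caption (MSE 0.0299, RMSE 0.1729) agree.
print(round(math.sqrt(0.0299), 4))
```

Because RMSE squares the errors before averaging, it penalizes a few large mistakes more heavily than MAE does, which is why both are usually reported together.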
The proposed models were evaluated for their effectiveness using metrics such as accuracy, precision, recall, F1-Score, and AUC score
Smartphones allow users to access the internet, run applications, and perform various tasks, making them indispensable in modern society. Smartphone sales expanded internationally due to their numerous advantages; by 2022, over 3.8 billion people had utilized them. Due to the increasing number of smartphone users and the complexity and hackability of Android apps, experts and developers in mobile app security face obstacles. Securing sensitive data in mobile apps for digital commerce, business, finance, and online banking is of utmost importance. To ensure the protection of critical mobile network data, it is imperative to assess the data within the application. ML and DL-based techniques are used to recognize and predict Android application vulnerabilities to prevent mobile network security flaws. This study uses ML and DL-based techniques to identify hallmark database anomalies to improve security. The projected framework enhances system security by detecting recently introduced attacks. Due to the complexity of network infrastructure, nonlinear models were recommended in this research to achieve high accuracy.
The empirical findings show that the proposed hybrid ConvLSTM model achieved an accuracy of 0.99, a sensitivity of 0.99, and an AUC of 0.99, as depicted in Fig. 9. The use of ConvLSTM algorithms in this study demonstrated a high level of accuracy in developing a system that effectively secures smartphones against malware. Table 5 compares the performance of the proposed security system for Android OS with other existing systems that employ ML and DL-based algorithms on various Android datasets. The comparison was conducted to validate the performance of our proposed system against other Android security systems. The results indicate that our system achieved a higher level of accuracy than the current systems in the field.

AUC comparison between the proposed models.
Comparison of the proposed study with state-of-the-art studies on Android malware detection
In this paper, a security system was constructed by utilizing CNN and LSTM techniques. Based on the positive outcomes of this study, the following conclusions can be inferred. The proposed semi-supervised hybrid algorithm ConvLSTM for Android malware detection showed remarkable results in terms of performance metrics. By combining the strengths of both CNNs and LSTM, the algorithm achieved high precision, recall, F-score, sensitivity, accuracy, and AUC, as well as low MAE and RMSE values and a high R2 value. These results outperformed a standalone CNN and demonstrated exceptional performance compared to state-of-the-art studies. These findings highlight the potential of the proposed hybrid algorithm to be a highly effective solution for Android malware detection and warrant further exploration and development. Based on these results, some suggestions for further work in our research could include the following: Investigate the robustness of the model to different attack types and variations of the same attack type. Investigate the model’s performance when applied to other datasets similar to the Drebin dataset but with different characteristics. Explore the effect of different hyperparameter settings on the model’s performance and try to find the optimal settings for the model. Look into the interpretability of the model and try to understand why the model is making certain predictions. Investigate the scalability of the model and its ability to handle large datasets and large numbers of features. Investigate the model’s performance in real-world scenarios and evaluate its ability to detect unknown or zero-day malware.
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgments
The author would like to thank the Department of Computer Science, Lasbela University of Agriculture Water and Marine Sciences, 90150 Lasbela, Pakistan, for their support.
