Abstract
The Internet of Things (IoT) is permeating our daily lives through continuous environmental monitoring and data collection. The promise of low-latency communication, enhanced security, and efficient bandwidth utilization has led to a shift from mobile cloud computing to mobile edge computing. In this study, we propose an advanced deep reinforcement learning-based resource allocation and security-aware data offloading model that considers the constrained computation and radio resources of industrial IoT devices to guarantee efficient sharing of resources among multiple users. The model is formulated as an optimization problem with the goal of decreasing energy consumption and computation delay. This type of problem is non-deterministic polynomial time-hard (NP-hard) and suffers from the curse of dimensionality; thus, a deep learning optimization approach is presented to find a near-optimal solution. In addition, a 128-bit Advanced Encryption Standard-based cryptographic approach is proposed to satisfy the data security requirements. Experimental evaluation results show that the proposed model can reduce offloading overhead in terms of energy and time by up to 64.7% in comparison with the local execution approach. It also outperforms the full offloading scenario by up to 13.2%, because it can select some computation tasks to be offloaded while optimally rejecting others. Finally, it is adaptable and scalable for a large number of mobile devices.
Introduction
Today, Internet of Things (IoT) technology is embraced in virtually every aspect of our lives. Advances in sensor and communication technologies have led to the proliferation of complex, delay- and computation-intensive industrial IoT applications that often generate and process large volumes of data. 1 Such applications include efficient manufacturing inspection, virtual/augmented reality, image recognition, internet of vehicles (IoV), and e-Health.2–4 To alleviate the resource constraints of mobile IoT devices and meet the communication/processing delay requirements, complex computations can be offloaded to more resourceful devices. 5
Cloud computing was first exploited as a resource-rich service for mobile devices via the mobile cloud computing (MCC) paradigm. MCC provides flexible processing, storage, and service capabilities while reducing battery consumption. High latency is one of the key challenges facing MCC, especially in real-time and delay-sensitive applications. In addition, security is a critical challenge for MCC, because application data and services may be vulnerable to many types of attacks during various stages of data transmission and processing. 6
Mobile edge computing (MEC) was recently introduced as a viable and promising solution to address MCC's challenges. In MEC, the computation capabilities of the cloud are pushed to the edge of the radio access network, which is in close proximity to mobile devices, resulting in a cost-efficient and low-latency architecture.7,8 Application domains such as predictive maintenance of industrial machines benefit from the MEC provision to provide fast and highly localized feedback to modify a live representation of the world. 9
Numerous approaches and models for computation offloading in MEC emerged in the literature with the goal of decreasing energy consumption, reducing computation latency, and/or allocating radio resources efficiently.10–12
However, obtaining an optimum offloading solution in complex and dynamic multi-user wireless MEC systems is a challenging task. In addition, the security of data during transmission from mobile devices to edge devices is a challenge due to, for example, sniffing, jamming and eavesdropping attacks. These security threats, especially in a multi-user environment with multi-tasks, have not been addressed in most offloading approaches in the literature. 13
The lack of adequate data protection controls can quickly overshadow the advantages of the MEC paradigm. Motivated by these considerations, we present a deep reinforcement learning model that handles performance optimization in multi-user, multi-task MEC systems while protecting data during transmission to the edge server. The main contributions of our article are summarized as follows:
Formulating a combined model of computation offloading, security, and resource allocation as an optimization problem with the goal of decreasing the total time and energy overhead of mobile devices.
Introducing a new security layer that uses the standard 128-bit Advanced Encryption Standard (AES) cryptosystem to protect data during offloading.
Transforming the formulated problem into an equivalent reinforcement learning form, in which all the possible solutions are modeled as state spaces and the movement between different states as actions. A deep-Q-network (DQN)-based algorithm is then proposed for solving this problem and obtaining the near-optimum solution in an efficient way.
Showing through simulation that our proposed model reduces offloading overhead in terms of energy and time by up to 64.7% in comparison with the local execution approach. In addition, it outperforms the full offloading scenario by up to 13.2% by selectively offloading some tasks while optimally rejecting others. Finally, it is adaptable and scalable for large-scale systems.
The remainder of this study is organized as follows. The related works on offloading strategies are introduced in the Related Work section. In the System Model section, our system model is presented and our optimization problem is formulated. Then, the proposed DQN-based algorithm is presented in the Problem Solution Using Deep Reinforcement Learning section. The Experimental Evaluation and Analysis section presents the experimental evaluation and discussion. Finally, the Conclusion section concludes this study and presents future work directions.
Related Work
Numerous optimization models and approaches for computation offloading in MEC environments have been proposed in the literature. Some of these models handle only multi-user single-task MEC systems, 14 whereas others deal with multi-user multi-task environments. 15 In addition, conventional offloading methods such as Lyapunov and convex optimization techniques 16 have been used to solve these models, whereas new algorithms based on artificial intelligence and deep learning have recently emerged.11,17–19 This section briefly reviews the common offloading optimization models.
Conventional optimization methods
One study aimed to minimize the total energy consumption under a latency constraint for a multi-user, single-task MEC environment. 20 The authors formulated an optimization problem to jointly optimize the computation and communication resources and the offloading decisions. Further, an efficient algorithm based on the separable semi-definite relaxation approach is developed for obtaining the near-optimum solution for this problem. However, this work neglects the deadline delay requirement for the computation tasks. Tuysuz et al. 21 proposed a novel approach for addressing video streaming mobility based on the quality of experience (QoE), which can be deployed at the MEC servers. More precisely, this method first generates a session on the basis of the QoE level and collects a set of information from the user. Afterward, three core manipulations are performed to maintain the QoE level for each mobile device and to balance the load between mobile users based on user locations and their mobility via handover operations.
Nur et al. 22 applied the caching concept with computation offloading for a multi-user system, in which the application code and its related data for the completed tasks are cached at the edge server for the next execution. To reduce the energy and delay costs, Nur et al. 22 consider the priority of the computation task, which is calculated from task popularity, deadline, data size, and computing resources. Nevertheless, a drawback of Nur et al. 22 is the absence of security mechanisms to protect the application's data from attacks during transmission.
Dai et al.23,24 have addressed computation offloading for a multi-user environment with multi-tasks. Specifically, in Dai et al., 23 a new two-tier offloading framework is proposed for a heterogeneous network. An optimization problem is formulated with the aim of decreasing the overall consumption of energy and MEC servers in which computation offloading, user association, allocation of transmission power, and allocation of computation resources are considered. Further, an algorithm is developed to find the optimum offloading decision. In Dai et al., 24 by contrast, the authors jointly consider resource allocation and offloading along with the mobility factors of vehicular edge computing systems. The load among vehicular edge computing servers is balanced by selecting the optimal offloading decision for the computation tasks, with maximizing the system utility as the main goal. However, the main drawback in Dai et al.23,24 is that the security and privacy of data during the offloading process are not considered.
Meng et al. 25 and Elgendy et al. 26 presented solutions to effectively secure application data on MEC systems for computation offloading. Specifically, Meng et al. 25 presented a secure and efficient offloading framework for MCC, in which regular renewal of the server key and random padding are combined to protect against timing attacks. In addition, a hybrid queuing model based on a Markov chain is utilized to optimize security and performance. Elgendy et al. 26 introduced a new security layer based on the AES cryptographic algorithm with a genetic algorithm to protect application data during transmission. However, management of offloading and processing in Meng et al. 25 is achieved via the cloud data center, which results in increased delay. Moreover, Elgendy et al. 26 only addressed a multi-user single-task environment and used a computationally prohibitive method for solving the associated offloading problem, especially for large-scale environments.
Deep learning methods
Deep learning algorithms are widely used in offloading for multi-user environments. 11 For example, an offloading scheme based on deep reinforcement learning for IoT devices was proposed in Min et al. 27 with the goal of minimizing the total system overhead. Specifically, the battery level, the predicted amount of consumed energy, and the channel capacity are used to select the edge server for offloading the computation tasks. Then, a DQN learning-based algorithm is proposed to decrease the dimensionality of the state spaces and to accelerate the learning speed. However, in Min et al., 27 the application data are not protected from cyber attacks during the transmission process.
A stochastic policy of computational offloading for a multi-user and multi-server environment was proposed in Chen et al. 28 In this work, the task arrival, computation resources, and the time-varying communication qualities between mobile users and the edge server are jointly considered. The authors formulated the problem as a Markov decision process whose aim is to increase the long-term utility performance of the entire system. Then, two efficient algorithms based on double DQN are proposed to address the curse of dimensionality. In 2019, Dai et al. 29 proposed a novel artificial intelligence-empowered vehicular network architecture for IoV that can intelligently orchestrate edge computing and caching resources. In addition, they jointly formulate edge computing and caching as a Markov decision process problem and design a Deep Deterministic Policy Gradient algorithm to allocate the computation resources in an efficient manner. However, in Dai et al., 29 the popular contents are shared between the vehicles at the edge cache, which leaves them vulnerable to different types of attacks.
More recently, Huang et al. 30 proposed a framework based on deep reinforcement learning for online computation offloading, where the resource allocation and the offloading decision are jointly formulated as a non-convex problem. The aim is to increase the rate of computation in wireless networks. Then, a deep reinforcement learning-based online algorithm is developed for solving this problem by decomposing it into two sub-problems, namely, the offloading decision and the allocation of resources. In addition, for rapid algorithm convergence, an order-preserving quantization method and an adaptive procedure are designed. Meanwhile, a multi-user, multi-task offloading model for IoT was proposed in Lu et al., 31 in which the service latency, energy consumption, and task success rate are jointly formulated to enhance QoE-oriented computation offloading. A Double-Dueling-Deterministic Policy Gradients algorithm is developed for solving this problem and deriving the optimal computation offloading. However, the common drawback of Huang et al. 30 and Lu et al. 31 is the absence of security mechanisms to protect applications' data from attacks during transmission.
Table 1 summarizes the mentioned related works and their main drawbacks. It is evident from the literature review that computation offloading has been investigated for multi-user environments, with both conventional methods and deep learning used to solve the resulting problems. However, the security issue in MEC systems, especially in a multi-user, multi-task environment, has not been addressed. In this class of systems, most mobile applications send multimedia services and generate substantial data that may be offloaded via the mobile networks. This motivates this study, which jointly considers resource allocation and offloading for a multi-user, multi-task environment. In addition, we attempt to address the data security requirement during transmission to protect against various types of attacks.
Table 1. Related work models
QoE, quality of experience; RAN, Radio Access Network.
System Model
We study a multi-user MEC system with a single wireless base station and N mobile devices, represented by a set

Figure 1. An illustration of the assumed system model. MEC, mobile edge computing.
The next subsections present the communication, computation, and security models, followed by the detailed formulation of our optimization problem.
Communication model
The assumed environment has a set of
We denote
Subsequently, in the offloading case, the data rate of uplink for the user i can be expressed as follows:
where B and pi denote the channel bandwidth and the power of user i transmission and gi and
Consequently, the simultaneous offloading of mobile devices is limited by the following bandwidth constraints:
where R denotes the total available uplink data rate.
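As a minimal sketch of the communication model, the uplink rate can be computed in the Shannon-capacity form that is common in the MEC literature. The exact expressions are not reproduced in the text above, so the rate formula, the summed-rate form of the bandwidth constraint, the function names, and the numeric values below are all assumptions for illustration.

```python
import math

def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w):
    """Assumed Shannon-type uplink rate in bit/s: B * log2(1 + p*g / noise)."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_w)

def within_uplink_capacity(decisions, rates, total_rate):
    """Assumed constraint: rates of offloading users (decision == 1) sum to at most R."""
    used = sum(a * r for a, r in zip(decisions, rates))
    return used <= total_rate

# Illustrative values only (not taken from the experiments).
rates = [uplink_rate(20e6, 0.1, g, 1e-9) for g in (1e-6, 2e-6, 5e-7)]
print([round(r / 1e6, 1) for r in rates])  # per-user rates in Mbit/s
print(within_uplink_capacity([1, 0, 1], rates, total_rate=300e6))
```

A user with a weaker channel gain obtains a lower rate, which is why the offloading decision has to be made jointly across users rather than per device.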
In this study, Orthogonal Frequency Division Multiple Access (OFDMA) is used to handle the transmissions of multiple users in the same cell, which significantly reduces intra-cell uplink interference. 26 Further, the overhead of transmitting the result back is neglected because the output of a computation task is small in comparison with its input data. 32
Computation model
This section presents the computation model for our system model that is composed of N mobile devices, and each device has M intensive computation tasks that need to be completed. We use a tuple
In the following subsections, the computation overhead for local and edge server computing approaches will be introduced with respect to both time of execution and consumption of energy.
Local execution approach
In the local execution approach, each user i decides to execute its task j locally on its computation resources. So, the consumption of energy and time for processing the task j of user i locally can be calculated as follows:
where
Edge server execution approach
In the edge server execution approach, the task j of user i will be transmitted and processed remotely. Therefore, the consumption of energy and time for offloading and executing task j of user i remotely, that is, task transmission and execution, can be calculated as follows:
where
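The two execution approaches can be compared with a short sketch. The cost expressions below follow a form that is common in the offloading literature (cycles divided by CPU frequency for time, a fixed energy per CPU cycle for local energy, and transmit power times upload time for offloading energy); they are assumptions, as are all parameter names and values.

```python
def local_cost(cycles, f_local_hz, energy_per_cycle_j):
    """Return (time_s, energy_j) for running a task on the device itself."""
    return cycles / f_local_hz, cycles * energy_per_cycle_j

def edge_cost(data_bits, cycles, rate_bps, tx_power_w, f_edge_hz):
    """Return (time_s, energy_j) for uploading a task and running it at the edge.

    Device energy counts only the upload; returning the result is neglected,
    as stated in the communication model.
    """
    t_up = data_bits / rate_bps
    return t_up + cycles / f_edge_hz, tx_power_w * t_up

# Hypothetical task: 1 MB of input data and 1e9 CPU cycles.
print(local_cost(1e9, f_local_hz=1e9, energy_per_cycle_j=5e-10))
print(edge_cost(8e6, 1e9, rate_bps=80e6, tx_power_w=0.1, f_edge_hz=10e9))
```

With these illustrative numbers, offloading is cheaper in both time and energy, but the balance shifts as the input size grows or the uplink rate drops.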
Security model
During offloading of computation tasks and their related data to an edge server, the application data are transmitted over an insecure wireless link. Attackers may sniff, jam, replay, and/or eavesdrop on wireless links, leaving the system vulnerable to data leakage, tampering, publishing, replication, and dissemination attacks. An encryption layer is introduced to reduce risks related to data security. 128-bit AES is used to encrypt and decrypt application data during transmission. AES is a standard symmetric cryptography algorithm used in many applications. It is also efficient in terms of both security and performance, especially on low-energy devices. 34
First, each user receives the offloading decision from the edge server, which determines whether the mobile user will offload their computation task or not. For the offloading decision case, the user is issued with a secret key to encrypt the transmitted data using 128-bit AES before transmitting the encrypted data to the edge server. Afterward, the edge server uses the same key to decrypt the received data and then executes the computation task on this data. Finally, the edge server sends the result back to the user.
We denote
where
Moreover, taking the security, computation, and communication models together, the total consumption of time and energy for processing task j of user i can be defined as:
where
Finally, from Eqs. (9) and (10), the total time and energy overhead can be calculated as follows:
where
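The way the security layer adds to the offloading cost can be sketched as follows. The equations themselves are not reproduced above, so this is only an assumed form: encryption is charged to the device as extra CPU cycles, decryption is charged to the edge server, and time and energy are combined with hypothetical weights `w_time` and `w_energy`.

```python
def secure_offload_cost(data_bits, cycles, enc_cycles, dec_cycles,
                        rate_bps, tx_power_w, f_local_hz, f_edge_hz,
                        energy_per_cycle_j):
    """Return (time_s, energy_j) for encrypt -> upload -> decrypt -> execute."""
    t_enc = enc_cycles / f_local_hz               # encryption on the device
    t_up = data_bits / rate_bps                   # upload of the encrypted data
    t_edge = (dec_cycles + cycles) / f_edge_hz    # decryption and execution at the edge
    energy_j = enc_cycles * energy_per_cycle_j + tx_power_w * t_up
    return t_enc + t_up + t_edge, energy_j

def task_overhead(offload, local, secure, w_time=0.5, w_energy=0.5):
    """Weighted time/energy overhead of one task for a binary offloading decision."""
    time_s, energy_j = secure if offload else local
    return w_time * time_s + w_energy * energy_j

secure = secure_offload_cost(8e6, 1e9, 1e8, 1e8, 80e6, 0.1, 1e9, 10e9, 5e-10)
local = (1.0, 0.5)  # hypothetical local (time_s, energy_j) for the same task
print(task_overhead(1, local, secure), task_overhead(0, local, secure))
```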
Problem formulation
In this section, an optimization model for a multi-user environment with a multi-task is formulated with the goal of decreasing the total system overhead for users with respect to communication/processing time and energy. The formulation of our optimization problem is given as follows:
The first two constraints are the energy and time limits for each computation task j. Constraints C3 and C4 are the uplink data rate capacity and the CPU computation capacity of an edge server node, where F is the total CPU resources at each edge server. Finally, constraint C5 ensures that the offloading decision variable is binary.
Eq. (14) would be a linear problem whose optimal solution is given by the values of the offloading decision vector a. However, because a is a binary variable, the feasible set and the objective are non-convex, which makes the problem difficult to solve, especially for a large number of users. This is due to the curse of dimensionality: the problem size increases rapidly as the number of users increases. 35 Therefore, a deep reinforcement learning-based algorithm is proposed to obtain the near-optimum values for a.
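To illustrate why exhaustive search does not scale, the sketch below enumerates every binary decision vector for a toy instance. It is a simplification under stated assumptions: only an uplink-capacity constraint is checked, the per-task costs are given directly as hypothetical numbers, and the function name is illustrative.

```python
from itertools import product

def best_decision(local_costs, offload_costs, rates, total_rate):
    """Exhaustively search binary offloading vectors a in {0, 1}^n."""
    best_a, best_cost = None, float("inf")
    for a in product((0, 1), repeat=len(local_costs)):
        if sum(ai * r for ai, r in zip(a, rates)) > total_rate:
            continue  # violates the (assumed) uplink capacity constraint
        cost = sum(o if ai else l
                   for ai, l, o in zip(a, local_costs, offload_costs))
        if cost < best_cost:
            best_a, best_cost = a, cost
    return best_a, best_cost

print(best_decision([4, 3, 5], [1, 2, 6], rates=[10, 10, 10], total_rate=10))
print(2 ** 3, 2 ** 30)  # candidate vectors for 3 tasks versus 30 tasks
```

The loop visits 2^n candidates, so even a few dozen tasks across all users put exhaustive search out of reach, which motivates a learned policy.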
Problem Solution Using Deep Reinforcement Learning
Reinforcement learning
Reinforcement learning is a branch of machine learning that allows a system to learn how to behave within an unknown dynamic environment and to make decisions in an optimal way without being explicitly programmed and without human intervention. Figure 2 shows a general illustration of a reinforcement learning scenario in which the agent, environment, state, action, and reward are the main components. As the figure shows, at time step t, the agent receives an observation of state st and chooses an action at, which moves the agent from state st to a new state

Figure 2. Reinforcement learning illustration. RL, Reinforcement Learning.
The Q-learning algorithm is one of the most popular reinforcement learning algorithms. It learns by recording Q-values in a Q-table. This table holds the state-action pairs: the row headers represent the system states S, the column headers represent the system actions A, and each cell holds the quality value,
where
For our optimization problem in Eq. (14), the Q-learning algorithm is not effective for obtaining the optimal solution, because the complexity of the problem increases rapidly as the number of users and their computation tasks increases, which in turn increases the number of state-action pairs. Moreover, it becomes difficult to store and compute the corresponding Q-values for the Q-table, and solving the problem becomes computationally prohibitive as the number of state-action pairs increases exponentially. 35 Therefore, DQN is used to overcome this limitation of Q-learning by estimating the Q-value function instead of storing the Q-table, as shown in the next subsection.
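For reference, the tabular update that Q-learning applies to each state-action pair can be sketched as follows. This is the standard textbook rule; the state labels, the two-action set, and the values of the learning rate `alpha` and discount factor `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q_table = defaultdict(float)  # unseen state-action pairs start at 0.0
new_value = q_update(q_table, "s0", 1, reward=-2.0,
                     next_state="s1", actions=(0, 1))
print(new_value, len(q_table))
```

Every visited state-action pair needs its own table entry, so the table grows with the number of possible offloading decisions.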
Deep Q-network
DQN is one of the effective reinforcement learning algorithms in which the neural network with parameter

Figure 3. DQN-based MEC system. DQN, deep-Q-network.
In this study, an efficient DQN algorithm is proposed for solving our optimization problem in Eq. (14) and obtaining the near-optimum offloading decision. The optimization problem first needs to be transformed into an equivalent reinforcement learning form, in which all the possible solutions are modeled as state spaces and the movement between different states as actions. In addition, the reward value can be calculated based on the objective function. Consequently, the state space, actions, and reward for the problem can be defined as follows:
where
In this study, a pre-classification step has been applied on the state space in which the computation tasks that do not satisfy the completion time deadline constraints, that is, (
As shown in Figure 3 and Algorithm (1), the DQN can be used to solve our optimization problem in Eq. (14). First, given state, action, and reward, the evaluation and target Q-network are initialized with random numbers
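Two building blocks of a DQN training loop with separate evaluation and target networks can be sketched in plain Python. This is not the algorithm listing of this study: the epsilon-greedy exploration rule is an assumption (it is a common choice in DQN), the networks are stood in for by a hypothetical callable that returns one Q-value per action, and all names and values are illustrative.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def td_targets(batch, target_net, gamma=0.9):
    """Compute y = r + gamma * max_a' Q_target(s', a') for each transition."""
    return [r + gamma * max(target_net(s_next))
            for (_s, _a, r, s_next) in batch]

def toy_target_net(state):
    """Hypothetical stand-in for the target Q-network (two actions)."""
    return [0.5, 2.0]

batch = [((0, 0), 1, -1.0, (1, 0)), ((1, 0), 0, -3.0, (1, 1))]
print(epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0))
print(td_targets(batch, toy_target_net, gamma=0.5))
```

The evaluation network is trained to match these targets, and its weights are copied to the target network at intervals so that the targets stay stable between copies.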
Experimental Evaluation and Analysis
This section first introduces the experimental setup. Afterward, an extensive discussion on the simulation results is presented to critically assess our proposed model's performance.
Experimental setup
Our simulation was run on a personal computer with a 3.4 GHz Intel® Core(TM) i7-4770 CPU and 16 GB of RAM. The software environment is Python 3.6 with TensorFlow and NumPy on Windows 10 Professional 64-bit. 38
We used two fully connected hidden layers in our algorithm whose structure is listed in Table 2. A multi-user environment with a multi-task is considered in which we have five users. The system bandwidth, noise, and transmission power are set to 20 MHz,
Table 2. Deep-Q-network structure
Finally, to verify the performance of our algorithm, five different policies are introduced:
Experiment results
Convergence performance
This subsection studies the convergence performance of the proposed algorithm. Different values of each parameter are tested, and the most suitable value is selected for the subsequent simulations.
Figure 4 shows the convergence performance in terms of the total cost over different learning rates, in which the learning rate can be used to adapt the updating speed of

Figure 4. Convergence performance under different values of learning rate.
Figure 5 depicts the effects of different memory sizes on the convergence performance. The figure shows that with a smaller memory size, convergence is faster, but a local optimum is obtained instead of a global one. Therefore, in the following simulations, the size of the replay memory is set to 1024, which is the most appropriate value.

Figure 5. Convergence performance under different values of memory size.
Figure 6 demonstrates the convergence performance of the proposed algorithm over different values of batch size. The batch size determines the number of experience samples that are extracted from the memory at each training interval. The figure shows that with a batch size of 32, convergence is faster than with the other values. This is because the gradient descent direction becomes steeper when the batch size is small, so the weights of the neural network are updated faster. Accordingly, the batch size is set to 32 in the next simulations.

Figure 6. Convergence performance under different values of batch size.
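The roles of the replay memory size and the batch size can be made concrete with a small sketch that uses the values selected above (1024 and 32). The class name, the transition layout, and the use of uniform random sampling are assumptions for illustration.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size experience buffer with uniform minibatch sampling."""

    def __init__(self, capacity=1024):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=32):
        size = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), size)

memory = ReplayMemory()
for step in range(2000):
    # Hypothetical transition layout: (state, action, reward, next_state).
    memory.store((step, 0, -1.0, step + 1))
batch = memory.sample()
print(len(memory.buffer), len(batch))
```

A larger memory keeps older experience available for training, whereas a smaller batch means each update is computed from fewer samples.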
System performance
This subsection presents and discusses the simulation results of our proposed model. First, Figure 7 shows the overhead of processing the computation tasks under the five defined policies for different numbers of users. The figure demonstrates that, with 3 users, the overhead of our proposed DQN algorithm with and without the security layer is equal to that of the full offloading policy and lower than that of the other two policies. In addition, as the number of users increases, our model with and without the security layer achieves a lower overhead than the full offloading policy. This is because the shared communication channels become overloaded, which increases the communication time as the number of users grows. Moreover, our model can optimally select which computation tasks should be offloaded and which should not while minimizing the total cost of users via the deployment of task offloading and security.

Figure 7. Total cost over different numbers of mobile users.
Similarly, Figure 8 illustrates the total cost of executing the computation tasks under the five policies for different data sizes per task. As seen in this figure, the total cost of all five policies increases with the size of the input data for each task. In addition, our DQN algorithm with and without a security layer outperforms the other policies. Moreover, the full offloading policy curve increases much more rapidly than those of the other four policies as the input data size grows. This is because as the size of the transmitted data increases, the communication time also increases, which leads to a significant increase in the total cost of the entire system.

Figure 8. Total cost over different data sizes.
Finally, Figure 9 shows the total overhead of processing the computation tasks for different MEC server capacities. The figure shows that the local execution policy is not affected by the MEC capacity, whereas the total cost of the other policies gradually declines as the MEC capacity increases. This is attributed to the shorter execution time when users can be allocated more resources, whereas the MEC resources are not used in the local execution policy.

Figure 9. Total cost over different MEC capacity.
Conclusion
Our study proposed a resource allocation and security-aware data offloading model for a multi-user, multi-task environment. A new efficient security layer is introduced that uses the standard encryption and decryption of the AES cryptographic algorithm to protect the communicated data against attacks. In addition, a combined model of security, resource allocation, and computation offloading is formulated as an optimization problem with the goal of reducing the total time and energy overhead of mobile users. Further, to obtain a solution in practice, an equivalent reinforcement learning form is given, in which the state space is defined by using all available solutions and the movement between different states is used to define the actions. Then, an efficient algorithm based on DQN is proposed for solving this problem and obtaining the near-optimum solution. Simulation results demonstrate that the proposed model can achieve performance gains of up to
In ongoing and future work, a new effective compression layer will be added to our model. This addition will compress the transmission data size to reduce the transmission time and enhance the overall system performance. In addition, mobile users' mobility will be managed in an efficient manner, in which each user can move dynamically among different edge servers within an offloading period.
Footnotes
Authors' Contributions
Conceptualization, I.A.E. and A.M.; methodology, I.A.E. and M.H.; software, H.A.S. and D.U.; validation, M.H., I.A.E., and H.A.S.; formal analysis, M.K. and D.U.; investigation, A.M. and M.K.; writing—original draft preparation, I.A.E. and A.M.; and writing—review and editing, M.H. and D.U. All authors have read and approved the final version of the article.
Author Disclosure Statement
The authors declare no conflict of interest.
Funding Information
This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program.
