Advanced Deep Learning for Resource Allocation and Security Aware Data Offloading in Industrial Mobile Edge Computing

Abstract

The Internet of Things (IoT) is permeating our daily lives through continuous environmental monitoring and data collection. The promise of low-latency communication, enhanced security, and efficient bandwidth utilization has led to the shift from mobile cloud computing to mobile edge computing. In this study, we propose an advanced deep reinforcement learning-based resource allocation and security-aware data offloading model that considers the constrained computation and radio resources of industrial IoT devices to guarantee efficient sharing of resources between multiple users. This model is formulated as an optimization problem with the goal of decreasing energy consumption and computation delay. This type of problem is non-deterministic polynomial time-hard (NP-hard) and suffers from the curse of dimensionality; thus, a deep learning optimization approach is presented to find a near-optimal solution. In addition, a 128-bit Advanced Encryption Standard-based cryptographic approach is proposed to satisfy the data security requirements. Experimental evaluation results show that the proposed model can reduce offloading overhead in terms of energy and time by up to 64.7% in comparison with the local execution approach. It also outperforms the full offloading scenario by up to 13.2%, because it can select some computation tasks to be offloaded while optimally rejecting others. Finally, it is adaptable and scalable for a large number of mobile devices.

Introduction

Today, Internet of Things (IoT) network technology has been embraced in virtually every aspect of our lives. Advances in sensor and communication technologies have led to the proliferation of complex, delay- and computation-intensive industrial IoT applications that often generate and process large volumes of data.¹ Such applications include efficient manufacturing inspection, virtual/augmented reality, image recognition, internet of vehicles (IoV), and e-Health.^2–4 To alleviate the resource constraints of mobile IoT devices and meet the communication/processing delay requirements, complex computations can be offloaded to more resourceful devices.⁵

Cloud computing was first exploited as a resource-rich service for mobile devices via the mobile cloud computing (MCC) paradigm. The MCC provides flexible processing, storage, and services capabilities while reducing battery consumption. High latency is considered one of the key challenges facing MCC, especially in real-time and delay-sensitive applications. In addition, security poses a critical challenge that faces MCC, where applications data and services may be vulnerable to many types of attacks during various stages of data transmission and processing.⁶

Mobile edge computing (MEC) was recently introduced as a viable and promising solution to address MCC's challenges. In MEC, the computation capabilities of the cloud are pushed to the edge of the radio access network, which is in close proximity to mobile devices, resulting in a cost-efficient and low-latency architecture.^7,8 Application domains such as predictive maintenance of industrial machines benefit from the MEC provision to provide fast and highly localized feedback to modify a live representation of the world.⁹

Numerous approaches and models for computation offloading in MEC have emerged in the literature with the goal of decreasing energy consumption, reducing computation latency, and/or allocating radio resources efficiently.^10–12 However, obtaining an optimum offloading solution in complex and dynamic multi-user wireless MEC systems is a challenging task. In addition, the security of data during transmission from mobile devices to edge devices is a challenge due to, for example, sniffing, jamming, and eavesdropping attacks. These security threats, especially in a multi-user, multi-task environment, have not been addressed in most offloading approaches in the literature.¹³ The lack of adequate data protection controls can quickly overshadow the advantages of the MEC paradigm. Motivated by these considerations, we present a deep reinforcement learning model that handles performance optimization in multi-user, multi-task MEC systems while protecting data during transmission to the edge server. The main contributions of our article are summarized as follows:

Formulating a combination model of computation offloading, security, and resource allocation as an optimization problem with the goal of decreasing the total time and energy overhead of mobile devices.

A new security layer is introduced by using the standard 128-bit Advanced Encryption Standard (AES) cryptosystem to safeguard the vulnerability of data during offloading.

Transforming the formulated problem into an equivalent reinforcement learning form, in which all the possible solutions are modeled as state spaces and the movement between different states as actions. Then, a deep-Q-network (DQN)-based algorithm is proposed for solving this problem and obtaining a near-optimum solution efficiently.

The simulation results show that our proposed model reduces offloading overhead in terms of energy and time by up to 64.7% in comparison with the local execution approach. In addition, it outperforms the full offloading scenario by up to 13.2% by selectively offloading some tasks while optimally rejecting others. Finally, it is adaptable and scalable for large-scale systems.

The remainder of this study is organized as follows. The related works on offloading strategies are introduced in the Related Work section. In the System Model section, our system model is presented and the formulation of our optimization problem is defined. Then, the proposed DQN-based algorithm is presented in the Problem Solution Using Deep Reinforcement Learning section. The Experimental Evaluation and Analysis section presents the experimental evaluation and discussion. Finally, this study is concluded in the Conclusion section, where the future work directions are also presented.

Related Work

Numerous optimization models and approaches for computation offloading in MEC environments have been proposed in the literature. Some of these models handle only multi-user single-task MEC systems,¹⁴ whereas others deal with multi-user multi-task environments.¹⁵ In addition, conventional offloading methods such as Lyapunov and convex optimization techniques¹⁶ have been used to solve these models, whereas new algorithms based on artificial intelligence and deep learning have recently emerged.^11,17–19 This section gives a brief overview of the common offloading optimization models.

Conventional optimization methods

The objective in Chen et al.²⁰ is to minimize the total energy consumption under a latency constraint for a multi-user, single-task MEC environment. The authors formulated an optimization problem to jointly optimize the computation and communication resources and the offloading decisions. Further, an efficient algorithm based on the separable semi-definite relaxation approach is developed for obtaining a near-optimum solution to this problem. However, this work neglects the deadline delay requirement of the computation tasks. Tuysuz et al.²¹ proposed a novel approach for addressing video streaming mobility based on the quality of experience (QoE), which can be deployed at the MEC servers. More precisely, this method first generates a session on the basis of the QoE level and collects a set of information from the user. Afterward, three core manipulations are performed to maintain the QoE level for each mobile device and to balance the load between mobile users based on user locations and their mobility via handover operations.

Nur et al.²² applied the caching concept with computation offloading for a multi-user system, in which the application code and the related data of completed tasks are cached at the edge server for the next execution. To reduce the energy and delay costs, Nur et al.²² consider the priority of each computation task, which is calculated from task popularity, deadline, data size, and computing resources. Nevertheless, the drawback of Nur et al.²² is the absence of security mechanisms to protect the application's data from attacks during transmission.

Dai et al.^23,24 have addressed computation offloading for a multi-user environment with multiple tasks. Specifically, in Dai et al.,²³ a new two-tier offloading framework is proposed for a heterogeneous network. An optimization problem is formulated with the aim of decreasing the overall consumption of energy and MEC servers in which computation offloading, user association, allocation of transmission power, and allocation of computation resources are considered. Further, an algorithm is developed to find the optimum offloading decision. In Dai et al.,²⁴ by contrast, the authors jointly considered resource allocation and offloading along with the mobility factors of vehicular edge computing systems. The load among vehicular edge computing servers is balanced by selecting the optimal offloading decision for the computation tasks, and maximizing the system utility is the main goal. However, the main drawback in Dai et al.^23,24 is that the security and privacy of data during the offloading process are not considered.

Meng et al.²⁵ and Elgendy et al.²⁶ presented solutions to effectively secure application data on MEC systems for computation offloading. Specifically, Meng et al.²⁵ presented a secure and efficient offloading framework for MCC, in which regular renewal of the server key and random padding are combined to protect against timing attacks. In addition, a hybrid queuing model based on a Markov chain is utilized to optimize security and performance. Elgendy et al.²⁶ introduced a new security layer based on the AES cryptographic algorithm with a genetic algorithm to protect application data during transmission. However, management of offloading and processing in Meng et al.²⁵ is achieved via the cloud data center, which results in increased delay. Moreover, Elgendy et al.²⁶ only addressed a multi-user single-task environment and used a computationally prohibitive method for solving the associated offloading problem, especially for large-scale environments.

Deep learning methods

Deep learning algorithms are widely used in offloading for multi-user environments.¹¹ For example, an offloading scheme based on deep reinforcement learning for IoT devices was proposed in Min et al.²⁷ with the goal of minimizing the total system overhead. Specifically, the battery level, the predicted amount of consumed energy, and the channel capacity are used to select the optimal edge server for offloading the computation tasks. Then, a DQN learning-based algorithm is proposed to decrease the dimensionality of the state spaces and to accelerate the learning speed. However, in Min et al.,²⁷ the application data are not protected from cyber attacks during the transmission process.

A stochastic policy of computational offloading for a multi-user and multi-server environment was proposed in Chen et al.²⁸ In this work, the task arrival, computation resources, and the time-varying communication qualities between mobile users and the edge server are jointly considered. The authors formulated the problem as a Markov decision process whose aim is to increase the long-term utility performance of the entire system. Then, two efficient algorithms based on double DQN are proposed to address the curse of dimensionality. In 2019,²⁹ Dai et al. proposed a novel artificial intelligence-empowered vehicular network architecture for IoV that can intelligently orchestrate the edge computing as well as caching resources. In addition, they jointly formulate the edge computing and caching as a Markov decision process problem and design a Deep Deterministic Policy Gradient algorithm to allocate the computation resources in an efficient manner. However, in Dai et al.,²⁹ the popular contents are shared between the vehicles at the edge cache, which leaves them vulnerable to different types of attacks.

More recently, Huang et al.³⁰ proposed a framework based on deep reinforcement learning for online computation offloading, where the resource allocation and the offloading decision are jointly formulated as a non-convex problem. The aim is to increase the rate of computation in wireless networks. Then, a deep reinforcement learning-based online algorithm is developed for solving this problem by decomposing it into two sub-problems, namely, the offloading decision and the allocation of resources. In addition, for rapid algorithm convergence, an order-preserving quantization method and an adaptive procedure are designed. Meanwhile, a multi-user, multi-task offloading model for IoT was proposed in Lu et al.,³¹ in which the service latency, energy consumption, and task success rate are jointly formulated to enhance the QoE-oriented computation offloading. A Double-Dueling-Deterministic Policy Gradients algorithm is developed for solving this problem and deriving the optimal computation offloading. However, the common drawback of Huang et al.³⁰ and Lu et al.³¹ is the absence of security mechanisms to protect application data from attacks during transmission.

Table 1 summarizes the related works discussed above and their main drawbacks. It is evident from the literature review that computation offloading has been investigated for multi-user environments, with both conventional methods and deep learning used to solve the resulting problems. However, the security issue in a MEC system, especially in a multi-user, multi-task environment, has not been addressed. In this class of systems, most mobile applications send multimedia services and generate substantial data that may be offloaded via the mobile networks. This motivates our study, which jointly considers the resource allocation challenge and offloading for a multi-user, multi-task environment. In addition, we attempt to address the data security requirement during transmission to protect against various types of attacks.

Table 1.

Related work models

Reference	Proposed method	Solving methods	Objective	User (single)	User (multiple)	Task (single)	Task (multiple)	Security
Chen et al.²⁰	Efficient three-step algorithm for optimizing the offloading decisions and the resource allocation	Semidefinite relaxation	Minimize time and energy		✓		✓
Tuysuz et al.²¹	Collaborative QoE-based mobility-aware video streaming scheme	Conventional optimization	Preserve QoE		✓	✓
Nur et al.²²	Priority-based offloading and caching model	Branch and bound	Minimize energy and delay		✓		✓
Dai et al.²³	Two-tier computation offloading framework in heterogeneous networks	Semidefinite relaxation	Minimize energy		✓		✓
Dai et al.²⁴	Jointly optimizing selection offloading decision and computation resource	Semidefinite relaxation	Maximize system utility		✓	✓
Meng et al.²⁵	Secure and cost-efficient offloading approach	Conventional optimization	Performance and security trade-off	✓		✓		✓
Elgendy et al.²⁶	Allocation of resource and offloading with data security	Branch and bound	Minimize time and energy		✓	✓		✓
Min et al.²⁷	Learning-based computation offloading for IoT devices	Deep reinforcement learning	Minimize delay, energy, and task drop loss	✓			✓
Chen et al.²⁸	Stochastic computation offloading policy for a representative mobile user in ultra-dense RAN	Double deep Q-network	Maximize long-term utility performance		✓	✓
Dai et al.²⁹	Artificial intelligence-based architecture for optimizing edge computing and caching resources	Deep reinforcement learning	Maximize system utility		✓	✓
Huang et al.³⁰	Deep reinforcement learning framework for an online computation offloading	Deep reinforcement learning	Maximize computation rate		✓	✓
Lu et al.³¹	QoE model for computation offloading	Double Q-learning and dueling networks	Maximizing QoE		✓		✓
Our Work	Security-aware data offloading and resource allocation model	Deep Q-network	Minimize time and energy		✓		✓	✓

QoE, quality of experience; RAN, Radio Access Network.

System Model

We study a multi-user MEC system with a single wireless base station and N mobile devices, represented by the set $\mathcal{N} = \{1, 2, \dots, N\}$ , as shown in Figure 1. In addition, an edge server is associated with the wireless base station to provide computational and storage services. Further, each mobile device has a set $ℳ = \{1, 2, \dots, M\}$ of different types of computation tasks, each of which is either executed locally or transmitted through a wireless channel and executed remotely. In our study, a quasi-static scenario is assumed in which the number of users does not change during an offloading period, whereas it may vary across periods.²⁶

FIG. 1.

An illustration of the assumed system model. MEC, mobile edge computing.

The next subsections present the modeling of communication, computation, and security followed with more details on the formulation of our optimization problem.

Communication model

The assumed environment has a set of $\mathcal{N} = \{1, 2, \dots, N\}$ users that are connected to a single wireless base station via a wireless channel. Each mobile device has a set of $ℳ = \{1, 2, \dots, M\}$ computationally intensive tasks that need to be completed either locally or remotely. Our aim is to reduce the system overhead in terms of communication/processing time and energy consumption by scheduling and assigning the users' computation tasks to the optimal execution location.

We denote $a_{i, j} \in \{0, 1\}$ as the offloading decision for computation task j of user i. Specifically, $a_{i, j} = 0$ indicates that mobile device i chooses to execute its computation task j locally, whereas $a_{i, j} = 1$ indicates that device i chooses to transmit its computation task j and execute it remotely. We therefore define $A = \{a_{1, 1}, a_{1, 2}, \dots, a_{N, M}\}$ as the offloading decision profile of the users.

Subsequently, in the offloading case, the uplink data rate of user i can be expressed as follows: $r_{i} = B \log_{2} \left(1 + \frac{p_{i} g_{i}^{2}}{θ_{0} B}\right),$ (1)

where B denotes the channel bandwidth, $p_{i}$ the transmission power of user i, $g_{i}$ the corresponding channel gain, and $θ_{0}$ the noise power density of the channel.
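As a minimal sketch of Eq. (1), the function below evaluates the uplink rate for one user. The function name, parameter names, and the numeric values are illustrative assumptions and are not taken from the experimental setup.

```python
import math


def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_density):
    """Eq. (1): r_i = B * log2(1 + p_i * g_i^2 / (theta_0 * B)), in bits/s."""
    snr = tx_power_w * channel_gain ** 2 / (noise_density * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


# Illustrative values only (hypothetical, chosen for readability).
rate = uplink_rate(bandwidth_hz=1e6, tx_power_w=0.1,
                   channel_gain=1e-3, noise_density=1e-13)
print(f"uplink rate: {rate:.0f} bit/s")
```

Because the rate grows only logarithmically with the transmission power, raising $p_{i}$ increases the energy spent per second of transmission faster than it shortens the transmission time.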

Consequently, the simultaneous offloading of mobile devices is limited by the following bandwidth constraints:

where R denotes the total available uplink data rate.

In this study, an Orthogonal Frequency Division Multiple Access (OFDMA) method is considered for handling the transmissions of multiple users in the same cell, so that intra-cell uplink interference is significantly reduced.²⁶ Further, the overhead of transmitting the result back is neglected because the output (result) of a computation task is small in comparison with its input data.³²

Computation model

This section presents the computation model for our system, which is composed of N mobile devices, each with M intensive computation tasks that need to be completed. We use a tuple $\{I_{i, j}, C_{i, j}, τ_{i, j}\}$ to represent a computation task requirement, in which $I_{i, j}$ , $C_{i, j}$ , and $τ_{i, j}$ denote the input data size of the task (code and parameters), the Central Processing Unit (CPU) cycles needed to accomplish the task, and the maximum tolerable delay for completing task j of user i, respectively. The values of $I_{i, j}$ and $C_{i, j}$ depend on the nature of the application and can be obtained by using a program profiler.³³

In the following subsections, the computation overhead for local and edge server computing approaches will be introduced with respect to both time of execution and consumption of energy.

Local execution approach

In the local execution approach, each user i decides to execute its task j locally on its own computation resources. Hence, the time and energy consumed in processing task j of user i locally can be calculated as follows: $T_{i, j}^{l} = \frac{C_{i, j}}{f_{i}^{l}},$ (3) $E_{i, j}^{l} = ξ_{i} C_{i, j},$ (4)

where $f_{i}^{l}$ and $ξ_{i}$ denote the computational capability (CPU cycles per second) and the energy consumed per CPU cycle of user i, respectively.

Edge server execution approach

In the edge server execution approach, task j of user i is transmitted and processed remotely. Therefore, the time and energy consumed in offloading and executing task j of user i remotely, that is, task transmission and execution, can be calculated as follows: $T_{i, j}^{e} = \frac{I_{i, j}}{r_{i}} + \frac{C_{i, j}}{f_{i}^{e}},$ (5) $E_{i, j}^{e} = p_{i} \frac{I_{i, j}}{r_{i}},$ (6)

where $f_{i}^{e}$ denotes the computation capability of the edge server (CPU cycles per second) that is allocated to user i. This study assumes that the edge server's computational resources are equally shared between all users.
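The local and edge cost models in Eqs. (3)-(6) can be sketched as two small functions. The function and parameter names are hypothetical, and the sample numbers are illustrative rather than drawn from the evaluation.

```python
def local_cost(cycles, f_local, energy_per_cycle):
    """Eqs. (3)-(4): (time, energy) of executing a task on the device."""
    return cycles / f_local, energy_per_cycle * cycles


def edge_cost(input_bits, cycles, rate, f_edge, tx_power):
    """Eqs. (5)-(6): time covers upload plus remote execution;
    the device only spends energy on the upload."""
    upload_time = input_bits / rate
    return upload_time + cycles / f_edge, tx_power * upload_time


# Illustrative task: 1 Mbit of input and 1e9 CPU cycles (assumed values).
t_l, e_l = local_cost(cycles=1e9, f_local=1e9, energy_per_cycle=1e-9)
t_e, e_e = edge_cost(input_bits=1e6, cycles=1e9, rate=1e6,
                     f_edge=1e10, tx_power=0.1)
print(t_l, e_l, t_e, e_e)
```

Under these assumed numbers, offloading is slightly slower than local execution but costs the device far less energy, which is the kind of trade-off that the weighted overhead in Eq. (13) is meant to resolve.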

Security model

During offloading of computation tasks and their related data to an edge server, the application data are transmitted over an insecure wireless link. Attackers may sniff, jam, replay, and/or eavesdrop on wireless links, leaving the system vulnerable to data leakage, tampering, publishing, replication, and dissemination attacks. An encryption layer is introduced to reduce the risks related to data security. Specifically, 128-bit AES is used to encrypt and decrypt application data during transmission. AES is a standard symmetric cryptographic algorithm used in many applications. Moreover, it performs well in terms of both security and efficiency, especially on low-energy devices.³⁴

First, each user receives the offloading decision from the edge server, which determines whether the mobile user will offload their computation task or not. For the offloading decision case, the user is issued with a secret key to encrypt the transmitted data using 128-bit AES before transmitting the encrypted data to the edge server. Afterward, the edge server uses the same key to decrypt the received data and then executes the computation task on this data. Finally, the edge server sends the result back to the user.

We denote $β_{i} \in \{0, 1\}$ as the security decision for user i. Specifically, $β_{i} = 0$ indicates that the computation task data of user i will be offloaded without encryption, whereas $β_{i} = 1$ indicates that the computation task data of user i will be encrypted by our security layer before being transmitted to the edge. Therefore, we define $β = \{β_{1}, β_{2}, \dots, β_{N}\}$ as the security decision profile for all users. Accordingly, the additional time and energy overhead of applying this security layer can be defined as follows: $t_{i, j}^{sec} = \frac{η_{i, j}}{f_{i}^{l}} + \frac{δ_{i, j}}{f_{i}^{e}},$ (7) $e_{i, j}^{sec} = ξ_{i} η_{i, j},$ (8)

where $η_{i, j}$ and $δ_{i, j}$ denote the CPU cycles needed for encrypting and decrypting the task's data at user i and edge server, respectively.

Moreover, combining the security, computation, and communication models, the total time and energy consumed in processing task j of user i can be defined as: $T_{i, j} = [(1 - a_{i, j}) T_{i, j}^{l} + a_{i, j} T_{i, j}^{r}],$ (9) $E_{i, j} = [(1 - a_{i, j}) E_{i, j}^{l} + a_{i, j} E_{i, j}^{r}],$ (10)

where $T_{i, j}^{r}$ and $E_{i, j}^{r}$ denote the total time and energy for our model with security consideration, which can be expressed as follows: $T_{i, j}^{r} = [β_{i} (t_{i, j}^{sec} + T_{i, j}^{e}) + (1 - β_{i}) T_{i, j}^{e}],$ (11) $E_{i, j}^{r} = [β_{i} (e_{i, j}^{sec} + E_{i, j}^{e}) + (1 - β_{i}) E_{i, j}^{e}],$ (12)
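Expanding Eqs. (11) and (12) shows that the security layer contributes a purely additive term that is switched on by $β_{i}$. The short derivation below uses only the symbols already defined above:

```latex
\begin{aligned}
T_{i,j}^{r} &= \beta_{i}\,\bigl(t_{i,j}^{sec} + T_{i,j}^{e}\bigr) + (1 - \beta_{i})\,T_{i,j}^{e}
             = T_{i,j}^{e} + \beta_{i}\,t_{i,j}^{sec}, \\
E_{i,j}^{r} &= \beta_{i}\,\bigl(e_{i,j}^{sec} + E_{i,j}^{e}\bigr) + (1 - \beta_{i})\,E_{i,j}^{e}
             = E_{i,j}^{e} + \beta_{i}\,e_{i,j}^{sec}.
\end{aligned}
```

In other words, enabling encryption never changes the transmission or remote execution cost itself; it only adds the encryption and decryption overhead of Eqs. (7) and (8).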

Finally, from Eqs. (9) and (10), the total time and energy overhead can be calculated as follows: $κ_{i, j} = w_{i}^{t} T_{i, j} + w_{i}^{e} E_{i, j},$ (13)

where $w_{i}^{t}, w_{i}^{e} \in [0, 1]$ are the time and energy weights for user i. These weights can be adjusted during application installation or configuration, depending on whether the application is time- or energy-sensitive and on user-specific demands. For example, $w_{i}^{t} = 0$ and $w_{i}^{e} = 1$ are used when the battery of user i is low, whereas $w_{i}^{e} = 0$ and $w_{i}^{t} = 1$ are used when the running application is time sensitive, as in real-time applications such as video streaming. Consequently, different values of $w_{i}^{e}$ and $w_{i}^{t}$ are set for different objectives.
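The per-task overhead defined by Eqs. (9)-(13) can be sketched as a single function. The function name, argument names, and sample values are assumptions made for illustration; the remote cost is written in the algebraically equivalent additive form of Eqs. (11) and (12).

```python
def task_overhead(a, beta, t_local, e_local, t_edge, e_edge,
                  t_sec, e_sec, w_t, w_e):
    """Weighted overhead kappa_{i,j} of one task, following Eqs. (9)-(13)."""
    # Eqs. (11)-(12): the security overhead is added only when beta = 1.
    t_remote = t_edge + beta * t_sec
    e_remote = e_edge + beta * e_sec
    # Eqs. (9)-(10): pick the local or the remote cost according to a.
    t_total = (1 - a) * t_local + a * t_remote
    e_total = (1 - a) * e_local + a * e_remote
    # Eq. (13): weighted sum of time and energy.
    return w_t * t_total + w_e * e_total


# Hypothetical costs for one task, with equal time and energy weights.
costs = dict(t_local=2.0, e_local=1.0, t_edge=0.5, e_edge=0.2,
             t_sec=0.1, e_sec=0.05, w_t=0.5, w_e=0.5)
for a, beta in [(0, 0), (1, 0), (1, 1)]:
    print(a, beta, task_overhead(a, beta, **costs))
```

Setting `w_t=0, w_e=1` reproduces the low-battery case described above, in which only the energy term influences the decision.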

Problem formulation

In this section, an optimization model for a multi-user environment with a multi-task is formulated with the goal of decreasing the total system overhead for users with respect to communication/processing time and energy. The formulation of our optimization problem is given as follows:

The first two constraints are the energy and time limits for each computation task j. C₃ and C₄ constraints are the uplink data rate capacity and CPU computation capacity of an edge server node, where F is the total CPU resources at each edge server. Finally, constraint C₅ ensures that the variable of decision offloading is binary.

Eq. (14) is linear in the offloading decision vector a, so solving it amounts to finding the optimal values of a. However, because a is a binary variable, the feasible set, and hence the problem, is non-convex, which makes the problem difficult to solve, especially for a large number of users. This is due to the curse of dimensionality, in which the problem size increases rapidly as the number of users increases.³⁵ Therefore, a deep reinforcement learning-based algorithm is proposed to obtain near-optimum values for a.
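To illustrate why exhaustive search does not scale, the sketch below enumerates every binary decision vector for a toy instance. It is not the formulation in Eq. (14): the per-task costs are hypothetical, and the single `max_offloaded` limit is an assumed stand-in for the capacity constraints.

```python
from itertools import product


def best_decision(local_cost, remote_cost, max_offloaded):
    """Exhaustively search all 2^(N*M) binary offloading vectors.

    local_cost / remote_cost: per-task overheads (hypothetical values).
    max_offloaded: simplified stand-in for the edge capacity constraints.
    """
    n_tasks = len(local_cost)
    best_a, best_z = None, float("inf")
    for a in product((0, 1), repeat=n_tasks):
        if sum(a) > max_offloaded:
            continue  # infeasible under the assumed capacity limit
        z = sum(r if bit else l
                for bit, l, r in zip(a, local_cost, remote_cost))
        if z < best_z:
            best_a, best_z = a, z
    return best_a, best_z


print(best_decision([3, 2, 5, 1], [1, 2.5, 2, 4], max_offloaded=1))
# The search space doubles with every additional task.
print(2 ** (10 * 5), "candidate vectors for N = 10 users and M = 5 tasks")
```

Even this small example visits 16 candidates for four tasks, and the count doubles with each task added, which is the growth that motivates a learning-based search.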

Problem Solution Using Deep Reinforcement Learning

Reinforcement learning

Reinforcement learning is a branch of machine learning that allows a system to learn how to behave within an unknown dynamic environment and to make decisions in an optimal way without being explicitly programmed and without human intervention. Figure 2 shows a general illustration of a reinforcement learning scenario in which the agent, environment, state, action, and reward are the main components. As the figure shows, at time step t, the agent receives an observation of state $s_{t}$ and chooses an action $a_{t}$ on the basis of the policy $π = P (a_{t} | s_{t})$ , which moves the agent from state $s_{t}$ to a new state $s_{t + 1}$ . Then, the agent obtains a reward $r_{t}$ and transitions to the state $s_{t + 1}$ according to the reward function and the state transition probability, which are defined as $R (s, a)$ and $P (s_{t + 1} | s_{t}, a_{t})$ , respectively.³⁶ Subsequently, these steps are repeated until the agent reaches the terminal state. The main goal is to maximize the expected cumulative reward, which is defined as $R_{t} = \sum_{k = 0}^{\infty} γ^{k} r_{t + k}$ with a discount factor $γ \in [0, 1]$ .
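As a small illustration of the cumulative reward $R_{t}$, the sketch below evaluates the discounted sum over a finite list of rewards. Truncating the infinite sum to the rewards provided is an assumption made so the example terminates; the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k * r_{t+k}, truncated to the rewards provided."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


# Three consecutive rewards of +1 with a discount factor of 0.5.
print(discounted_return([1, 1, 1], gamma=0.5))
```

A smaller $γ$ makes the agent favor immediate rewards, whereas a value close to 1 weights future rewards almost as heavily as the current one.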

FIG. 2.

Reinforcement learning illustration. RL, Reinforcement Learning.

The Q-learning algorithm is one of the most popular reinforcement learning algorithms. Its learning method is based on recording Q-values in a Q-table. This table lists the state-action pairs: the row headers represent the system states S, the column headers represent the system actions A, and each cell holds the quality value, $Q (s, a)$ , that is, the long-term accumulated reward of taking action a from state s. $Q (s, a)$ is calculated as: $Q (s, a) \leftarrow Q (s, a) + α [r (s, a) + γ \max_{a'} Q (s', a') - Q (s, a)],$ (15)

where $Q (s, a)$ and $Q (s', a')$ denote the Q values of the current and the next state-action pairs, respectively. In addition, $r (s, a)$ denotes the reward value obtained when selecting the action a at state s, and $\max_{a'} Q (s', a')$ denotes the maximum expected future reward given the new state $s'$ and all possible actions at that state. Finally, $α$ and $γ$ denote the learning rate and discount factor, respectively. In this study, the computation offloading decision $a_{i, j}$ is used to represent the state $s = \{a_{i, j}\}$ , whereas the corresponding movement among different states represents the action space A; this is discussed in greater detail in the following subsection.
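The tabular update described above follows the standard Q-learning rule, which can be sketched with a dictionary as the Q-table. The state encoding, action set, and hyperparameter values below are illustrative assumptions.

```python
from collections import defaultdict


def q_update(q_table, s, a, reward, s_next, actions, alpha, gamma):
    """One tabular step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(s_next, a_next)] for a_next in actions)
    td_target = reward + gamma * best_next
    q_table[(s, a)] += alpha * (td_target - q_table[(s, a)])
    return q_table[(s, a)]


# Unvisited state-action pairs default to a Q-value of 0.
q = defaultdict(float)
q_update(q, s=(0, 0), a=1, reward=1, s_next=(0, 1),
         actions=[0, 1], alpha=0.5, gamma=0.9)
print(q[((0, 0), 1)])
```

The table needs one entry per state-action pair, so with binary offloading vectors as states its size grows exponentially with the number of tasks.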

For our optimization problem in Eq. (14), the Q-learning algorithm is not effective in obtaining the optimal solution because the complexity of the problem increases rapidly with the number of users and their computation tasks, which in turn increases the number of state-action pairs. Moreover, it becomes difficult to store and compute the corresponding Q values for the Q-table, and solving this problem becomes computationally prohibitive as the number of state-action pairs increases exponentially.³⁵ Therefore, DQN is used to overcome this limitation of Q-learning by estimating the Q-value function instead of storing the Q-table, as shown in the next subsection.

Deep Q-network

DQN is an effective reinforcement learning algorithm in which a neural network with parameter $ω$ is used to approximate the Q-value function and to generate the values of the actions, as shown in Figure 3. In DQN, the state is given as the input of the neural network, and the Q-values of all actions are generated as the output. In addition, an $ε$ -greedy strategy is used to select the action: a random action is selected with probability $ε \in (0, 1)$ , that is, exploration, and $a = \arg\max_{a (t)} Q (s (t), a (t); ω)$ is selected with probability $1 - ε$ , that is, exploitation.
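The $ε$-greedy selection described above can be sketched as follows. Here `q_values` stands for the network output for the current state (one value per action); the function name and the optional `rng` argument are assumptions for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon (exploration),
    otherwise the index of the largest Q-value (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


# With epsilon = 0 the choice is purely greedy.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

In practice $ε$ is often decayed over training so that the agent explores early and exploits later, although no particular schedule is implied here.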

FIG. 3.

DQN-based MEC system. DQN, deep-Q-network.

In this study, an efficient DQN algorithm is proposed for solving our optimization problem and obtaining the near-optimum offloading decision. This problem is presented in Eq. (14). The optimization problem first needs to be transformed into an equivalent reinforcement learning form, in which all the possible solutions are modeled as state spaces and the movement between different states as actions. In addition, the rewards value can be calculated based on the objective function. Consequently, the state space, actions, and reward for the problem can be defined as follows:

State: State space S is represented by the computation offloading decision $X = \{a_{1, 1}, a_{1, 2}, \dots, a_{N, M}\}$ , which is a $1 \times N M$ vector. Therefore, at an arbitrary index t, the system state can be defined as follows:

$s (t) = \{a_{1, 1} (t), a_{1, 2} (t), \dots, a_{N, M} (t)\},$ (16)

Action: The action space A is represented by the movement between two different states. In this study, the system action is defined as an index selection within the length of the state vector, by which the agent moves from the current state to a specific neighboring state based on the selected index. Specifically, a variable v is defined to denote the selected index, in which $v = 1, 2, \dots, N M$ , and the action $a (t) = \{a_{v} (t)\}$ is a $1 \times N M$ vector.

Reward: The agent gets a reward $R (s, a)$ , at each step t, on the basis of a state s and after executing an action a, which is considered as a scalar feedback signal for indicating how well the agent is doing. The system state $s (t)$ represents the computation offloading decision, whereas the objective function in our problem, $Z (t)$ , can be derived based on the state $s (t)$ and can be denoted as follows:

$Z_{s (t)} (t) = Z (\{a_{i, j} (t)\}),$ (17)

where $\{a_{i, j} (t)\}$ is given by the state $s (t)$ according to the definition in Eq. (16). In addition, based on the values of $Z_{s (t)} (t)$ and $Z_{s (t + 1)} (t + 1)$ , the reward of the state-action pair $(s (t), a (t))$ is defined as follows:

r_{s (t), a (t), s (t + 1)} = \{\begin{matrix} 1, & Z_{s (t)} (t) > Z_{s (t + 1)} (t + 1) \\ - 1 & Z_{s (t)} (t) < Z_{s (t + 1)} (t + 1) \\ 0 & Z_{s (t)} (t) = Z_{s (t + 1)} (t + 1) \end{matrix}

(18)
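As a minimal sketch, the reward rule in Eq. (18) maps directly onto a three-way comparison of the objective values before and after the action. The function name `reward` and its argument names are hypothetical; the objective values are assumed to be plain numbers computed elsewhere from $s(t)$ and $s(t+1)$.

```python
def reward(z_current, z_next):
    """Reward of Eq. (18): +1 if the total cost decreased, -1 if it
    increased, and 0 if it stayed the same after the action."""
    if z_current > z_next:
        return 1
    if z_current < z_next:
        return -1
    return 0


if __name__ == "__main__":
    # A move that lowers the cost from 5.0 to 4.2 is rewarded.
    print(reward(5.0, 4.2))
```

Because the reward depends only on the sign of the change in $Z$, its scale stays bounded in $\{-1, 0, 1\}$ regardless of the magnitude of the cost.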

In this study, a pre-classification step is applied to the state space, in which the computation tasks that do not satisfy the completion time deadline constraints, that is, ($T_{i, j}^{l} \leq τ_{i, j}$), are forced to execute locally on the mobile device, that is, $a_{i, j} = 0$.

As shown in Figure 3 and Algorithm (1), the DQN can be used to solve our optimization problem in Eq. (14). First, given the state, action, and reward definitions, the evaluation and target Q-networks are initialized with random weights $ω$ and $ω'$, respectively, and the replay memory Y is initialized with capacity L. Then, for each episode k, an initial state $s_{init}$ is chosen. Afterward, at each time step t, the evaluation network follows the $ε$-greedy strategy: it selects a random action $a(t)$ with probability $ε \in (0, 1)$ and the action $a(t) = \arg\max_{a} Q^{pre}(s(t), a; ω)$ with probability $1 - ε$. Then, on the basis of Eq. (18), the reward $r(t)$ and the next state $s(t+1)$ are obtained, and the transition $(s(t), a(t), r(t), s(t+1))$ is stored in the experience replay Y. To update the evaluation network, a random minibatch of transitions $(s(k), a(k), r(k), s(k+1))$ is sampled from the experience replay Y, and the predicted and labeled Q values, $Q^{pre}$ and $Q^{lab}$, are calculated as $Q(s(k), a(k); ω)$ and $r(k) + γ \max_{a'} Q^{tar}(s(k+1), a'; ω')$, respectively, by using the evaluation and target networks shown in Procedure 1. A loss function is then adopted to measure the error between the predicted and labeled Q values, and the gradient descent algorithm³⁷ is used to minimize it. Finally, the parameter $ω'$ of the target network is updated every C steps.
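The building blocks of this training loop can be sketched independently of any deep learning framework. The snippet below is an illustration, not the authors' implementation: the class and function names are hypothetical, and `q_eval` and `q_target` are assumed to be callables that map a batch of states of shape `(B, NM)` to Q values of shape `(B, NM)`.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-capacity experience replay; the oldest transitions are dropped."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a random minibatch and stack it into arrays."""
        k = min(batch_size, len(self.buffer))
        s, a, r, s_next = zip(*random.sample(list(self.buffer), k))
        return np.array(s), np.array(a), np.array(r, dtype=float), np.array(s_next)

    def __len__(self):
        return len(self.buffer)


def select_action(q_eval, state, epsilon, rng):
    """Epsilon-greedy: random index with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(state.shape[0]))
    return int(np.argmax(q_eval(state[None, :])[0]))


def q_predictions(q_eval, states, actions):
    """Predicted values Q(s, a; w) for the actions actually taken."""
    q = q_eval(states)
    return q[np.arange(len(actions)), actions]


def q_labels(q_target, rewards, next_states, gamma):
    """Labeled values r + gamma * max_a' Q_target(s', a'; w')."""
    return rewards + gamma * q_target(next_states).max(axis=1)
```

In a full loop, the difference between `q_predictions` and `q_labels` would feed the loss that gradient descent minimizes, and the weights behind `q_target` would be overwritten with those behind `q_eval` every C steps.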

Experimental Evaluation and Analysis

This section first introduces the experimental setup. Afterward, an extensive discussion on the simulation results is presented to critically assess our proposed model's performance.

Experimental setup

Our simulation is run on a personal computer with an Intel® Core™ i7-4770 CPU at 3.4 GHz and 16 GB of RAM. The software environment is TensorFlow and NumPy with Python 3.6 on Windows 10 Professional 64-bit.³⁸ We use two fully connected hidden layers in our algorithm, whose structure is listed in Table 2. A multi-user, multi-task environment with five users is considered. The system bandwidth, background noise, and transmission power of each device are set to 20 MHz, $-100$ dBm, and 100 mW, respectively. Each mobile user runs a face recognition application as an example, which consists of three independent computation tasks, namely, face detection, pre-processing, and feature extraction and classification. For each computation task, the data size is uniformly distributed in $(0, 10)$ MB, whereas the required CPU cycles are set to 1000 cycles/bit. To model heterogeneous computing capability, each user's CPU capability is assigned randomly from the set $\{0.5, 0.6, \dots, 1.0\}$ GHz, whereas the edge server's CPU capability is set to 100 GHz. The energy consumption for each mobile device is uniformly distributed within (0, 20 × 10¹¹) J/cycle.³² For the DQN algorithm, the number of episodes, mini-batch size, and replay memory size are set to 20,000, 32, and 512, respectively. The discount factor, learning rate, and $ε$-greedy value are set to $0.99$, $0.01$, and $0.1$, respectively. The decision weights for execution time and energy consumption are set to $w_{i}^{e} = w_{i}^{t} = 0.5$, which means that user i gives equal importance to energy consumption and execution time.

Table 2.

Deep-Q-network structure

Number of neurons in the proposed algorithm
Input	1st Hidden	2nd Hidden	Output
15	120	80	15
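To make the layer sizes in Table 2 concrete, the following NumPy sketch builds a fully connected network with the shape 15-120-80-15 and runs a forward pass. Only the layer sizes come from the table; the ReLU activations, the linear output layer, the He-style weight initialization, and the function names are assumptions made for illustration.

```python
import numpy as np

LAYER_SIZES = [15, 120, 80, 15]  # input, 1st hidden, 2nd hidden, output (Table 2)


def init_params(layer_sizes, rng):
    """Create (weight, bias) pairs for each pair of consecutive layers.

    Assumption: He-style normal initialization and zero biases.
    """
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((w, np.zeros(fan_out)))
    return params


def q_values(params, states):
    """Forward pass: ReLU on hidden layers, linear output (one Q per action)."""
    h = np.asarray(states, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return h @ w + b


if __name__ == "__main__":
    params = init_params(LAYER_SIZES, np.random.default_rng(0))
    batch = np.zeros((4, 15))  # four all-local offloading decisions
    print(q_values(params, batch).shape)
```

The input and output widths both equal 15 because five users with three tasks each give a state vector of length $NM = 15$ and one selectable index per entry.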

Finally, to verify the performance of our algorithm, five different policies are introduced:

Unsecure DQN: Our model is applied without the security layer.

Secure DQN: Our model is applied with the security layer added.

Local execution: All the computation tasks will be processed locally.

Full offloading: All the computation tasks will be processed remotely.

Random offloading: A random set of computation tasks is selected uniformly to be processed remotely, whereas the remaining tasks are executed locally.
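The three baseline policies in the list above can be expressed as offloading decision vectors over the $NM$ tasks. The sketch below is illustrative: the function names are hypothetical, 0 denotes local execution and 1 denotes offloading, and "selected uniformly" is assumed to mean that each task is offloaded independently with probability 0.5, which makes every subset of tasks equally likely.

```python
import numpy as np


def local_execution(n_tasks):
    """All tasks are processed on the mobile device."""
    return np.zeros(n_tasks, dtype=np.int8)


def full_offloading(n_tasks):
    """All tasks are processed on the edge server."""
    return np.ones(n_tasks, dtype=np.int8)


def random_offloading(n_tasks, rng):
    """Each task is offloaded independently with probability 0.5 (assumption)."""
    return rng.integers(0, 2, size=n_tasks).astype(np.int8)


if __name__ == "__main__":
    n_tasks = 5 * 3  # five users with three tasks each
    rng = np.random.default_rng(0)
    for name, decision in [
        ("local", local_execution(n_tasks)),
        ("full", full_offloading(n_tasks)),
        ("random", random_offloading(n_tasks, rng)),
    ]:
        print(name, decision)
```

Each vector can be passed to the same cost function as the DQN decision, so all policies are compared on identical task and channel parameters.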

Experiment results

Convergence performance

This subsection studies the convergence performance of the proposed algorithm. Different values of each parameter are tested, and the most suitable value is selected for the subsequent simulations.

Figure 4 shows the convergence performance in terms of the total cost for different learning rates, where the learning rate controls the update speed of $ω$. With a learning rate of $0.01$, convergence is faster than with a learning rate of $0.001$; that is, convergence speed increases with the learning rate. However, with a large learning rate of $0.1$, performance degrades because the algorithm falls into a local optimum. It is therefore important to choose a learning rate suited to the specific situation. Accordingly, we set the learning rate to $0.01$, which is the most appropriate of the tested values.

FIG. 4.

Convergence performance under different values of learning rate.

Figure 5 depicts the effect of different replay memory sizes on the convergence performance. The figure shows that a smaller memory size leads to faster convergence, but a local optimum is obtained instead of a global one. Therefore, in the following simulations, the size of the replay memory is set to 1024, which is the most appropriate of the tested values.

FIG. 5.

Convergence performance under different values of memory size.

Figure 6 demonstrates the convergence performance of the proposed algorithm for different batch sizes. The batch size determines the number of experience samples that are extracted from the memory at each training interval. The figure shows that a batch size of 32 converges faster than the other values. This is because the gradient descent direction becomes steeper when the batch is small; therefore, the weights of the neural network are updated faster. Accordingly, the batch size is set to 32 in the following simulations.

FIG. 6.

Convergence performance under different values of batch size.

System performance

This subsection presents and discusses the simulation results of our proposed model. First, Figure 7 shows the overhead of processing the computation tasks under the five defined policies for different numbers of users. With 3 users, the overhead of our proposed DQN algorithm, with and without the security layer, is equal to that of the full offloading policy and lower than that of the other two policies. As the number of users increases, our model, with and without the security layer, achieves a lower overhead than the full offloading policy. This is because the shared communication channels become overloaded, which increases the communication time as the number of users grows. Moreover, our model can optimally select which computation tasks should be offloaded and which should not, while minimizing the total cost of users via the deployment of task offloading and security.

FIG. 7.

Total cost over different numbers of mobile users.

Similarly, Figure 8 illustrates the total cost of executing the computation tasks under the five policies for different data sizes per task. As seen in this figure, the total cost of all five policies increases with the size of the input data for each task, and our DQN algorithm, with and without the security layer, outperforms the other policies. Moreover, the full offloading curve rises much more rapidly than those of the other four policies as the input data size grows. This is because the communication time increases with the amount of transmitted data, which leads to a significant increase in the total cost of the entire system.

FIG. 8.

Total cost over different data size.

Finally, Figure 9 shows the total overhead of processing the computation tasks for different MEC server capacities. The local execution policy is not affected by the MEC capacity, whereas the total cost of the other policies gradually declines as the MEC capacity increases. This is attributed to the shorter execution time when users can be allocated more resources, whereas the MEC resources are not used at all in the local execution policy.

FIG. 9.

Total cost over different MEC capacity.

Conclusion

Our study proposed a resource allocation and security-aware data offloading model for a multi-user, multi-task environment. A new efficient security layer is introduced by using the standard encryption and decryption of the AES cryptographic algorithm to protect the communicated data against attacks. In addition, a combined model of security, resource allocation, and computation offloading is formulated as an optimization problem with the goal of reducing the total time and energy overhead of mobile users. Further, to obtain the solution in practice, an equivalent reinforcement learning form is given, in which the state space is defined by all available solutions and the movements between states define the actions. Then, an efficient DQN-based algorithm is proposed to solve this problem. Simulation results demonstrate that the proposed model can reduce the overhead by up to 13.2% and 64.7% in comparison with the full offloading and local execution approaches, respectively. In addition, our DQN-based approach was shown to scale well to large networks.

In ongoing and future work, a new effective compression layer will be added to our model. This addition will compress the transmission data size to reduce the transmission time and enhance the overall system performance. In addition, mobile users' mobility will be managed in an efficient manner, in which each user can move dynamically among different edge servers within an offloading period.

Footnotes

Authors' Contributions

Conceptualization, I.A.E. and A.M.; methodology, I.A.E. and M.H.; software, H.A.S. and D.U.; validation, M.H., I.A.E., and H.A.S.; formal analysis, M.K. and D.U.; investigation, A.M. and M.K.; writing—original draft preparation, I.A.E. and A.M.; and writing—review and editing, M.H. and D.U. All authors have read and approved the final version of the article.

Author Disclosure Statement

The authors declare no conflict of interest.

Funding Information

This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program.

References

1. Hammoudeh, Newman, Dennett, et al. Map as a service: A framework for visualising and maximising information return from multi-modal wireless sensor networks. Sensors. 2015; 15:22970–23003.

2. Pan, McElhannon. Future edge cloud and edge computing for internet of things applications. IEEE Internet Things J. 2017; 5:439–449.

3. Hammad, Pawiak, Wang, Acharya. Resnet-attention model for human authentication using ECG signals. Expert Syst. 2020; e12547. [Epub ahead of print]; DOI: 10.1111/exsy.12547.

4. Tuncer, Ertam, Dogan, et al. Ensemble residual network-based gender and activity recognition method with signals. J Supercomput. 2020; 76:2119–2138.

5. Wan, Li, Xue, et al. Efficient computation offloading for internet of vehicles in edge computing-assisted 5G networks. J Supercomput. 2019; 76:2518–2547.

6. Noor, Zeadally, Alfazi, Sheng. Mobile cloud computing: Challenges and future research directions. J Netw Comput Appl. 2018; 115:70–85.

7. Guo, Liu. UAV-enhanced intelligent offloading for internet of things at the edge. IEEE Trans Industr Inform. 2020; 16:2737–2746.

8. Abuarqoub, Hammoudeh, Alsboui. An overview of information extraction from mobile wireless sensor networks. In: Internet of Things, Smart Spaces, and Next Generation Networking. Springer, 2012. pp. 95–106.

9. Khayyat, Alshahrani, Alharbi, et al. Multilevel service-provisioning-based autonomous vehicle applications. Sustainability. 2020; 12:2497–2513.

10. Cao, Li, Cui, et al. Exploring placement of heterogeneous edge servers for response time minimization in mobile edge-cloud computing. IEEE Trans Industr Inform. 2020; 17:494–503.

11. Luong, Hoang, Gong, et al. Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun Surv Tutor. 2019; 21:3133–3174.

12. Elgendy, Zhang W-Z, Zeng, et al. Efficient and secure multi-user multi-task computation offloading for mobile-edge computing in mobile IoT networks. IEEE Trans Netw Serv Manag. 2020; 17:2410–2422.

13. Roman, Lopez, Mambo. Mobile edge computing, fog et al.: A survey and analysis of security threats and challenges. Future Gener Comput Syst. 2018; 78:680–698.

14. Zhang, Mao, Leng, et al. Energy-efficient offloading for mobile edge computing in 5G heterogeneous networks. IEEE Access. 2016; 4:5896–5907.

15. Huang, Feng, Zhang, et al. Multi-server multi-user multi-task computation offloading for mobile edge computing networks. Sensors. 2019; 19:1446.

16. Mach, Becvar. Mobile edge computing: A survey on architecture and computation offloading. IEEE Commun Surv Tutor. 2017; 19:1628–1656.

17. Pawiak, Abdar, Pawiak, et al. DGHNL: A new deep genetic hierarchical network of learners for prediction of credit scoring. Inform Sci. 2020; 516:401–418.

18. Khayyat, Elgendy, Muthanna, et al. Advanced deep learning-based computational offloading for multilevel vehicular edge-cloud computing networks. IEEE Access. 2020; 8:137052–137062.

19. Zahera, Elgendy, Jalota, Sherif. Fine-Tuned Bert Model for Multi-label Tweets Classification. TREC, 2019.

20. Chen M-H, Liang, Dong. Joint offloading and resource allocation for computation and communication in mobile cloud with computing access point. In: IEEE INFOCOM 2017-IEEE Conference on Computer Communications. IEEE, 2017. pp. 1–9.

21. Tuysuz, Aydin. QoE-based mobility-aware collaborative video streaming on the edge of 5G. IEEE Trans Industr Inform. 2020; 16:7115–7125.

22. Nur, Islam, Moon, et al. Priority-based offloading and caching in mobile edge cloud. J Commun Softw Syst. 2019; 15:193–201.

23. Dai, Xu, Maharjan, Zhang. Joint computation offloading and user association in multi-task mobile edge computing. IEEE Trans Veh Technol. 2018; 67:12313–12325.

24. Dai, Xu, Maharjan, Zhang. Joint offloading and resource allocation in vehicular edge computing and networks. In: 2018 IEEE Global Communications Conference (GLOBECOM). IEEE, 2018. pp. 1–7.

25. Meng, Wolter, Wu, Wang. A secure and cost-efficient offloading policy for mobile cloud computing against timing attacks. Pervas Mob Comput. 2018; 45:4–18.

26. Elgendy, Zhang, Tian Y-C, Li. Resource allocation and computation offloading with data security for mobile edge computing. Future Gener Comput Syst. 2019; 100:531–541.

27. Min, Xiao, Chen, et al. Learning-based computation offloading for IoT devices with energy harvesting. IEEE Trans Veh Technol. 2019; 68:1930–1941.

28. Chen, Zhang, Wu, et al. Optimized computation offloading performance in virtual edge computing systems via deep reinforcement learning. IEEE Internet Things J. 2018; 6:4005–4018.

29. Dai, Xu, Maharjan, et al. Artificial intelligence empowered edge computing and caching for internet of vehicles. IEEE Wireless Commun. 2019; 26:12–18.

30. Huang, Bi, Zhang. Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks. IEEE Trans Mob Comput. 2020; 19:2581–2593.

31. …, He, Du, et al. Edge QoE: Computation offloading with deep reinforcement learning for internet of things. IEEE Internet Things J. 2020; 7:9255–9265.

32. Chen, Jiao, Li, Fu. Efficient multi-user computation offloading for mobile-edge cloud computing. IEEE ACM Trans Netw. 2015; 24:2795–2808.

33. Lyu, Tian. Adaptive receding horizon offloading strategy under dynamic environment. IEEE Commun Lett. 2016; 20:878–881.

34. Daemen, Rijmen. The Design of Rijndael: AES-the Advanced Encryption Standard. Berlin, Germany: Springer Science & Business Media, 2013.

35. …, Gao, Lv, Lu. Deep reinforcement learning based computation offloading and resource allocation for MEC. In: 2018 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2018. pp. 1–6.

36. Sutton, Barto. Reinforcement Learning: An Introduction. London, UK: MIT Press, 2018.

37. Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv: 1609.04747, 2016.

38. Abadi, Agarwal, Barham, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv: 1603.04467, 2016.