Abstract
By constructing Turing-complete non-control data attacks that bypass existing defenses against control-flow attacks, Data-Oriented Programming (DOP) has gained significant attention from researchers in recent years. While several defense techniques have been proposed to mitigate DOP attacks, they often introduce substantial overhead because they blindly protect a large range of data objects. To address this issue, we focus on selecting and protecting the specific target data that are of interest to DOP attackers, rather than securing all non-control data in the program. To this end, we perform static analysis on 20 real-world applications and identify the target data, verifying that they constitute only a small percentage of the overall program, around 3% on average. Additionally, we propose a semi-automated tool to analyze how operations on the target data in these 20 applications can be chained to achieve Turing-complete attacks. Furthermore, we introduce DSLR–, a low-overhead Data Structure Layout Randomization (DSLR) method that modifies the existing DSLR technique to randomize only the selected target data for DOP. Experimental results demonstrate that DSLR– effectively mitigates DOP attacks, reducing performance overhead by 71.2% and memory overhead by 82.5% compared to the original DSLR technique.
Introduction
For decades, memory corruption attacks [27,55] have remained a significant security threat in cyberspace. These attacks exploit software vulnerabilities and require minimal prerequisites to launch, which contributes to their prevalence. Memory corruption attacks fall into two categories according to the data they corrupt: control-flow hijacking attacks, which manipulate control data (e.g., code injection attacks [16], code reuse attacks [4,60,67]), and data-oriented attacks, which manipulate non-control data (e.g., direct data manipulation (DDM) [9], data-oriented programming (DOP) [25]). However, the widespread adoption of protection mechanisms for control data, such as data execution prevention (DEP) [53], StackGuard [61], address space layout randomization (ASLR) [51], and control-flow integrity (CFI) [24,32], makes it challenging for attackers to manipulate control data. As a result, attackers have shifted their attention to non-control data as an alternative attack vector. Hu et al. [25] show that DOP, which manipulates only non-control data, can also cause significant harm to programs: by chaining numerous data-oriented gadgets, an attacker can achieve Turing-complete attacks.
Several defenses can mitigate DOP attacks. For example, data encryption [3,7,57], data structure layout randomization (DSLR) [8,41], and data flow integrity (DFI) [20,43] make it harder for attackers to identify, locate, and manipulate the desired data. Pointer-based bounds checking (PBC) [18,19,45] verifies that each access through a pointer is in bounds. Pointer authentication code (PAC) [34,40] protects the integrity of pointers. However, we observe that existing defenses incur high performance overhead because of the large scope of data they protect, and this non-negligible overhead prevents them from being widely adopted in production environments. Hence, there is an urgent need for a lightweight and effective protection scheme.
In this paper, we propose a target-specific method for data protection, i.e., we protect only the selected data targeted by DOP attacks. First, we collect and classify the target data of real-world programs and analyze how DOP operates on these data. Specifically, we analyze the LLVM intermediate representation (IR) code of 20 real-world applications and develop a semi-automated tool to analyze how operations on the target data can be chained to achieve Turing-complete attacks. We also verify that the amount of target data is relatively small, around 3% on average. Second, we propose DSLR–, which modifies the existing DSLR tool [41] to randomize the absolute addresses of target data and the relative distances between them. Third, we design experiments to evaluate the effectiveness and overhead of the proposed scheme. The experiments show that randomizing the target data can effectively resist DOP. Moreover, compared to the original DSLR tool, our randomization tool reduces the performance overhead by 71.2% and the memory overhead by 82.5%.
In summary, this paper makes the following contributions:
We perform static analysis on 20 real-world applications to identify the target data manipulated by DOP. We calculate the percentage of these target data and show that they are only a small part of the total data in the program.
We propose DSLR–, which randomizes the absolute addresses of target data and the relative distances between them. Our experiments verify that our tool can effectively resist DOP and greatly reduce the performance overhead and memory overhead of the original DSLR tool.
We introduce a semi-automated tool that analyzes how operations on target data can be chained to achieve Turing-complete attacks, a task that previously relied on the researcher’s intuition.
We analyze 20 real-world applications and find that OpenSSL [49], libpcap [38], and libpng [39] contain exploitable complete DOP attack chains, which had not been reported in prior work.
Background
DOP attacks

The process and requirements of DOP attack.
Non-control data attacks hijack data objects in two ways: direct data manipulation (DDM), or using sequences of instructions to construct malicious code (DOP). The goals of the two attacks are similar: both exploit memory vulnerabilities (e.g., buffer overflow [22], use-after-free [69], and double-free vulnerabilities [6]) to read and write memory at specified locations. Attackers can thus manipulate security-critical data to achieve goals such as elevating privileges or bypassing authentication. However, DDM can hardly realize complex operations, whereas DOP shows that non-control data attacks can be Turing-complete, allowing attackers to perform arbitrary computations. In this section, we focus on DOP attacks and show how they are deployed. Overall, DOP first identifies security-critical data in the program and then looks for instructions (gadgets) that can manipulate these data. Finally, DOP identifies a dispatcher that can stitch these gadgets together to carry out the attacker’s malicious behavior. DOP’s three steps and their requirements are shown in Fig. 1:
Data objects, especially non-control data objects, are the attack vectors of DOP attacks. Therefore, some defense methods have been proposed to break requirements (-).
Theoretically, fine-grained and complete data protection can resist most DOP attacks [25]. However, such protection strategies usually incur extremely high performance overhead, which prevents their wide adoption in production environments (we discuss this issue in Section 7). Therefore, this paper explores a different data protection strategy: protecting only DOP-specific data objects.
Problem statement
In this paper, we aim to answer the following questions:
Data object analysis
In this section, we analyze two types of data in the program: security-critical data and DOP’s target data. Here, we investigate the difference between security-critical data and target data:
Security-critical data: These data are closely related to the security of the program. They are also easier to identify than the target data. Therefore, DOP tends to look for instructions around these data when selecting gadgets.
Target data: Attackers can directly or indirectly manipulate these data, such as the variables involved in gadgets. The goal of manipulating these data is ultimately to manipulate security-critical data. Generally speaking, security-critical data are also part of the target data.
For example, Fig. 2 shows the call graph of a code snippet from ProFTPD [56]. The value
Security-critical data
The choice of exploitable security-critical data is often determined by the attackers’ goals and the semantics of the data. For example, DOP attacks may aim to leak confidential memory data, which poses a significant security threat to production systems [26]. These data include sensitive information such as private keys and passwords. Privilege escalation is another objective for DOP attackers, where they gain unauthorized access to protected resources. This may involve modifying or corrupting data that indicate the user’s access level or permission (e.g., uid, group id), authorization data (e.g., authorization flags), or data with high security levels (e.g., root directory). Additionally, attackers can construct malicious payloads by tampering with URLs to achieve privilege elevation, as demonstrated in the user input data attack against GHTTPD [9]. Furthermore, attackers can manipulate a program’s file descriptors (non-negative integers that index a per-process table of open files maintained by the kernel) to perform unauthorized read or write operations on open files.
As an example, in Table 1, we list several security-critical variables in ProFTPD [56] that fall under the aforementioned types.
Examples of some security-critical data in ProFTPD’s source code
Unlike DDM, which can hardly realize complex operations, DOP shows that non-control data attacks can be Turing-complete, allowing the attacker to perform arbitrary computations. More specifically, in a DOP attack, gadgets with specific operations (arithmetic/logical, assignment, load/store, and jump) can simulate a Turing machine. A dispatcher then combines these gadgets to implement complex behaviour. In this section, we analyze DOP operations on target data at two granularities: gadget-level operations and dispatcher-level operations.
Gadgets-level operation
We summarize the operations involved in several types of gadgets. As an example, we enumerate the gadgets with different semantics in ProFTPD [56] (found by Hu et al. [25]) and analyze the operations involved in these gadgets and the corresponding data objects. To illustrate this more clearly, Fig. 2 shows the call graph of the functions in which each gadget is located.

The call graph of functions where gadgets are located in ProFTPD.

Arithmetic gadget in ProFTPD

Assignment gadget in ProFTPD

Load gadget in ProFTPD

Store gadget in ProFTPD

Jump gadget and dispatcher in ProFTPD
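Independently of ProFTPD, the semantics of the arithmetic, assignment, load, and store gadget types can be illustrated with a minimal Python sketch over a toy memory. The pointer-style semantics in the docstrings follow the usual description of data-oriented gadgets; the flat memory model, the addresses, and all function names are assumptions made for illustration and are not taken from ProFTPD.

```python
# Toy flat memory: addresses are list indices and values are integers.
# All names and addresses are hypothetical and chosen for illustration only.
memory = [0] * 16


def arithmetic(mem, p, q):
    """Arithmetic gadget: *p = *p + *q."""
    mem[p] = mem[p] + mem[q]


def assign(mem, p, q):
    """Assignment gadget: *p = *q."""
    mem[p] = mem[q]


def load(mem, p, q):
    """Load gadget: *p = **q (the cell at q holds a pointer)."""
    mem[p] = mem[mem[q]]


def store(mem, p, q):
    """Store gadget: **p = *q (the cell at p holds a pointer)."""
    mem[mem[p]] = mem[q]


memory[1] = 5    # plain value
memory[2] = 7    # plain value
memory[3] = 1    # pointer to address 1
memory[4] = 10   # pointer to address 10

arithmetic(memory, 1, 2)  # memory[1] becomes 5 + 7
assign(memory, 5, 1)      # memory[5] copies memory[1]
load(memory, 6, 3)        # memory[6] reads through the pointer stored at 3
store(memory, 4, 2)       # writes memory[2] through the pointer stored at 4
```

The jump operation is not modeled here because it only becomes meaningful once a dispatcher selects which gadget runs in each round.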
In addition, Hu et al. propose a DOP tool [17] to find potential gadgets. The tool first identifies store instructions and extracts their operands. It then employs a backward data-flow analysis to identify the definitions of the operands used in the store instructions. The resulting data flow contains the instructions that derive the operands, for example by loading them from memory or computing them from registers. If a load operation is found in the data-flow analysis, the store is considered a potential gadget. The gadgets shown in Code 1–Code 5 are only a part of these potential gadgets. In our experiments (Section 6.1.1), we use this tool to find potential gadgets in 20 real-world applications, including ProFTPD. We show that the data objects manipulated by these potential gadgets make up a small percentage of the program, about 3% on average.
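The store-then-backward-slice idea can be sketched on a toy instruction list. This is not the implementation of the DOP tool [17]; the `Instr` tuple, the SSA-like assumption that each name is defined once, and the toy program are all hypothetical simplifications.

```python
from collections import namedtuple

# Hypothetical toy IR: dest is the defined name (None for stores),
# op is the opcode, operands are the names the instruction uses.
Instr = namedtuple("Instr", ["dest", "op", "operands"])


def backward_slice(instrs, store_index):
    """Indices of earlier instructions that define the store's operands."""
    # Assumes SSA-like input: every name is defined at most once.
    defs = {ins.dest: i for i, ins in enumerate(instrs[:store_index])
            if ins.dest is not None}
    worklist = list(instrs[store_index].operands)
    seen, result = set(), []
    while worklist:
        name = worklist.pop()
        if name in seen or name not in defs:
            continue  # constants and globals have no defining instruction
        seen.add(name)
        idx = defs[name]
        result.append(idx)
        worklist.extend(instrs[idx].operands)
    return sorted(result)


def find_potential_gadgets(instrs):
    """A store is a potential gadget if its backward slice contains a load."""
    gadgets = []
    for i, ins in enumerate(instrs):
        if ins.op != "store":
            continue
        sl = backward_slice(instrs, i)
        if any(instrs[j].op == "load" for j in sl):
            gadgets.append((i, sl))
    return gadgets


toy = [
    Instr("%a", "load", ["@x"]),          # 0
    Instr("%b", "load", ["@y"]),          # 1
    Instr("%c", "add", ["%a", "%b"]),     # 2
    Instr(None, "store", ["%c", "@x"]),   # 3: operands derive from loads
    Instr("%d", "add", ["1", "2"]),       # 4
    Instr(None, "store", ["%d", "@z"]),   # 5: no load in its slice
]

gadgets = find_potential_gadgets(toy)
```

In this toy program only the store at index 3 qualifies, because the value it writes is derived from two loads.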
To achieve arbitrary malicious behaviour, attackers need to identify a dispatcher, a code fragment capable of stitching gadgets together. A DOP attack [25] first finds all potential gadgets in the application and then identifies a dispatcher that can stitch certain gadgets together to achieve a specific behavior. Therefore, not all potential gadgets can be utilized: only the gadgets involved in dispatchers can be exploited by DOP attacks. As shown in Fig. 3, the dispatcher is a sequence of code containing a loop and a selector. The loop can execute the same code segment n times (round-1, round-2,…, round-n). The selector executes different gadgets in each round, depending on the attacker’s input. Unlike attacks such as Return-Oriented Programming (ROP) [60], DOP does not execute gadgets sequentially, but rather executes different gadgets in different rounds.

The dispatcher is a sequence of code containing a loop and a selector.
Therefore, to determine whether a code segment is a dispatcher, the following conditions need to be satisfied:
Condition 1: dispatcher contains a loop, with non-control data which attackers can manipulate to control the loop conditions.
Condition 2: dispatcher contains a selector that jumps to available gadgets, with non-control data which attackers can manipulate to select different gadgets.
Condition 3: the loop/selector is supposed to reach the target data or security-critical data which can be manipulated by the attackers.
Condition 4: there is a vulnerability in the dispatcher that can be exploited by attackers to perform legitimate writes. In this paper, we only consider known vulnerabilities in the program, such as Common Vulnerabilities and Exposures (CVEs) [13].
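The four conditions can be made concrete with a small simulation. The sketch below is a hypothetical model, not code from any analyzed application: the `state` dictionary stands in for non-control data, `state.update(...)` models the memory error of Condition 4, and the three gadget functions and the `uid` field are illustrative assumptions.

```python
def add_gadget(state):
    state["acc"] += state["operand"]


def assign_gadget(state):
    state["acc"] = state["operand"]


def store_gadget(state):
    # Condition 3: the gadget reaches security-critical data via "ptr".
    state["mem"][state["ptr"]] = state["acc"]


GADGETS = {0: add_gadget, 1: assign_gadget, 2: store_gadget}


def dispatcher(state, payloads):
    """Loop plus selector, both driven by attacker-corruptible data."""
    rounds = 0
    payload_iter = iter(payloads)
    while state["remaining"] > 0:            # Condition 1: loop bound
        corrupt = next(payload_iter, None)
        if corrupt is None:
            break
        state.update(corrupt)                # Condition 4: modeled memory error
        GADGETS[state["selector"]](state)    # Condition 2: selector
        state["remaining"] -= 1
        rounds += 1
    return rounds


state = {"remaining": 1, "selector": 0, "acc": 0, "operand": 0,
         "ptr": None, "mem": {"uid": 1000}}

payloads = [
    {"remaining": 3, "selector": 1, "operand": 5},   # acc = 5, extend the loop
    {"selector": 0, "operand": -5},                  # acc = 5 + (-5)
    {"selector": 2, "ptr": "uid"},                   # mem["uid"] = acc
]

rounds = dispatcher(state, payloads)
```

Without corruption the loop would run once; by overwriting the loop bound in the first round, the modeled attacker obtains three rounds and chains assignment, arithmetic, and store operations to zero the hypothetical `uid`.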
For example, Code 5 shows an example of a dispatcher in ProFTPD with a
Step 1 (By automated tool): First, we use wllvm [70] to compile the source code of the application and obtain its LLVM IR code. Next, we use Hu et al.’s DOP tool [17] to generate potential gadgets and create a list of the functions in which these potential gadgets are located. Using SVF [66], a static value-flow analysis tool for LLVM-based languages, we obtain the complete call graph of the application. Through a Depth-First Search (DFS) on this call graph, we traverse the call chains of the functions where potential gadgets are located. If the call chain of a function includes all the functions where potential gadgets are located, we consider that function a possible dispatcher. For example, as shown in columns 1–3 of Table 2, we automatically obtain the functions in ProFTPD where all potential gadgets are located and their call relationships, from which we derive the possible dispatchers. It is worth noting that we may still be unable to obtain a complete call chain because some functions in the source code are inlined by the compiler (e.g., function
Step 2 (Manually): We complete the call chain of the functions by analyzing the source code and determine whether the possible dispatchers satisfy Condition 1 to Condition 4. For example, columns 4–8 of Table 2 record the result of manually validating each possible dispatcher against these four conditions.
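The automated part of Step 1 can be sketched as a reachability check on a call graph. The sketch assumes that "the call chain includes all gadget functions" can be approximated by "all gadget functions are reachable from the candidate"; the adjacency-dictionary format and every function name below are hypothetical and do not reflect SVF output or ProFTPD.

```python
def reachable(call_graph, root):
    """Iterative DFS: all functions reachable from root, including root."""
    seen = set()
    stack = [root]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, ()))
    return seen


def possible_dispatchers(call_graph, gadget_functions):
    """Functions from which every gadget-containing function is reachable."""
    wanted = set(gadget_functions)
    return sorted(fn for fn in call_graph
                  if wanted <= reachable(call_graph, fn))


# Hypothetical call graph: caller -> list of callees.
toy_graph = {
    "main": ["init", "serve_loop"],
    "init": [],
    "serve_loop": ["handle_a", "handle_b"],
    "handle_a": ["helper"],
    "handle_b": [],
    "helper": [],
}
gadget_fns = ["handle_a", "handle_b", "helper"]

candidates = possible_dispatchers(toy_graph, gadget_fns)
```

Functions that the compiler has inlined do not appear as nodes in such a graph, which is one reason the candidates still need the manual check of Step 2.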
As shown in Table 2, by executing step 1 and 2, we find dispatcher (function
The process of finding dispatcher in ProFTPD by our semi-automated tool
Assumption settings
Before we introduce DSLR–, we assume that the system is protected with two widely used defenses: ASLR and CFI, by default.

A code snippet demonstrates that attackers use the relative distance between data objects to get the address of the target data
The memory addresses of some security-critical data in file
As described in Section 2.2, DSLR can hide the location information of target data from attackers, i.e., the randomized structures make it difficult for attackers to reverse-engineer the exact layout of the data in memory. Hu et al. [25] also mentioned in their research that data-plane randomization can prevent DOP attacks. However, the DSLR strategy proposed by Lin et al. [41] randomizes almost all structures except for structures that are initialized during runtime. This strategy of blindly randomizing a wide range of data is not an optimal strategy in the context of DOP attacks. Therefore, this paper proposes DSLR–, which focuses on randomizing only the target data in DOP attacks.
In practice, there are two options for the scope of randomization: randomizing the target data within potential gadgets, or randomizing the target data within dispatchers. The target data within dispatchers are evidently fewer than those within potential gadgets, and randomizing them can effectively mitigate Turing-complete malicious behavior by DOP attackers. However, attackers may still manipulate target data that lie within potential gadgets but outside dispatchers, leading to DDM attacks. Therefore, for more comprehensive protection, we choose to randomize the target data within potential gadgets.
To achieve DSLR–, we modify the randomization tool developed by Lin et al. [41]. Lin et al.’s work operates on GCC’s Abstract Syntax Tree (AST) representation, which provides a tree-based representation of the code structure [72]. Their approach supports inserting garbage fields and reordering variables within the AST. We retain these methods from [41] and also perform data randomization at the AST representation level. Randomizing at the AST level offers several advantages: 1) The AST contains rich source code information, enabling comprehensive analysis and modification. 2) The AST is easy to understand and manipulate, facilitating the randomization process. 3) Since GCC does not allocate memory space to data objects when generating the AST, the modifications made during randomization do not need to involve specific memory addresses. Furthermore, we use the keyword
However, our approach differs from Lin et al. [41] in terms of the scope of data randomization. While Lin et al. randomize a wide range of data structures, excluding those that are initialized during runtime, we specifically focus on struct-type data manipulated by potential gadgets. By narrowing the scope, we effectively reduce the performance overhead and memory overhead of the randomization task. We show it in Section 6.2.3.
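The effect of reordering fields and inserting garbage fields on a selected structure can be illustrated with a small offset calculation. This is only a sketch of the idea, not the GCC-based tool: it ignores alignment and padding, and the `session` structure, its field sizes, and the garbage-field parameters are hypothetical.

```python
import random


def layout(fields):
    """Map each field name to its byte offset (packed, alignment ignored)."""
    offsets, cursor = {}, 0
    for name, size in fields:
        offsets[name] = cursor
        cursor += size
    return offsets


def randomize_layout(fields, rng, max_garbage=2, garbage_size=8):
    """Shuffle field order and insert 0..max_garbage dummy fields before each."""
    shuffled = list(fields)
    rng.shuffle(shuffled)
    out, counter = [], 0
    for field in shuffled:
        for _ in range(rng.randint(0, max_garbage)):
            out.append((f"__garbage{counter}", garbage_size))
            counter += 1
        out.append(field)
    return out


# Hypothetical target structure: (field name, size in bytes).
session = [("uid", 4), ("gid", 4), ("name", 16), ("flags", 4)]

original = layout(session)

# Collect the uid-to-gid distance over several randomization seeds.
distances = set()
for seed in range(50):
    randomized = layout(randomize_layout(session, random.Random(seed)))
    distances.add(randomized["gid"] - randomized["uid"])
```

In the original layout the distance between `uid` and `gid` is fixed at 4 bytes, whereas across seeds it takes many values, so an attacker who relies on a fixed relative distance can no longer predict where the second field is. Applying the transformation only to structures touched by potential gadgets is what keeps the added garbage fields, and hence the memory overhead, small.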
Evaluation
In this section, to prove our point, we calculate the percentage of target data in potential gadgets and dispatcher. Besides, we also evaluate DSLR– from the following aspects: percentage of randomized variables, security effectiveness, performance overhead, and binary size & memory overhead.
Experimental environment. We conduct our experiments on a machine powered by an Intel(R) Xeon(R) Gold 5118 CPU @ 2.30 GHz with 16 cores, running the Ubuntu 20.04.5 LTS operating system.
Percentage of target data calculation
Target data in potential gadgets
We automate the process of obtaining potential gadgets by running Hu et al.’s DOP tool [17] on the IR code of 20 applications. These applications fall into two categories: 1) 4 applications analyzed in Hu et al.’s research [25] (nginx, ProFTPD, sudo and WU-FTPD), and 2) 16 applications from FuzzBenchmark [21] that we compile successfully, as shown in Table 5. We compile these programs with wllvm [70] so that we can obtain the LLVM IR code of each whole program.
Moreover, we calculate the percentage of target data operated on by potential gadgets. This percentage is calculated by dividing the number of target data by the number of variables of all types in the program, including global variables, local variables and heap variables. Specifically, we automate the static analysis that counts these three types of variables in the LLVM IR code. Since heap variables are dynamically allocated, static analysis cannot determine their exact number at runtime. Instead, we can only estimate the runtime quantity by statically counting the occurrences of allocation functions such as
We take ProFTPD [56] as an example and show its gadgets and the target data found by the DOP tool in Table 4. For example, IR code file
Some examples of gadgets found by DOP tool [17] in ProFTPD
The percentage of target data manipulated by potential gadgets in 20 applications
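The variable counting and the resulting ratio can be sketched with a few regular expressions over textual LLVM IR. This is a rough approximation rather than the analysis used in the paper: the patterns, the choice of `malloc` as the only allocation function, the toy IR, and the target-data count of 1 are all assumptions made for illustration.

```python
import re


def count_variables(ir_text):
    """Return (globals, locals, heap) counts from textual LLVM IR."""
    # Global variables: top-level definitions such as "@x = global i32 0".
    n_global = len(re.findall(r"^@[\w.]+\s*=.*\bglobal\b", ir_text,
                              re.MULTILINE))
    # Local variables: one stack slot per alloca instruction.
    n_local = len(re.findall(r"=\s*alloca\b", ir_text))
    # Heap variables: estimated by static call sites of malloc (assumption).
    n_heap = len(re.findall(r"\bcall\b[^\n]*@malloc\(", ir_text))
    return n_global, n_local, n_heap


def target_percentage(n_target, counts):
    """Percentage of target data among all counted variables."""
    return 100 * n_target / sum(counts)


toy_ir = """
@uid = global i32 1000, align 4
@buf = common global [64 x i8] zeroinitializer, align 16

define i32 @main() {
  %1 = alloca i32, align 4
  %2 = alloca i8*, align 8
  %3 = call i8* @malloc(i64 32)
  store i8* %3, i8** %2, align 8
  ret i32 0
}
"""

counts = count_variables(toy_ir)
pct = target_percentage(1, counts)  # hypothetical: one variable is target data
```

Counting call sites gives only a static estimate of heap objects, since a single call site inside a loop may allocate many objects at runtime.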
We use our semi-automated tool (described in Section 4.2.2) to analyze dispatchers for the 16 FuzzBenchmark applications. To our knowledge, no prior study has attempted to find dispatchers in these 16 applications, so their locations are not known beforehand. We obtain exploitable dispatchers in OpenSSL, libpcap and libpng, which are introduced as follows.

Dispatcher in OpenSSL

Dispatcher in libpcap

Dispatcher in libpng
Furthermore, we calculate the percentage of target data in the dispatchers of these three applications with Equation (3). As shown in Table 6, the percentage of target data in dispatchers is very small, 0.77% on average.
The percentage of target data in dispatchers for OpenSSL, libpcap, libpng
As described in Section 6.1.2, we obtain exploitable dispatchers in OpenSSL, libpcap, and libpng using our semi-automated tool. Additionally, Hu et al. [25] identify exploitable dispatchers in ProFTPD. We consider that the presence of these exploitable dispatchers makes the four applications highly susceptible to DOP attacks. Therefore, we apply DSLR– to protect the target data of potential gadgets in these four applications. For comparison, we also use the randomization tool developed by Lin et al. [41] to protect these applications. In this section, we evaluate the percentage of randomized variables, security effect, performance overhead, and binary size & memory overhead of DSLR–, and compare each of these aspects with the tool [41].
Percentage of randomized variables
We calculate the percentage of randomized variables in Equation (4). As shown in Equation (4),

Percentage of randomized variables for ProFTPD, OpenSSL, libpcap, libpng when using DSLR– and tool [41].
As shown in Fig. 4, we compare the percentage of randomized variables in the four applications when using DSLR– and the tool developed by Lin et al. [41]. The results show that DSLR– substantially reduces the number of randomized variables: it randomizes 85.8% fewer variables than [41].
In this section, we analyze DSLR–’s security effect from two aspects: 1) randomization of relative distances between variables, and 2) effectiveness against DOP attacks (including privilege elevation and DoS attacks).
The relative distance of variables in struct res are randomized by DSLR–
The variables that attackers aim to manipulate in order to achieve privilege elevation and DoS attacks, as well as the effectiveness of DSLR– in mitigating these two types of attacks
In this section, we test our randomization tool with the SPEC CPU 2006 benchmark [64]. Since our randomization tool makes modifications to GCC, we compile each benchmark program 3 times during our testing, denoted by
Evaluation of the performance overhead of DSLR–
In addition, because performance overhead varies across machines, we reproduce the randomization tool developed by Lin et al. [41] in our own environment. To evaluate the trade-off between performance overhead and the proportion of randomized structures, we configure the randomization ratio of the tool [41] to 30%, 50%, 70%, and 100%. The results of testing DSLR– and the tool [41] are presented in Fig. 5. As the randomization ratio increases, the performance overhead of the tool [41] also increases. Compared to the tool [41] with a randomization ratio of 100%, DSLR– reduces the performance overhead by 71.2%.

Comparison of the performance overhead of DSLR– and tool [41].

Binary size overhead and memory overhead of DSLR– and tool [41].
Because randomization changes the layout of variables in memory, we also measure the binary size overhead and memory overhead introduced by the randomization tool. As shown in Fig. 6a and Fig. 6b, we test DSLR– and the tool [41] on the four applications to evaluate their binary size overhead and memory overhead. The results demonstrate that DSLR– incurs minimal binary size overhead and memory overhead, averaging 0.77% and 0.70% respectively. Compared to the tool [41], DSLR– reduces the binary size overhead and memory overhead by approximately 76.3% and 82.5% respectively.
DOP attacks
Our work builds on the study of DOP attacks by Hu et al. [25]. In earlier research [23], they also describe how to construct data-oriented exploits using deterministic addresses of data objects and relative distances between data objects. Their work confirms the feasibility of DOP, but selecting exploitable data to form a complete DOP attack chain remains very difficult. In this paper, we propose a semi-automated method for locating DOP target data, and our statistics show that the percentage of target data is small, roughly 3% on average.
Existing defenses
To the best of our knowledge, existing defenses do not focus on protecting the specific data exploited by DOP. Here, we discuss the existing defense strategies and their limitations when applied to DOP attack scenarios. Data object encryption [3,7,57] generates equivalence classes of data objects based on a points-to graph: two data objects that are accessed through the same pointer are placed in the same equivalence class. However, this type of data encryption applies to all pointer-type data and causes a non-negligible performance overhead, e.g., runtime data object encryption has 42.12% overhead on average [57]. DSLR [8,41] randomizes all struct-type data in the program, and runtime DSLR [8] has significant performance overhead on the programs gzip, gap and twolf (approximately 100% to 120%). In contrast, our statistics show that struct-type target data for DOP account for only about 2.6% of data in real-world applications. Therefore, randomizing all struct-type data, as existing DSLR does, is not well suited to DOP attack scenarios. Pointer integrity research [34] protects only control data (e.g., function pointers, return addresses), which prevents control-flow hijacking attacks rather than data-oriented attacks. PAC [40] leverages cryptographic MACs to prevent pointers from being arbitrarily manipulated by an attacker. However, this method is not suitable for large-scale applications, since execution time and memory consumption increase with the number of protected pointers.
Pointer boundary checking techniques [45] check at runtime whether the address accessed through each pointer is within its specified range. Theoretically, complete pointer checking prevents all possible DOP exploits. However, the performance and memory overhead of complete pointer protection is also significant; for example, SoftBound [46] produces an average performance overhead of 67% and a memory overhead of 200% on standard benchmarks. DFI [20,43] generates a data-flow graph (DFG) that captures the define-use relationships of each variable. The DFI instrumentation then checks at runtime, before each read instruction, that the variable was last written by a legitimate write. Complete DFI is DOP resistant, but it also has very heavy performance overhead, reaching 103% for interprocedural DFI.
Strategies for selectively protecting data objects
To mitigate the high performance overhead, researchers have also proposed strategies for selectively protecting data objects. In order to effectively protect data against memory disclosure attacks, CRYPTOMPK is introduced by Jin et al. [29], which utilizes crypto-aware static taint analysis to automatically track and label sensitive data. Additionally, it leverages the Memory Protection Keys (MPK) hardware feature of Intel processors to isolate these sensitive data. To counter control flow hijacking attacks, DynPTA [52] combines static analysis with dynamic data flow tracking (DFT) to selectively encrypt a subset of annotated sensitive data in memory. To the best of our knowledge, there is currently no existing selective data protection method specifically designed to defend against DOP attacks. This paper addresses the issue of high overhead in existing defenses during DOP mitigation by only randomizing the memory layout of target data exploited by DOP. Experimental results demonstrate effective resistance against DOP attacks with an average performance overhead of only 2.3% and memory overhead of 0.70%.
Discussion
We note that our research in this paper is specifically focused on DOP attacks. In particular, in Section 4, we propose narrowing down the scope of the data objects that need protection to the target data involved in DOP attacks. In Section 5.2, we introduce DSLR– as a tool for protecting these target data. While the target data may include security-critical data of interest in DDM attacks, we do not guarantee that the protection provided for DOP target data can withstand DDM attacks.
Furthermore, although Block Oriented Programming (BOP) [28] is also a data-oriented technique that does not violate CFI, we believe that BOP and DOP attacks differ significantly in several aspects:
Payload: In DOP attacks, the payload consists of a sequence of instructions capable of modifying specific memory values. BOP payloads, on the other hand, are predefined programs written in the high-level SPloit Language (SPL), with support for 13 different payloads [28].
Implementation approach: In DOP, specific operations are achieved through gadgets, which are individual instructions. These gadgets are linked together using a loop and a selector in the dispatcher, and specific gadgets are executed in each iteration. In contrast, BOP achieves malicious behavior by executing a sequence of BOP gadgets where each BOP gadget is: a functional basic block which executes an SPL statement, and zero or more dispatcher blocks which chain two functional basic blocks. A loop or selector is not necessarily required in the BOP’s dispatcher.
Gadget identification methods: DOP involves compiling the source code into LLVM IR code and utilizing LLVM Pass programs [42], which can perform program transformations and analyses during the compilation process, to search for potential gadgets, as described in Section 6.1.1. In contrast, BOP utilizes symbolic execution [33] to perform static analysis on the source code and identify basic blocks that can achieve the functionality of SPL statements.
In our design, DSLR– is able to randomize struct-type target data of DOP attacks, as discussed in Section 5.2. However, the automated tool for implementing BOP attacks, BOPC [5], generates a set of “what-where” memory writes that indicate how the memory should be initialized (i.e., which values should be written at which memory addresses). When these memory writes pertain to structures, DSLR– may be capable of randomizing them, offering probabilistic defense against BOP attacks. Nevertheless, for non-structure types such as arrays and integers, DSLR– is unable to provide defense.
Conclusion and future work
In this paper, we focus on the target data manipulated by DOP. We analyze how gadgets and dispatchers manipulate these data and the necessary conditions for constructing a complete DOP attack chain. We calculate the percentage of target data in the program and show that the data objects manipulated by DOP make up a small percentage of the program. Based on these findings, we propose DSLR– to randomize the memory addresses of the target data. This randomization covers the absolute addresses of the data objects as well as the relative distances between them. However, our work can still be improved in the following aspects:
We use a semi-automated tool to find the dispatcher and its target data in the program, i.e., the process is still dependent on manual identification. We hope to create an automated tool to do this in the future.
Since our randomization tool is based on the DSLR tool developed by Lin et al. [41], which is dated, it is not compatible with newer versions of the GCC compiler. Because modern versions of GCC/LLVM support structure layout randomization [58,59], we intend to make our tool compatible with these versions and enhance its functionality in future work. Moreover, we can currently randomize only struct-type variables in the program. We plan to randomize target data of all types in the future.
Acknowledgments
This work was supported by the National Key Research and Development Program for Young Scientists of China (No. 2022YFB3102800). I would like to express my sincere gratitude to Professor Peng Liu and Zhilong Wang for their invaluable guidance, unwavering support, and profound expertise throughout this research. Their mentorship and insightful feedback were instrumental in shaping this work. I also would like to extend my heartfelt appreciation to the reviewers for their constructive feedback and valuable insights. Their thorough examination and thoughtful comments greatly contributed to enhancing the quality and clarity of this paper.
