Abstract
This paper presents the design and implementation of a systematic Inter-Component Communications (ICCs) dynamic Analysis Technique (SIAT) for detecting privacy-sensitive data leak threats. SIAT identifies malicious ICC patterns by actively tracing both data flows and implicit control flows within ICC processes at runtime, building on the taint tagging methodology used by TaintDroid. As a result, it can discover malicious intent usage patterns and further resolve coincidental malicious ICCs and bypass cases without incurring performance degradation. SIAT comprises two key modules: the Monitor and the Analyzer. The Monitor makes the first attempt to revise TaintDroid’s taint tag approach by developing built-in intent service primitives that help Android capture intent-related taint propagation at multiple levels for malicious ICC detection. Specifically, we enable the Monitor to perform systemwide tracking of intents with five abstraction functionalities embedded in the interactive workflow of components. By analyzing the taint logs provided by the Monitor, the Analyzer builds accurate and integrated ICC patterns, which are used to identify specific leak threat patterns with identification algorithms and predefined rules. Meanwhile, we employ a pattern deflation technique to improve the efficiency of the Analyzer. We implement SIAT on the Android Open Source Project and evaluate its performance through extensive experiments on a dataset consisting of well-known benchmarks and real-world apps. The experimental results show that, compared to state-of-the-art approaches, SIAT achieves accuracy improvements of about 25% to 200%, with 1.0 precision and 0.98 recall at negligible runtime overhead.
In addition, SIAT identifies two previously undisclosed bypass cases that prior technologies cannot detect, as well as a number of malicious ICC threats in widely downloaded real-world apps on the Google Play market.
Introduction
In recent years, we have witnessed explosive growth in the number of mobile devices, and the large quantity of diversified mobile applications (apps) on those devices has made our daily lives much more convenient and enjoyable. However, with this rapid growth, mobile apps have increasingly become the target of malware authors, who develop and distribute malicious apps that aim at stealing and disclosing various types of sensitive and valuable information associated with either the mobile user or the device. As a result, malware has become one of the most significant security threats to mobile operating systems, especially Android.
In the Android system, the widely used Inter-Component Communication (ICC) [15] plays an essential role between the components of apps that are isolated in different sandboxes. Apps pass messages between each other by passing the intents, which are passive data structures holding the abstract descriptions of operations to be performed between components. Such a flexible method contributes a lot to functionality reuse and data sharing; however, it also exposes a vulnerable surface to several security threats [26]. In the context of ICC mechanism scenarios, apps whose developers overlooked security issues often suffer from risky vulnerabilities such as intent hijacking and spoofing [7], resulting in sensitive user data leak or privilege misuse by other apps, particularly mobile malware. Besides, two or more malicious apps with ICC paths could even collude on stealthy behaviors that neither of them could accomplish alone [5,12]. In these cases, malicious apps send and receive intents in a way that looks as if those are ordinary message exchanges. By this means, they can often easily bypass those classical malware detection approaches, which regularly inspect apps individually.
It is challenging to distinguish a normal ICC from a malicious one in a given security context. Many existing ICC-related research works [27,31] focus on detecting vulnerabilities in benign apps, which have no malicious ICC intent but may have inherent design flaws. Recent research works that aim at identifying ICC paths with malicious purposes fall into two categories: static analysis and runtime protection. A static analysis approach often extracts sensitive ICC paths by matching attributes and tracking data flow (e.g., IC3 [30], Amandroid [39], DIALDroid [6]). However, even state-of-the-art static analysis approaches suffer from many false positives because they cannot validate the specific data content when facing reflection and unreachable code. As a result, they may report ICC paths that do not occur in reality. Alternatively, runtime protection-based approaches (e.g., XManDroid [7] and SCLib [40]) either enforce mandatory access control according to a predefined policy set or ask the end user to decide on access permissions, in order to protect users from threats when apps communicate with each other using the ICC mechanism. However, those runtime protection-based approaches only pay attention to the information acquired before the intent is received, ignoring the actual behaviors of the receiver. Furthermore, they only decide whether to prohibit an app from receiving an intent, which makes them unable to accurately identify malicious ICC paths for data leaks.
Hereby, along with the explicit data flow, we pay more attention to identifying runtime
We have concluded three kinds of typical data leak-related attacks and divided them into five malicious ICC patterns according to their behaviors.
We have proposed SIAT,1
We have evaluated the performance of the SIAT through extensive experiments on both malware apps and benign apps composed of several well-known datasets and thousands of real-world Android apps.
The remainder of this paper is organized as follows. Related works are discussed in Section 2. Section 3.2 discusses the threat patterns SIAT focuses on. Section 4 provides our motivations and systematic methodology. Section 5 describes the overall architecture of SIAT. Section 6 presents the comprehensive performance evaluation that we have conducted for SIAT. Section 7 provides an in-depth discussion, and Section 8 concludes the paper.
App data leak threats have attracted a wealth of taint-based research efforts. The work in [5] provides the first survey on inter-app communication threats, app collusion, and state-of-the-art detection tools in Android, with a comprehensive assessment of the strengths and shortcomings of these approaches. The state-of-the-art approaches can broadly be divided into two categories: single-app analysis and app-pair analysis. The approaches in each category are further of two types: static analysis [42] and dynamic (a.k.a. runtime) analysis.
Single-app analysis. There are a few static single-app analysis approaches. CHEX [27] can identify the component hijacking vulnerabilities through static data flow analysis. Amandroid [39] focuses on analyzing inter-component data flows and tracking the interaction of the components. IccTA [25] addresses the major challenge of performing data flow analysis across multiple Android components for privacy leakage detection based on static taint analysis. The subsequent RAICC [35] reveals atypical ICC links in applications. It reflects the fact that their role is not primarily to start a component (as most ICC methods typically do in this paper) but rather to perform some action (e.g., set the alarm or send an SMS by starting a component with objects of type
The dynamic single-app analysis approaches [9,14,20,22,41] monitor the app at runtime. As a data flow tracing method, TaintDroid [14] monitors the system at runtime and tracks the taint transmission to detect privacy leakage. IntentFuzzer [41] identifies the vulnerable interfaces by dynamically sending test intents to the components. IntentDroid [20] tests eight different vulnerabilities caused by unsafe handling of incoming ICC intent data. DazeDroid [22] automatically extracts the components and fuzzes all interfaces in apps.
App-pair analysis. Among the static app-pair analysis technologies, a precise and scalable end-to-end static flow analysis approach is introduced in [13] to identify the malware collusion risk in Android via fine-grained security risk classification policies. ApkCombiner [24] directly combines two apps into a single app and uses a single-app static data flow analysis method to identify sensitive ICC methods. COVERT [3] employs a compositional analysis method for finding inter-app vulnerabilities. JITANA [38] can analyze multiple Android apps simultaneously. DIALDroid [6] analyzes each app and uses a database to calculate the sensitive ICC paths. PIAnalyzer [19] is a static approach for modeling specific vulnerabilities where other apps can intercept broadcasted PendingIntents. Although PRIMO [29] predicts the likelihoods of inter-app ICC occurrences via a formalism for ICC links based on set constraints, it cannot tackle links created by native code or Java reflection, and it is not designed for collusion detection.
Most dynamic app-pair analysis technologies enforce security policies only at the sender to protect users from inter-app threats. XManDroid [7] is the first approach proposed to prevent application-level privilege escalation through enforcing permission policies. FlaskDroid [8] provides a mandatory access control strategy simultaneously for both Android’s middleware and kernel layers to prevent privilege escalation and collusive data leaks. SCLib [40] proposes an approach that performs inter-app mandatory access control for defending against component hijacking without modifying the Android system. SEALANT [23] combines static analysis and enforcing security policy to provide end-user protection.
Moreover, various works have explored detecting privacy leaks at the network level. For instance, ReCon [33] uses a machine learning classifier to identify leaks and can deal with simple obfuscation. AGRIGENTO [10] is resilient to obfuscation techniques, such as encoding, formatting, and encryption, by performing differential black-box analysis on Android apps. Existing technologies still suffer from some covert channels. Regarding ICCs, SIAT can handle the data obfuscation, encryption, or transmission via Secure Sockets Layer (SSL) based on our implicit control flow analysis.
In particular, some works are closely related to SIAT. SpanDex [11] is integrated with TaintDroid and Android’s Dalvik VM. Instead of ICCs, it focuses on securely tracking password data flows within an app by monitoring explicit and implicit flows differently. Staicu [37] provides an empirical study of dynamic information flows for JavaScript at the language level and concludes that implicit flow tracking is needed in some privacy scenarios. In contrast, SIAT focuses on the control flow at the component level. Unlike existing detection technologies, SIAT inspects not only the sender’s but also the receiver’s intent-related behaviors by migrating the runtime approach TaintDroid to the systemwide tracing of intent data across multiple apps/components at runtime, in order to figure out the real intent usage pattern of the related components. In this way, SIAT can significantly improve threat detection accuracy at the cost of negligible runtime overhead.
Background and Data Leak Threats
Intent background
Communications between components of mobile applications (a.k.a. apps) are achieved via sending and receiving intents, which are data structures holding an abstract description of operations to be performed and are generally used with methods to invoke activities, services, and broadcast receivers. Intents can be divided into two types. One is the explicit intent, which specifies the target component by name, and the other is the implicit intent, which does not name the target (the component name field is blank) and is usually used to activate components in other applications. An intent filter is key to defining the behavior of intents: it is an expression in an app’s manifest file that specifies the types of intents the component would like to receive. Components that wish to receive implicit intents have to declare intent filters. Although an intent filter offers a useful level of flexibility in the run-time binding of components, it is frequently overused or used inappropriately, with negative consequences for security.
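To make the difference between the two intent types concrete, the following minimal sketch models intent resolution. It is an illustration under simplifying assumptions rather than Android's actual resolution logic: each intent filter is reduced to a set of accepted actions (real filters also match categories and data), and the names `Component`, `Intent`, `resolve`, and the action string are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Component:
    name: str
    # Assumption: each intent filter is reduced to the set of actions it accepts.
    filters: List[Set[str]] = field(default_factory=list)


@dataclass
class Intent:
    action: Optional[str] = None
    component: Optional[str] = None  # set only for explicit intents


def resolve(intent: Intent, components: List[Component]) -> List[Component]:
    """Return the components that could receive the intent."""
    if intent.component is not None:
        # Explicit intent: only the named component is a candidate.
        return [c for c in components if c.name == intent.component]
    # Implicit intent: every component with a matching filter is a candidate.
    return [c for c in components
            if any(intent.action in f for f in c.filters)]


components = [
    Component("ComponentB", [{"com.example.SHOW_LOCATION"}]),
    # A component in another app that declares the same action.
    Component("ComponentC", [{"com.example.SHOW_LOCATION"}]),
]
explicit = resolve(Intent(component="ComponentB"), components)
implicit = resolve(Intent(action="com.example.SHOW_LOCATION"), components)
print([c.name for c in explicit])
print([c.name for c in implicit])
```

In this sketch the explicit intent has exactly one candidate, whereas the implicit intent can be delivered to any component that declares a matching filter, which is the flexibility that the threat patterns discussed in this paper exploit.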
Data Leak Threat Patterns
Definition (Malicious ICC for Data Leak). Given a security context, a malicious ICC for data leak refers to a real ICC path among multiple components that leaks sensitive data to an unauthorized party by transferring explicit/implicit intents.
As Section 6 details, we pay more attention to the malicious ICC paths incurred by malware through implicit control flows. These malicious ICCs mainly appear in three typical threat patterns, i.e., intent hijacking, intent spoofing, and intent collusion. As shown in Fig. 1, the following threat patterns that SIAT intends to identify in Section 6 lead to different malicious ICC behaviors that can steal or leak sensitive data.

The intent hijacking, spoofing and collusion patterns that bring forth ICC paths for data leaks will be identified by SIAT.
Intent hijacking. Intent hijacking involves a malicious app receiving an intent not intended for it. As depicted in Fig. 1, in intent hijacking, the implicit intent may never reach the expected component because an unauthorized app intercepts it. When Component A in the Victim app sends an intent to Component B, the Malware1 app can obtain the intent simply by declaring matching attributes in the intent filter of Component C. As a result, intent hijacking can easily cause data leakage. If the data (e.g., location, contacts) requires permission and the intent does not restrict the receiver, Malware1 obtains the sensitive data without the necessary permission. In this case, besides stealing the sensitive data, Malware1 escalates its privilege by hijacking the intent.
Intent spoofing. Intent spoofing is an attack where a malicious app induces undesired behavior by forging an intent. Figure 1 illustrates how intent spoofing works. Suppose the Victim app discloses that Component B expects to receive intents from Component A or some other components. If Component B does not place appropriate restrictions on the attributes of its intent filter, Malware1 may pretend to be Component A and send an intent to Component B. In this case, it could trigger the corresponding action of Component B to leak data.
Intent collusion. Intent collusion generally refers to the situation in which two apps cooperatively accomplish implicit malicious behaviors that a single app cannot achieve solely. As Fig. 1 shows, Component D in Malware1 sends an intent with location data to Malware2 via implicit intent (e.g.,

The systematic methodology for identifying implicit control flows. Any method by which data could leave the device is a sink.
Motivations. The actual risk of an ICC path intrinsically depends on the specific security context/semantics. It is challenging to tell whether an app is normal or malicious by merely inspecting its ICC-related behaviors. For instance, if Component C happens to have attributes matching the intent from Component B, conventional static analysis technologies might report an unintentional hijacking case, i.e., a false positive. Ignoring such cases might result in high false positive rates. Conversely, a malicious ICC might seem normal. For example, in Fig. 1, if the receiver Component E cannot send out the location data via SMS message because it lacks the required privileges, the ICC between Component D and Component E seems normal even though Component D intends the leak. Given the probabilistic matching between implicit intents and intent filters (e.g., due to the mismatch of multiple intent filters and data types), existing methods are prone to false positives or false negatives in identifying the malicious ICCs in the threat patterns above.
Methodology. To solve the above problems, we need to differentiate the defined malicious ICC from normal ICC. Therefore, we are dedicated to discovering the inherent logic behind implicit control flows via a systematic methodology. Throughout the ICC process, we perform a comprehensive taint analysis at multiple levels (i.e., app message-level, variable-level, method-level, and file-level) on both the sender and receiver sides. Our methodology has almost no false positives in runtime analysis and achieves significant recall owing to its systematic perspective. We showcase the solution of the ‘coincidental malicious ICC’ in Section 6.1.2.
Let
We can discover the malicious ICC systematically based on these rules: (1) the threat patterns identified by Algorithm 1; (2) the sink leaking the sensitive data; (3) implicit/complex ICC usage patterns that tend to circumvent detection, consistent with the insight found in our manual verification in Section 6.2.
The SIAT
Technical challenges
Technical challenges for architecture design. A key challenge in architecture design is to devise a sound architecture that can identify intent-related data and control flows without degrading detection accuracy and runtime performance.

The architecture of SIAT. Each logical function tracing the intent via the intent primitives denotes a set of taint or intent handling methods.
Technical challenges for the Monitor. The critical challenge for the Monitor is finding a sound way to migrate TaintDroid to cooperate with Android for dynamic ICC path identification at the explicit data flows and the implicit control flows in a systematic perspective. The fundamental limitations of TaintDroid lie in two aspects, i.e., (1) it is a single-app analysis approach; (2) it can be circumvented through data leaks via implicit control flows. For example, in Fig. 7, the intent transferred via
Technical challenges for the Analyzer. The critical challenge for Analyzer is building a complete and accurate ICC pattern with the taint logs.
SIAT works as a runtime safety guard to identify the malicious ICCs leaking data by analyzing the real-time data and control flows. The use cases for guarding real-world apps are illustrated in Section 6.2. The primary objective of SIAT is to perform systematic taint tracking by properly revising the runtime approach TaintDroid.
As shown in Fig. 3, to be practical, the primary design strategy of SIAT is to spread the complex detection workload to two different modules. Monitor and Analyzer are responsible for the runtime data collection and taint logs analysis in the background, respectively. Furthermore, combining the improved TaintDroid with Android via the well-defined intent service primitives, the SIAT provides a real-time systematic tracing of privacy-sensitive data and visibility into how collaborative malicious behaviors occur via intent for data leaks. Meanwhile, we avoid revising their core logical structures when building explicit data flows and implicit control flows tracking on top of Android with TaintDroid to prevent functionality and performance degradation, as the evaluation results show in Section 6.
Monitor
Monitor is responsible for tracing and analyzing the flow of privacy-sensitive data at runtime by inspecting both the sender’s and the receiver’s intent. Figure 4 demonstrates its implementation of the five functionality steps in ICC workflow. The single and multiple classes denote the number of classes involved in implementing functions.

The overall workflow (data flow and control flow) of Monitor in inter-component communication.
Firstly, we need to develop the built-in intent service primitives in the main files of TaintDroid, enabling it to interact with Android at multi-level taint propagation (i.e., app message-level, variable-level, method-level, and file-level) without taint detection precision loss.
Secondly, we would like to leverage the architectural features of component-based inter-app communications to enable the systematic tracking of implicit control flow with TaintDroid. We extend the Android framework layer via code instrumentation in the interactive workflows between TaintDroid and Android, covering the lifetime of the four main Android components (i.e., Activity, Service, Content Provider, and Broadcast Receiver). Specifically, to obtain the optimal cost-benefit tradeoff, we performed an in-depth study of the runtime data collection workflow and abstracted five key functionality steps by extending the Android framework layer as shown in Fig. 3, which can interact with the intent service primitives to accurately carry out data and control flow collection.
The five logical functionality steps of the Monitor for overcoming the migration challenge are listed in Section 5.3.1. It is worth noting that each abstraction function in the five logical functionality steps above is not a concrete Java function name. Instead, each denotes a set of taint or intent handling methods for the corresponding step in our extension of the framework layer of the current Android operating system based on the Android Open Source Project. In this way, by using the intent service primitives to interact with TaintDroid, the Monitor inspects the relevant components on an ICC path and the data and control flows associated with the intent at runtime, and further identifies the intent’s sender, the intent-matched component, and the receiver at several critical points of the ICC process, respectively.
Monitor implementation
Monitor aims to track the thorough intent data/control workflows to identify the real senders and the receivers and their subsequent behaviors. The implementation places emphasis on the tracking of implicit control flows. We need to migrate TaintDroid to trace sensitive intent data.
Firstly, as highlighted in Fig. 3, we have defined a set of service primitives for intent communications. They encapsulate the sensitive data operation functions and work as middleware between the core methods of TaintDroid and the Android intent mechanism at the method-level and file-level framework layer. The new intent primitives encapsulate the functions of returning the source of current taint, obtaining the next tag with the original taint, setting/getting the tags, and so on. In this way, the main functions at the framework layer shown in Fig. 3 can cooperate with TaintDroid efficiently. For instance, when apps call APIs to get those privacy-sensitive data, based on the function AddTaintToData(data) in Fig. 4, we taint the data as the T-data by which we can trace and distinguish the data from others. Also, we can extract the tag from T-data and identify the T-data by comparing the number with the function IdentifyTaintData(T-data). The bit vector of the tag is null if the data is not tainted.
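The tagging primitives described above can be sketched as follows. This is a minimal illustration and not the SIAT or TaintDroid source code: it assumes that a tag is a 32-bit bit vector with one bit per sensitive source, that an all-zero tag stands for the null (untainted) case, and that the constants, the `TData` wrapper, and the `combine` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Any, List

# Assumption: one bit per sensitive source; the values are illustrative.
TAINT_CLEAR = 0x00000000
TAINT_LOCATION = 0x00000001
TAINT_CONTACTS = 0x00000002

TAG_NAMES = {TAINT_LOCATION: "location", TAINT_CONTACTS: "contacts"}


@dataclass
class TData:
    """A value together with its taint tag (the T-data of the text)."""
    value: Any
    tag: int = TAINT_CLEAR


def add_taint_to_data(data: Any, tag: int) -> TData:
    """Attach a tag; existing tags are kept by OR-ing the bit vectors."""
    if isinstance(data, TData):
        return TData(data.value, data.tag | tag)
    return TData(data, tag)


def identify_taint_data(t_data: TData) -> List[str]:
    """Return the sources encoded in the tag (empty if untainted)."""
    return [name for bit, name in TAG_NAMES.items() if t_data.tag & bit]


def combine(a: TData, b: TData) -> TData:
    """Derived data carries the union of the tags of its inputs."""
    return TData(f"{a.value}{b.value}", a.tag | b.tag)


loc = add_taint_to_data("48.85,2.35", TAINT_LOCATION)
name = add_taint_to_data("Alice", TAINT_CONTACTS)
merged = combine(loc, name)
print(format(merged.tag, "08x"), identify_taint_data(merged))
```

Representing the tag as a bit vector keeps propagation cheap, since merging the taint of two values is a single bitwise OR, and it allows one piece of data to carry several sources at once.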
Meanwhile, to catch the intended sensitive data accurately, by revising the main files of TaintDroid, our Monitor defines a group of new sensitive data and eighty taint tags for identifying them in intent communications. For example, the sensitive location data,
Then, we leverage components-based inter-app communications by abstracting five key functionality steps embedded in the Android framework layer to enable the systematic tracking of implicit control flow with TaintDroid. To extract the intent usage pattern, we cover all critical methods involved in the lifetime of intent, i.e., the methods to start, send, find, and receive intent involved in main Android components, such as
Specifically, we implement the five functionality steps to track the intent through its whole lifetime as follows:
Setting Taint. When the sender gets sensitive data from sensitive sources, using the function AddTaintToData, the Monitor taints the sensitive data and adds a variable tag (an 8-digit hexadecimal taint number) to it, which clearly labels its source. We name the sensitive data tainted as T-data. By examining thousands of apps, we identified thirty-eight types of sensitive data, such as location, phone number, history, network, SMS message, accelerometer, data from
SharedPreferences is a persistent storage method provided by Android.
Checking Intent. When the sender sets the intent attributes (e.g., extra, action), the Monitor checks the T-data to see whether it is tainted or retained through the function IdentifyTaintData(T-data).
Sending Intent. If there is a sender that calls the system API, e.g.,
Receiving Intent. Upon obtaining the best matched component, the Monitor can find out all candidates and the real receiver via FindAllCandidate() and FindReceiver().
Checking Taint. When the receiver extracts the T-data from an intent, as long as the sensitive APIs are called, the Monitor will check if any parameter in the APIs is T-data to identify the source of the data with IdentifyTaintData. Note that we exploit the multiple classes icon in Fig. 3 to denote that this functionality needs to perform more complex inspection operations at multiple key points than step (2).
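The five steps above can be sketched end to end as a small simulation that emits one taint log entry per step. It is a simplified illustration under stated assumptions, not the Monitor's implementation: the log format, the `trace_icc` driver, the action string, and the way the real receiver is supplied (`delivered_to`) are hypothetical, and each abstraction function of the text is reduced to a single Python function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

TAINT_LOCATION = 0x00000001  # hypothetical tag value
TData = Tuple[str, int]      # (value, taint tag)


@dataclass
class Intent:
    action: str
    extras: Dict[str, TData] = field(default_factory=dict)


def add_taint_to_data(value: str, tag: int) -> TData:
    return (value, tag)


def identify_taint_data(t_data: TData) -> int:
    return t_data[1]


def find_all_candidates(intent: Intent, filters: Dict[str, Set[str]]) -> List[str]:
    """Components whose (simplified) filter accepts the intent's action."""
    return sorted(n for n, actions in filters.items() if intent.action in actions)


def trace_icc(sender: str, delivered_to: str,
              filters: Dict[str, Set[str]], sink: str) -> List[str]:
    log = []
    # (1) Setting Taint: tag the data when it is read from a sensitive source.
    loc = add_taint_to_data("48.85,2.35", TAINT_LOCATION)
    log.append(f"set_taint {sender} tag={loc[1]:08x}")
    # (2) Checking Intent: inspect the extras placed into the intent.
    intent = Intent("com.example.SHARE", {"loc": loc})
    tags = 0
    for extra in intent.extras.values():
        tags |= identify_taint_data(extra)
    log.append(f"check_intent {sender} tag={tags:08x}")
    # (3) Sending Intent: record who sends the intent.
    log.append(f"send_intent {sender} action={intent.action}")
    # (4) Receiving Intent: record all candidates and the real receiver.
    candidates = find_all_candidates(intent, filters)
    log.append(f"receive_intent candidates={candidates} receiver={delivered_to}")
    # (5) Checking Taint: check parameters when a sensitive API is called.
    received = intent.extras["loc"]
    if identify_taint_data(received):
        log.append(f"check_taint {delivered_to} sink={sink} "
                   f"tag={identify_taint_data(received):08x}")
    return log


filters = {"ComponentE": {"com.example.SHARE"}, "ComponentC": {"com.example.SHARE"}}
for entry in trace_icc("ComponentD", "ComponentE", filters, "sendTextMessage"):
    print(entry)
```

Because every entry carries the same tag, a later analysis stage can join the sender-side and receiver-side records of one ICC without relying on the intent attributes alone.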
In addition, to inspect the sensitive data in the APIs of interest, we take advantage of a mature machine-learning technique named ‘SuSi’ [32] to obtain the most likely source and sink methods. The data is tagged as tainted if it comes from a privacy-sensitive source. If the tainted data is found in a sink, the privacy-sensitive data may be leaked. We describe the implementation of data and control flow tracking in the Monitor by answering the four questions below:
The Analyzer exploits the taint logs output by the Monitor to build specific threat pattern reports for the users.
Key technique: Pattern building
As Table 1 depicts, a threat pattern to be built by the Analyzer is composed of three objects: the Sender, the Intent, and the Receiver. To ensure efficiency, we only adopt the most useful attributes, e.g., the taint data, which denotes the new sensitive data for intent. Based on Table 1, there are two key techniques below for building patterns:
The most useful attributes of application and intent adopted in our ICC patterns analysis
The source methods indicate where the sensitive data most likely comes from based on the ‘SuSi’ [32].
Intent data extraction. To build accurate threat patterns, the Analyzer needs to extract the intent-related information from APK package and logs. Firstly, the Analyzer needs to extract the related nodes, child nodes, and their attributes by iterative traversal of the DOM tree in the

The parts of the logs
Analysis of attributes related to permissions in patterns. The Analyzer needs extra work to analyze the permission-related attributes in the patterns. The attribute permissions required in the sender is identified from the permissions needed to generate the tainted data in the intent, while the attribute permissions required in the receiver is the one needed to invoke the sink method. Hence, for the sender, the attribute permissions lacked denotes the permissions that the sender does not have but the receiver requires, and vice versa for the receiver.
Firstly, there are redundant patterns generated by the multi-hop intent transfers between multiple components. Figure 5 depicts, in a streamlined way, how redundant patterns (Patterns A, B, and C) are generated by the process mentioned above for four components. In such patterns, the component recorded as the source or destination of the T-data is not necessarily the real component that sends or accepts the intent. In Pattern B and Pattern C, the T-data’s source is considered to be C2 and C3, respectively; however, the real source is C1 of Pattern A and the real destination is C4 of Pattern C. Secondly, there are extra patterns generated by Android for launching internal components that we are not concerned about. For instance, if the destination component is

The threat patterns of four components.
To address the inflation issue, the Analyzer takes advantage of a deflation technique to eliminate the redundant patterns as follows. The deflation technique builds an ordered pattern list based on the components in the sender and the receiver of each pattern. It then traverses the patterns and compares their taint tags to identify the real source in the sender and the destination/sink in the receiver, respectively. In this way, the three patterns in Fig. 5 are condensed into one, and the Analyzer is able to figure out whether the final receiver C4 starts a private component to leak the sensitive data after receiving the intent. Likewise, the proposed deflation technique can also remove the unnecessary and interfering patterns that come from the Android system’s internal components. For the example mentioned above of
In addition to improving the pattern building efficiency, the pattern deflation technique also helps to handle communications among multiple apps/components based on Algorithm 1 proposed below, e.g., detecting intent collusion among three or more apps/components. Let the deflation depth be n, denoting the maximum number of components in an ICC path. The case in Fig. 5 can be extended to n components, and the deflation ratio should be
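The condensation of the multi-hop patterns of Fig. 5 can be sketched as follows. This is a simplified illustration of the idea, not the Analyzer's algorithm: it assumes that patterns sharing a taint tag form a linear chain (each component forwards the T-data to at most one receiver), and the `Pattern` type and `deflate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Pattern:
    sender: str
    receiver: str
    tag: int  # taint tag of the T-data carried by the intent


def deflate(patterns: List[Pattern]) -> List[Tuple[int, List[str]]]:
    """Merge per-hop patterns with the same tag into one source-to-sink path."""
    by_tag: Dict[int, List[Pattern]] = {}
    for p in patterns:
        by_tag.setdefault(p.tag, []).append(p)

    result = []
    for tag, group in by_tag.items():
        receivers = {p.receiver for p in group}
        next_hop = {p.sender: p.receiver for p in group}
        for p in group:
            if p.sender in receivers:
                continue  # intermediate hop, not the real source
            path = [p.sender]
            # Follow the chain until the final receiver (guarding against cycles).
            while path[-1] in next_hop and next_hop[path[-1]] not in path:
                path.append(next_hop[path[-1]])
            result.append((tag, path))
    return result


# Patterns A, B, and C over the four components C1..C4, all with the same tag.
hops = [Pattern("C1", "C2", 0x1), Pattern("C2", "C3", 0x1), Pattern("C3", "C4", 0x1)]
print(deflate(hops))
```

In this sketch the three per-hop patterns collapse into a single path whose first element is the real source (C1) and whose last element is the real destination (C4); the intermediate components remain available in the path for reporting.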

Threat patterns identification
The Analyzer implements Algorithm 1 to identify the possible threat patterns in the ICC patterns. According to the attributes in Table 1, Algorithm 1 considers five different cases, which cover all data leak types we target in this paper. Different cases correspond to different identification rules. The Analyzer iterates through each pattern to find the best matching case, i.e., the one with the same attributes.
Data obfuscation resilience
The data obfuscation resilience challenge for SIAT is to handle malicious ICCs that try to obfuscate or encrypt sensitive data to circumvent detection. Firstly, as AppFence [21] does, we extend TaintDroid to add tracking for all thirty-eight sensitive data types based on the interleaving taint tag allocation mechanism in the stack frame. Our interface library only provides the ability to add taint tags, not to set or clear them, so that untrusted functionality cannot obfuscate or encrypt data to remove taint tags. Secondly, SIAT employs the retaint operation to track sensitive data, which might be obfuscated prior to being stored in an intent. Based on the implicit control flows analysis shown in Fig. 4 and Fig. 8 below, SIAT can capture a variety of SSL/encryption related operations (e.g.,
The complexity
The complexity of SIAT depends on the number of apps and components at runtime. Since the complexity of the Monitor mainly relies on the actual lifetime of the app, here we focus on analyzing the complexity of the Analyzer. Assume that SIAT analyzes all feasible ICC patterns among n apps, each of which contains m components. There are
Evaluations
This section presents the experimental evaluation results of SIAT based on the four datasets below:
DroidBench3.0 [2], which is an app collection for benchmarking ICC-based sensitive data leaks and consists of many types of ICC-related threats.
Droidbench-iccta [1], which has three sets of apps for testing the inter-app collusion issues, and was released by EC SPRIDE Secure Software Engineering Group.
Our Developments, similar to the DroidBench3.0, consist of more than forty self-developed apps that only have simple threat patterns and functions for comprehensive testing. Twenty-six ICC processes also cover at least three components with various sensitive APIs. Concerning efficiency and accuracy, the intent call entries are consistent with the app entries to simplify the call graph, and each app-pair ICC is independent of the other.
Real-World, which contains about 2100 real-world apps.
We have uploaded some typical apps in the link
Our evaluation addresses the following three questions:
To evaluate the accuracy, we compare SIAT with some state-of-the-art approaches from Section 2 that achieve high accuracy: the well-known runtime technique XManDroid and two representative static approaches, DIALDroid and AmanDroid. These tools can easily be acquired and, like SIAT, mainly focus on revealing data leak-related threats.
Accuracy comparisons overview
As depicted in Fig. 6 and Table 2, we employ three performance indicators to evaluate the accuracy:

Comparisons of three accuracy metrics.
Overview of accuracy comparisons between DIALDroid, Amandroid, XManDroid, and SIAT
The ICC path in Table 2 and 3 denotes the malicious ICC path incurring data leaks. ✓ = True Positive, ⊗ = False Positive, ⊙ = False Negative.
The partial results of ICC paths detection in DroidBench3.0, Droidbench-iccta, and our developments. The malicious behaviors of Real-World are listed in Table 4. The ICC paths here are true malicious ICCs, including the recognized ICCs in public datasets and self-developed ICCs
Figure 6 provides an overview of the accuracy comparisons, and Tables 2 and 4 give the detailed results. For simplicity, we present only some typical detection results. Although we identified more than twenty malicious ICC paths in our manual analysis, we chose seventy-five apps covering only eight ICC paths. Notably, we tried to execute IccTA’s successor, RAICC [35], which boosts detection by uncovering atypical ICC methods within the app. Unfortunately, we could not execute RAICC + ApkCombiner [24] on most of the app pairs (over 70%). Many test cases crashed, and RAICC + ApkCombiner can mainly detect the inter-app leaks in Droidbench-iccta (which comes from the same authors as RAICC), e.g., identifying a leak in the source app
Results of DIALDroid. As Fig. 6 shows, DIALDroid obtains only 0.64 precision and 0.54 recall in total. DIALDroid performs static taint analysis to identify the attributes of the intent and the intent filter and to trace the data flow associated with the intent. It then uses SQL stored procedures and queries to calculate sensitive channels in the database according to the matching rules between the intent and the intent filter. However, DIALDroid cannot accurately tell whether the data in the intent meets the requirements of the receiving component. When the data format doesn’t meet the program’s requirements, the sensitive method will not be executed, yet DIALDroid ignores this and assumes that the sensitive method must be executed. In addition, DIALDroid treats any case in which sensitive data arrives in another application via an intent as a privacy breach, which improves the overall coverage while introducing false positives.
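The matching step between an intent and an intent filter can be illustrated with a minimal Python sketch. It is a simplification of Android's action and category tests and omits the data (URI/MIME type) test; the class names, the function `matches`, and the example strings are hypothetical and are not taken from DIALDroid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    action: str
    categories: frozenset = frozenset()


@dataclass(frozen=True)
class IntentFilter:
    actions: frozenset
    categories: frozenset = frozenset()


def matches(intent: Intent, flt: IntentFilter) -> bool:
    """Simplified action and category tests (the data test is omitted)."""
    # The intent's action must be one of the actions declared by the filter.
    action_ok = intent.action in flt.actions
    # Every category carried by the intent must be declared by the filter.
    categories_ok = intent.categories <= flt.categories
    return action_ok and categories_ok


# Hypothetical filter and intents, for illustration only.
flt = IntentFilter(
    actions=frozenset({"com.example.SEND_ID"}),
    categories=frozenset({"android.intent.category.DEFAULT"}),
)
accepted = Intent("com.example.SEND_ID",
                  frozenset({"android.intent.category.DEFAULT"}))
rejected = Intent("com.example.SEND_ID",
                  frozenset({"com.example.UNDECLARED"}))

print(matches(accepted, flt), matches(rejected, flt))  # True False
```

A match only says that the intent can reach the component. It says nothing about whether the receiver accepts the payload format or ever passes it to a sensitive method, which is the source of the false positives discussed above.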
Results of AmanDroid. Similarly, AmanDroid achieves 0.71 precision and 0.21 recall because it cannot analyze data flows along complex ICC paths. As a result, it cannot detect any malicious ICC path in DroidBench 3.0, leading to a lower recall than the other approaches.
Results of XManDroid. As shown in Fig. 6 and Table 2, XManDroid obtains 0.73 recall on Our Developments because it cannot identify the seven ICC paths that suffer from intent spoofing. Consequently, XManDroid achieves only a precision of 0.78 and a recall of 0.80 in total. XManDroid enables users to predefine a list of ICC restriction policies and automatically blocks ICCs that match any policy. These policies are based on the permissions of the sender and the data in the intent. Thanks to its permission identification mechanism, which lets an intent through only if the permissions of the receiver match those of the sender, XManDroid performs well on both Droidbench-iccta and DroidBench3.0, as shown in Table 2. However, when the sender sends out sensitive data guarded by a permission that the receiver doesn’t have, XManDroid prohibits the ICC directly without considering whether the receiver uses the data later. This is a common problem in many runtime protection approaches and leads to a high false alarm rate. For example, the experimental results on Our Developments show that even if the receiver does not extract any sensitive data from the intent, XManDroid still reports malicious behavior without examining the receiver’s behavior. As a result, XManDroid produces two false positives ⊗. In contrast, SIAT traces the data flows in both the senders and the receivers and then analyzes the whole transmission process, which enables SIAT to generate fewer false positives than XManDroid.
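The difference between a permission-only policy and a check that also considers the receiver's behavior can be sketched as follows. This is an illustrative sketch only: both functions, their parameters, and the permission string are assumptions made for the example and do not reproduce the actual logic of XManDroid or SIAT.

```python
def permission_policy_blocks(sender_perms, receiver_perms, data_perms):
    """Permission-only rule: block when the payload is guarded by a
    permission that the sender holds and the receiver lacks."""
    return any(p in sender_perms and p not in receiver_perms
               for p in data_perms)


def use_aware_flags(sender_perms, receiver_perms, data_perms,
                    receiver_sinks_payload):
    """Flag only when the permission gap exists AND the receiver was
    observed passing the payload to an output method."""
    gap = permission_policy_blocks(sender_perms, receiver_perms, data_perms)
    return gap and receiver_sinks_payload


# Hypothetical scenario: the receiver lacks the permission but never
# extracts the sensitive payload from the intent.
sender = {"READ_PHONE_STATE"}
receiver = set()
payload = {"READ_PHONE_STATE"}

print(permission_policy_blocks(sender, receiver, payload))  # True
print(use_aware_flags(sender, receiver, payload, False))    # False
```

In this scenario the permission-only rule raises an alarm although nothing leaks, whereas the use-aware check stays silent until the receiver is actually seen sending the payload out.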
Results of SIAT. SIAT cannot track a few output methods due to the limitation of the built-in TaintDroid. However, compared to the existing approaches, as depicted in Fig. 6 and Table 2, the proposed SIAT can achieve an accuracy improvement of about 25%∼200% with a precision of 1.0 and a recall of 0.98. There are two reasons why SIAT performs much better. Firstly, unlike the DIALDroid and the XManDroid, SIAT traces the data flow in the receiver at runtime by capturing and verifying the data in a sensitive method, which makes SIAT acquire more precise data flows. Secondly, DIALDroid, AmanDroid, and XManDroid do not detect intent spoofing, which is one of the major reasons their precision is lower than ours.
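For reference, precision and recall follow directly from the counts of true positives (✓), false positives (⊗), and false negatives (⊙). The short Python sketch below uses hypothetical counts that were chosen only because they reproduce ratios of 1.0 and 0.98; they are not the counts reported in Table 2.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported malicious ICC paths that are truly malicious."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly malicious ICC paths that were reported."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts, for illustration only (not taken from the paper).
tp, fp, fn = 49, 0, 1

print(precision(tp, fp))         # 1.0
print(round(recall(tp, fn), 2))  # 0.98
```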
Table 3 shows the details of the malicious ICC paths of DroidBench 3.0 and Droidbench-iccta, and only ten malicious ICC paths of Our Developments due to space limits. The ICC paths in Real-World are given in Section 6.2. The original three ICC paths in Droidbench-iccta are innocent since the receivers can get the device ID by themselves with the related permissions. To make the ICC paths illegal, we delete the permissions for device IDs in the three receivers.
The results of DIALDroid on DroidBench 3.0 are much better than those of AmanDroid; nevertheless, there are still two deficiencies. The first one is that DIALDroid cannot identify the malicious ICC path when the type of the component is
Furthermore, as mentioned before, DIALDroid cannot tell the receiver’s real requirements on the intent data format, leading to extra false positives (i.e., the aforementioned coincidental malicious ICCs). For instance, in the Real-World dataset, for app
Indeed, from the software engineering perspective, dynamic approaches only observe the limited part of an app that is covered by the runtime inputs considered. However, a set of three attributes, i.e., action, data, and category, can readily cover most intended ICC execution paths in our evaluation. Therefore, the static techniques DIALDroid and Amandroid have lower accuracy than the two dynamic approaches when facing complex ICC paths across multiple apps.

Two cases of bypassing in the receiver.
Bypassing is, in a way, similar to malware collusion, i.e., two components work cooperatively so that each component performs only part of the behavior to bypass the detection. The main difference is that, in these cases of bypassing, the two components come from the same application. Based on extensive experiments and in-depth analysis, we discover the following two undisclosed cases of malicious bypassing in the receiver, depicted in Fig. 7, which can invalidate the existing approaches by taking advantage of special intermediate methods/objects:
SharedPreferences. The first case is that, as shown in Fig. 7, in the receiver, if the Component A stores the sensitive data into the

The Component A puts the sensitive data into SharedPreferences
Application. Similarly, component A assigns the data extracted from the intent to the variable of the

The Component B gets the sensitive data from SharedPreferences and then outputs them

Our resolutions. We have successfully realized the above two bypassing cases in Our Developments with destinations named

Identification of original source of tainted data for bypassing
Based on the workflow for monitoring SharedPreferences in Fig. 8 and the identification algorithm of the original sources of tainted data for the bypassing in Algorithm 2, we showcase the solution of bypassing as follows:
Relying on system-wide real-time tracing of the tainted data, as shown in Fig. 8, the Monitor first taints the original sensitive data with a tag. It then retaints the data as T-data′ (T-data′ ← T-data) when storing it in an intent. Thanks to TaintDroid, after T-data′ is delivered to the receiver with the intent, our Monitor identifies T-data′ and determines its original source by comparing the taint tag with the predefined source code, but only when T-data′ is used as the parameter of a sensitive function that stores data and then sends them out of the device. Unfortunately, unlike the
SIAT can significantly increase the detection precision of ICC threats via systematic tracing, discovering the real intent usage pattern, and resolving the coincidental malicious ICCs and bypassing cases.
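The idea of keeping taint tags attached to data that passes through an intermediate store, as in the SharedPreferences bypass case, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the bit-vector tags are modeled in the style of TaintDroid, and the names `Tainted`, `TaintAwareStore`, `sink`, and the tag constants are hypothetical stand-ins rather than parts of SIAT or the Android API.

```python
from dataclasses import dataclass

# Hypothetical tag bits; real tag assignments are not given in this section.
TAINT_DEVICE_ID = 0x1
TAINT_LOCATION = 0x2


@dataclass(frozen=True)
class Tainted:
    value: str
    tag: int = 0

    def add_taint(self, tag: int) -> "Tainted":
        # Add-only interface: tags are merged with OR and never cleared.
        return Tainted(self.value, self.tag | tag)


class TaintAwareStore:
    """Stand-in for an intermediate store such as SharedPreferences."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, item: Tainted) -> None:
        self._data[key] = item  # the tag is stored together with the value

    def get(self, key: str) -> Tainted:
        return self._data[key]


def sink(item: Tainted, log: list) -> None:
    """Stand-in for an output method; logs any tagged data that reaches it."""
    if item.tag:
        log.append(("leak", item.tag))


# Component A stores the data extracted from the intent.
store, log = TaintAwareStore(), []
store.put("id", Tainted("device-id").add_taint(TAINT_DEVICE_ID))

# Component B later reads the data and outputs it.
sink(store.get("id"), log)
print(log)  # [('leak', 1)]
```

Because the tag survives the round trip through the store, the output in Component B can still be linked to the original source, even though neither component performs the complete source-to-sink flow alone.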
Analysis results on real-world apps
The string-extra here denotes the string
SIAT can monitor and record the apps’ behaviors at runtime with taint logs. For evaluation, we have to run the Real-World applications with automated testing scripts that trigger the applications’ behaviors in the system. We need to design and supply a corresponding script for each app to run the test with monkeyrunner [28], a popular Android tool for running test suites. However, handling many apps this way is time-consuming. To save testing time and effort on apps without malicious ICCs, we rely on the manual analysis engineering below to find the apps that hold suspicious ICC paths and then write the corresponding scripts. Finally, we run the scripts and the related apps simultaneously in our system to trigger the possible malicious behaviors for analysis.
Manual analysis engineering. We employ manual reverse engineering (dex2jar + jd-gui) to obtain the ICC path-related source code of each APK file. Firstly, we go through this code by combining the sensitive methods defined in SIAT with the static code analysis tool DIALDroid to investigate whether each identified ICC path is indeed malicious. In this way, we have analyzed thousands of suspicious apps and quickly eliminated most of them (about 75%). Afterward, we run these script-driven suspicious apps with monkeyrunner under SIAT. Finally, we have verified that there are ICCs in about 163 application pairs without suspicious behaviors. For all pairs of applications that have ICCs, there is no sensitive data in the ICCs of the 121 applications. Meanwhile, there is intent hijacking in the ICCs of sixteen application pairs, intent spoofing in the ICCs of six application pairs, and malware collusion in the ICCs of four application pairs. Table 4 illustrates several typical unrevealed threats and false positives identified by SIAT.
Specifically, to ensure validity, we double-checked whether the sensitive methods related to malicious activities had been launched to exploit the vulnerable application successfully by carefully injecting some checkpoints into the Android system to identify the debug outputs. Statistically, we observe that most benign apps (about 84%) tend to use many more ICC paths than malicious apps, and that malicious apps tend to construct more complex ICC patterns to circumvent detection. It is worth noting that, to avoid subjectivity, two authors carried out an inter-rater agreement protocol and obtained a Cohen’s kappa of about 0.9, indicating an almost perfect agreement. Thus we focus on the ICCs that the two authors agreed on.
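Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance. The Python sketch below shows the standard computation on hypothetical labels ("m" for malicious, "b" for benign); the labels are invented for illustration and are not the authors' ratings.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four ICC paths.
rater_1 = ["m", "m", "b", "b"]
rater_2 = ["m", "b", "b", "b"]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```

Here the raters agree on three of four items (observed agreement 0.75) while chance agreement is 0.5, giving a kappa of 0.5. Under the commonly used Landis and Koch scale, values above 0.8 are read as almost perfect agreement.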
Intent hijacking. The
Intent spoofing. After the
Intent collusion. The
False positive cases. Besides, there are two typical cases of false positives in existing approaches, which SIAT addresses in Table 4. The
SIAT can uncover several unrevealed instances of data breach threats in real-world apps and identify several typical cases of false positives in existing approaches.

Comparison of app runtime in various parts between the Monitor and Android on different datasets. Note that we do not select TaintDroid as the baseline because the overhead of the Monitor is almost the same as that of TaintDroid. DroidBench3.0 + IccTA is short for DroidBench3.0 + Droidbench-iccta.
We now evaluate the runtime performance of SIAT with the above datasets. The Monitor and the Analyzer are two independent components of the workflow of SIAT. Notably, the actual runtime of Monitor is related to the lifetime of the app. Hence we evaluate their runtime overhead below separately before summing them.
Since our Monitor is a modified version of the Android system at the framework layer, we measure the app runtime cost on our Monitor and on the native Android operating system, respectively. We randomly selected dozens of app pairs that can launch various malicious ICC paths from the datasets mentioned above. To obtain accurate runtime intervals, we inserted timestamp-recording code into a variety of crucial APIs in the Monitor and the apps. Note that the runtime cost of each part on Android in Fig. 9 represents the average app execution time recorded at the same APIs when we run the apps without the Monitor. We divide the Monitor module into four parts according to the five steps in Section 5.3 to calculate and compare the time cost separately.
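The measurement approach described above, recording timestamps around selected stages and averaging the intervals, can be sketched in Python as follows. The class `StageTimer` and its methods are hypothetical helpers written for this illustration; the actual instrumentation lives in the Android framework and is not shown in this section.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock intervals per named stage and averages them."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock               # injectable for repeatable tests
        self._samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self._samples[name].append(self._clock() - start)

    def average_ms(self, name):
        samples = self._samples.get(name)
        if not samples:
            raise ValueError(f"no samples recorded for stage {name!r}")
        return 1000.0 * sum(samples) / len(samples)


# Usage: wrap the code of one stage and read back its average cost.
timer = StageTimer()
for _ in range(3):
    with timer.stage("check intent"):
        sum(range(1000))  # placeholder workload
print(f"check intent: {timer.average_ms('check intent'):.3f} ms")
```

Running the same instrumented stages once with and once without the modified framework, and subtracting the averages, gives the per-stage overhead in the sense used for Fig. 9.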
Figure 9 shows the evaluation results for each part. In Fig. 9(a), the time overhead of Real-World is longer than the others because real-world apps maintain more complex and complete functionalities that lead to more ICC paths. The lightweight extension scheme meets the requirements of the SIAT architecture by inserting fewer than 800 lines of Java code in about forty essential files. Thanks to this scheme, in both the transfer intent and check taint processes, the Monitor functions incur almost no runtime overhead compared with the original Android. The set taint process leads to about 0.3 ms overhead due to the import of TaintDroid functions. The major runtime cost is incurred by the check intent process in the Monitor, which uses reflective calls to determine the component that sends the intent. Nevertheless, the overhead is less than 1 ms and is negligible for end users.

The time cost of Analyzer under various app-pairs.
Figure 10(a) depicts the total time cost of analyzing multiple app pairs with our Analyzer as the number of app pairs increases. The total time cost increases almost linearly with the number of app pairs. Thanks to the pattern deflation technique, the average time cost is less than 100 ms per app pair. The time cost of the app pairs in Real-World rises sharply when the number of app pairs reaches five, owing to some large apps that generate more complex patterns. Most of the app pairs from DroidBench3.0 + IccTA and Our Developments only have simple structures and functions used for experimental purposes.
To investigate the influence of app size in depth, we analyze a variety of app size-dependent factors that affect the time cost, such as the number of patterns, the size of the log files, and the amount of code. We find that the number of patterns is the most influential factor. In this regard, we carry out an experiment in which it takes about 13.7 s to analyze a log file containing about 200 patterns (including threat and normal patterns), indicating an average time cost of 68.5 ms per pattern.
Figure 10(b) further illustrates the average time cost of analyzing the logs of 140 different app pairs with the Analyzer over twenty runs. These 140 app pairs are randomly chosen from the datasets. The x-axis is the serial number identifying each app pair. Consistent with Fig. 10(a), the time cost of DroidBench3.0 + IccTA and Our Developments concentrates in the lower region of less than 60 ms. In contrast, the time cost of analyzing the logs of the app pairs from the Real-World dataset is much longer than the others due to the large number of patterns produced by their sophisticated functions.
Thanks to the sound design strategy of SIAT, the main runtime cost of the Monitor is incurred by the check intent process (⩽1 ms), which exploits the reflective calls to figure out the component that sends the intent. The overall time overhead for processing a single app-pair is less than 200 ms and thus is negligible.
Usability limitations. SIAT shares the usability obstacles of some existing approaches. From the software engineering point of view, users may not accept installing a customized Android with built-in SIAT due to security and usability concerns. Therefore, we would deploy SIAT as a background runtime safety guard for ICCs that generates ICC security reports.
Data flow limitations. Implicit control flow is a transfer of control between procedures using some mechanism, which adds flexibility to system design [4]. Unlike traditional tracing of implicit data flows [34], our approach does not deal with covert channels, or implicit data flows. Instead, we only deal with the explicit data flows and the intent-related implicit control flows of the sender and receiver sides in Inter-Component Communications scenarios where privacy-sensitive data leakage may exist.
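The distinction between an explicit data flow and an implicit data flow, which explicit taint propagation misses, can be illustrated with a small Python sketch. The wrapper type `TInt` and both functions are hypothetical and serve only to show the general limitation of tag propagation along data dependencies; they do not model SIAT's handling of intent-related implicit control flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TInt:
    """Integer carrying a taint tag that propagates through arithmetic."""
    value: int
    tag: int = 0

    def __add__(self, other: "TInt") -> "TInt":
        # Explicit propagation: the result inherits the tags of both operands.
        return TInt(self.value + other.value, self.tag | other.tag)


def explicit_copy(secret: TInt) -> TInt:
    # Data dependency: the tag follows the value.
    return secret + TInt(0)


def implicit_copy(secret: TInt) -> TInt:
    # The result depends on the secret only through the branch condition,
    # so explicit propagation leaves it untagged.
    out = TInt(0)
    if secret.value == 1:
        out = TInt(1)
    return out


secret = TInt(1, tag=0x4)  # hypothetical tag bit
print(explicit_copy(secret))  # TInt(value=1, tag=4)
print(implicit_copy(secret))  # TInt(value=1, tag=0)
```

Both functions return the secret value, but only the explicit copy keeps its tag. Flows of the second kind are the implicit data flows that this approach does not cover.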
Subjectivity limitations. The manual analysis for real-world apps is subject to human subjectivity, which has never been well-studied. In our evaluations, we pay more attention to the complex ICC path for disclosing sophisticated malicious purposes. Nevertheless, a Cohen’s kappa of 0.9 indicates an almost perfect agreement between the two authors. Besides, the employed DIALDroid is subject to false positives or false negatives in the cases of Section 6.1.1 and 6.1.3, respectively, and our sensitive methods are identified based on ‘SuSi’ and taint tags. Thus we cannot cover all cases in our manual analysis.
Conclusion
In this paper, we present the design and implementation of SIAT, which provides real-time systematic tracking of privacy-sensitive data and visibility into how collaborative malicious behaviors take place via intents to leak data, based on its two crucial modules: the Monitor and the Analyzer. With well-defined built-in intent service primitives for the seminal TaintDroid, which enable TaintDroid and the extended framework layer of Android to cooperate at multiple levels of intent propagation, SIAT works as a runtime safety guard for ICCs. It handles not only the explicit data flows but also the intent-related implicit control flows at both the sender and receiver sides from a systematic perspective, discovering the intent usage pattern and resolving the coincidental malicious ICCs and bypassing cases.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61872130, 62122023, U20A20202, 62002167, and 61874042; the Science and Technology Project of the Department of Communications of Hunan Provincial under Grant No.201928; the Hunan Natural Science Foundation for Distinguished Young Scholars under Grant No. 2020JJ2010, the Hunan Science and Technology Innovation Leading Talents Project under Grant No. 2021RC4019, the Key R & D Projects of Changsha under Grant No.kq1907103, the Youth Program of National Natural Science Foundation of China under Grant No.61902121.
