Abstract
Advertisements are the fuel that runs many online services such as websites or mobile apps, but adversaries have also started to abuse ads for financial gain. Nowadays, online advertising companies track users all over the web in order to create successful online ad campaigns specifically tailored to a target audience. A popular phenomenon on the Internet, so-called adware, abuses online advertisements by maliciously injecting or replacing ads on websites. As many consider ads to be quite privacy intrusive, much work has gone into studying the effects of online advertisements on users’ privacy. However, little work has been done so far to analyze the privacy implications of adware.
In this work, we shed light at scale on the capabilities of adware and potentially unwanted programs (PUPs), mainly concerning tracking and the exfiltration of personal data. To this end, we capture the communication of adware/PUPs in the Firefox browser on the application level to circumvent lower-level encryption (e.g., TLS). Using this framework for capturing network traffic, we dynamically analyze the communication of over 16,000 adware or potentially unwanted program samples. We find that around 37% of the requests issued by the analyzed samples contain some kind of personal information. Furthermore, we identify the services used by adversaries and provide insights into the tracking techniques they use.
Introduction
Web browsers have become one of the most important types of applications, and they are especially relevant for providing access to modern web applications for specific tasks such as e-mail, spreadsheets, or image editing. Furthermore, browsers enable us to interact with each other and share ideas, and they give access to a broad variety of multimedia content such as music or video streaming services. Due to this broad usage, browsers mediate a massive amount of personal data, and this amount increases over time. Consequently, adversaries have started to target the browser ecosystem with novel attack vectors (e.g., banking fraud or man-in-the-browser attacks). Malicious software that tampers with the browser session, such as potentially unwanted programs (PUPs), adware, and malicious browser extensions, poses an especially important threat to users today. These new threats include injecting or replacing ads on websites, which is an easy way for adversaries to make a financial profit [30].
First- and third-party user tracking is a commonly known part of the business model of most websites and other online applications (e.g., mobile apps) [1,4,5,11,22,24,28]. Personal data (especially clickstream data) is collected and analyzed in order to create behavioral user profiles that can be used for targeted advertising [7]. As malware has started to use ads for personal gain, questions about the privacy implications of this kind of malicious software arise.
While similarities between these topics exist, there are technical and motivational differences in why users are being tracked. On a technical level, users neither consent to nor know about the installation, and therefore the tracking, of adware and PUPs, in contrast to websites, where tracking is commonly known. Whereas adware and PUPs have rich access to the user’s device and can therefore access data inside and outside the browser, websites and extensions are limited to the browser. Once an adversary has infected a system, she can easily track every step of a user, e.g., by injecting a tracking object into every visited website. Thus, the adversary can silently create comprehensive user profiles that might contain highly sensitive personal data, which might be of high value to some ad companies.
On a motivational level, by contrast, websites want to enhance the users’ experience while browsing their site (e.g., by suggesting products the users might like). To do so, they monitor the users’ behavior on the website and try to infer the users’ interests. Browser extensions might collect personal data in order to provide their service to the user (e.g., an extension might collect the users’ passwords to store them in a protected vault which can be used on different devices). However, the motivation of malware to exfiltrate personal data is purely malicious. For adversaries, classical Internet-related fraud (e.g., credit-card fraud) is increasingly difficult due to new security measures. Thus, adversaries look for new ways to make a profit (e.g., ransomware or ad injection). Behavioral profiles of users, or the data needed to build such profiles (e.g., clickstreams), can be sold to third parties [21]. Selling such profiles might be another way for adversaries to make a profit.
In this work, we examine this phenomenon on a larger scale and report on privacy leakage and user tracking by PUPs and adware. To the best of our knowledge, we are the first to study this topic on a larger scale for both adware and PUPs, and we focus on the privacy implications of malware in general. More specifically, we address unnoticed privacy implications of adware and PUPs in this paper. Our results show that adware and PUPs mainly exfiltrate users’ clickstream data, which provides great insight into the personal, digital lives of their victims. More than a quarter of the analyzed malware samples (27% of the analyzed adware and 30% of the analyzed PUPs) leak the full URL visited by the users. Additionally, we show that the leakage of personal data is a significant part of the malicious behavior of the analyzed malware. We also identify popular data sinks used by the analyzed adware and PUP samples, which are often located in Asia. Previous work has shown the high prevalence of adware and PUPs [19], which indicates that this leakage of personal data is a considerable threat to all Internet users.
In summary, we make the following four contributions in this paper:
We present a framework that is capable of capturing the network traffic emitted by a given browser on the application level (Section 4), which allows us to analyze traffic of any software that tampers with the users’ browser session.
We provide detailed insights into the impact of PUPs and adware on users’ privacy. In our measurements, over 45% of all analyzed adware and PUP samples leaked personal data or tracked users (Section 4.3). To the best of our knowledge, we are the first to report on data leakage and profiling by adware and PUPs on a large scale.
We identified (1) the services used to track users, (2) the websites most commonly tracked, and (3) the data predominantly exfiltrated by adware or PUPs (see Section 5.1).
Finally, we present an analysis of objects not used to track the user or to leak personal data (e.g., images and style sheets) (Section 5.2).
This paper is an extended version of the paper: “Towards Understanding Privacy Implications of Adware and Potentially Unwanted Programs” presented at the European Symposium on Research in Computer Security (ESORICS) 2018 [32].
Background
In this section, we explain the terms adware, potentially unwanted programs, and browser extensions. Further, we give a brief overview of the adware ecosystem and describe several tracking mechanisms.
Adware, potentially unwanted programs, and browser extensions
In this work, we analyze two different types of software, namely adware and potentially unwanted programs (PUPs). We further analyze browser extensions to assess our results and to make them more comparable to other related work (e.g., [18,28,30,34]). In the following, we explain these types of software and discuss how we understand them in the scope of this work:
Potentially unwanted programs (
In this work, we examine the negative privacy implications of adware and PUPs and compare these findings to extensions downloaded from the Firefox Add-On repository [12]. In the past, adware or PUPs could come in the form of an extension, but due to policy changes by Firefox, one can only install extensions present in its repository. This is probably why none of the analyzed samples successfully installed an extension. We focus on the negative privacy impact of adware and PUPs but also give hints regarding the “ad injection” and “search query redirection” capabilities of the analyzed samples (see Section 5).
As defined above, adware and PUPs have similar capabilities; therefore, it is reasonable to analyze both and compare them to each other. In order to make our results more comparable to previous work, we additionally analyzed browser extensions, whose (malicious) behavior is well explored. Of course, adware has more access to the operating system than browser extensions and could, therefore, come with many more malicious capabilities. Therefore, we also analyze the outbound network traffic that does not emerge from the browser (“second channel”) to examine privacy breaches on that channel.
Adware ecosystem
The focus of this work lies in the analysis of privacy implications of adware and PUPs. The adware ecosystem is presented in Fig. 1: (1) The user’s system is infected with software (i.e., adware, PUPs, or extensions) that tampers with the browser session. (2) The extensions, PUPs, and adware inject their (malicious) objects (e.g., JavaScript code or images) into the visited website. These objects might be used to load content from a third party (e.g., ads) or might exfiltrate private information about the user. Many parts of the ecosystem are already well explored (dotted lines). In this work, we analyze the privacy implications of adware and PUPs for users (dashed lines). To the best of our knowledge, there has been no research analyzing this part systematically on a large scale.
The main monetization technique of adware (as the name suggests) is injecting ads into websites and getting paid based on the payment model of the ad network (e.g., pay-per-view) (3). Nevertheless, authors of adware, PUPs, or malicious extensions might also sell private data they exfiltrate from their victims [6] (4).

Overview of the adware ecosystem. The adversary infects the victim’s device with malicious software, which inserts ads into a visited website. After the ads are displayed, or after the user clicks on an ad, the adversary typically gets paid by an ad network.
Tracking mechanisms can be subdivided into stateful and stateless tracking methods. Stateful tracking identifies users through a unique identifier chosen by the tracker. In contrast, stateless tracking tries to identify users through properties of the users’ device or browser (e.g., installed fonts or drivers).
Two exemplary stateful tracking techniques are explained in the following:
A web beacon (sometimes called tracking pixel or web bug) is often not larger than
Third-party cookies are a popular way to track users across different servers. In contrast to first-party cookies, which are set by the currently visited website, third-party cookies are set by a different domain, e.g., by content that the visited website loads from a third party. However, third-party cookies are set for the same reason as standard first-party cookies: so that the user can be identified later on.
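Whether a cookie is first or third party depends on comparing the site that sets it with the site the user is visiting. The following is a minimal sketch of that comparison, assuming a naive rule that compares only the last two hostname labels; this misclassifies hosts under multi-label suffixes such as co.uk, where a real classifier would consult the Public Suffix List. The helper names are hypothetical.

```python
from urllib.parse import urlsplit


def registrable_domain(host: str) -> str:
    """Naively approximate the registrable domain by its last two labels.

    Assumption: multi-label public suffixes such as 'co.uk' are ignored.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_url: str, request_url: str) -> bool:
    """Return True if a request (and any cookie it sets) belongs to a
    different site than the page the user is currently visiting."""
    page_host = urlsplit(page_url).hostname or ""
    request_host = urlsplit(request_url).hostname or ""
    return registrable_domain(page_host) != registrable_domain(request_host)


# A tracker embedded in a news page is third party; the site's own
# static-content subdomain is first party.
print(is_third_party("https://news.example.com/a", "https://tracker.example.net/p.gif"))  # True
print(is_third_party("https://www.example.com/", "https://static.example.com/app.js"))    # False
```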
Two examples of stateless tracking are browser and canvas fingerprints:
Browser fingerprinting enables website providers to recognize and identify a user’s system by unique properties of each browser. Eckersley demonstrates that a combination of browser and device features can almost uniquely identify most users on the web [10]. Web-based browser fingerprinting is, therefore, a conventional technique that has been investigated by several other researchers [10,15,20,23]. This technique can further be abused to customize the products displayed; e.g., Hupperich et al. recently showed that the user’s location plays a role in the price offered for hotel bookings [16]. Canvas fingerprinting abuses the HTML canvas element, which was introduced in HTML5 to draw graphics onto websites. Mowery and Shacham demonstrate that it can be used for user tracking [22].
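In practice, fingerprinting scripts run as JavaScript inside the page; the minimal Python sketch below only illustrates the final step of combining collected attributes into an identifier. It assumes the attributes are serialized as canonical JSON (sorted keys) and hashed with SHA-256; the attribute names and values are hypothetical.

```python
import hashlib
import json


def browser_fingerprint(attributes: dict) -> str:
    """Derive a stable identifier from device/browser properties.

    Keys are sorted before hashing, so the same properties always yield
    the same fingerprint regardless of the order they were collected in.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


visit_1 = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
    "screen": "1920x1080x24",
    "timezone_offset": -60,
    "fonts": ["Arial", "Calibri", "Consolas"],
}
# Same properties collected in a different order: the fingerprint is unchanged,
# so the returning browser is recognized without storing any state on it.
visit_2 = dict(reversed(list(visit_1.items())))
print(browser_fingerprint(visit_1) == browser_fingerprint(visit_2))  # True
```

Because nothing is stored on the client, deleting cookies does not reset such an identifier; only a change in the underlying properties does.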
Related work
In this section, we discuss work closely related to ours and explain how our approach relates to previous work on this topic.
Adware & malicious add-ons
Jagpal et al. [17] present
Thomas et al. [30] combine Hulk and WebEval to measure the effect of malicious extensions on the websites google.com, amazon.com, and walmart.com. They report that 5% of the daily unique IP addresses visiting google.com are infected with malware that injects ads into websites.
Neither
Analysis about fingerprinting on the web
In a large-scale study, Acar et al. examine three advanced web tracking mechanisms (canvas fingerprinting, evercookies, and cookie syncing) [1]. According to their study, 5% of the top 100k websites use canvas fingerprints to identify users.
In 2010, Soltani et al. conducted a study on the use of Flash cookies [27]. Half of the websites in their set (Alexa top 100 sites [3]) use this kind of cookie, mostly without disclosing this in their privacy policies. Note that since May 2011, all EU countries have adopted a directive which states, among other things, that websites have to display a “warning” to users if they use cookies [25].
Englehardt and Narayanan [11] present the most recent study on online tracking. They introduce the open-source measurement tool
The work introduced above measures the tracking capabilities and other privacy implications of modern websites. In contrast, in this work we analyze the exfiltration of private data and user tracking by malware, i.e., adware and PUPs.
Prevalence of potentially unwanted programs
The prevalence and distribution of PUPs are examined by Kotzias et al. [19]. By analyzing AV telemetry, they show that around 54% of 3.9 million analyzed hosts have PUPs installed. Furthermore, they find that the top PUP publisher ranks 15th among all software publishers (benign or not). They analyze the PUP-malware relationship and conclude that PUP and malware distribution are independent of one another.
The pay-per-install (PPI) ecosystem is analyzed by Thomas et al. [31]. The authors show that PPI services sell access to users’ systems for prices ranging from $0.10 to $1.50 per installation. Furthermore, they show that PPI services play a considerable part in distributing PUPs. Based on Google Safe Browsing telemetry, they show that PUPs are downloaded three times more often than classical malware. Both works show the massive prevalence of PUPs but do not investigate the influence this type of software has on the users’ privacy.
Privacy implications of browser extensions
The privacy diffusion enabled by browser extensions is examined by Starov and Nikiforakis [28]. They dynamically analyze the privacy leakage of extensions available for the Google Chrome browser and find that a non-negligible share (6.3%) of the top 10,000 extensions leak privacy-sensitive data. To counter the leakage, they design
The most recent work in this field of research is written by Weissbacher et al. [34]. The authors present a prototype implementation called
The work of Starov and Nikiforakis is to some extent comparable to ours but, due to the nature of their analysis framework, does not cover the tracking capabilities of extensions and does not look for exfiltrated metadata (e.g., user agents or passwords). Moreover, [28] analyzes software that might need some personal information to successfully provide its service (e.g., to identify malicious URLs). In contrast, we focus on malware that exfiltrates data in a purely malicious manner, which implies a clear distinction between these two types of software. On a technical level, we extend the findings of [28] by (1) identifying all exfiltrated data, (2) showing that there is a significant difference in the type and amount of exfiltrated data, (3) identifying the websites to which visits are primarily tracked, (4) analyzing the tracking behavior of malware, (5) determining the tracking services used by different malware families, and (6) identifying the tracking techniques used.
Approach
In this section, we introduce our framework, describe its working principles, present our data set, and give an overview of the investigated samples. Note that, in contrast to most related work, our system can inspect even HTTPS traffic due to the application-level monitoring, can find private data in encoded and compressed content, and allows a stateful analysis.
Framework
We developed a framework (see Fig. 2) that allows us to (1) perform a stateful analysis of each sample, (2) capture, decrypt if needed, decode, and analyze HTTP(S) communication on the application level, and (3) collect and analyze all network traffic not emerging from the browser.
The general workflow of a single analysis run is as follows. The analysis slave pulls an adware sample, PUP sample, or extension from the server and installs it (1). Afterward, the slave visits a predefined set of websites (2a) and logs the resulting communication. To do so, we developed a browser extension that captures all network traffic on the application level. Since we save the traffic on the application level, we can inspect all requests and responses before they are encrypted or after they are decrypted by the TLS layer. After visiting a website, we wait for 30 seconds so that the site can finish loading and the analyzed software has time to inject content into it. Additionally, we record all network-level traffic that originates outside the browser (2b). We cannot decrypt this traffic, so for it our analysis is limited to unencrypted communication. At the end of the analysis run, the plain HTTP(S) traffic and the remaining communication are sent to the server for review. Before the analysis, we inflate (e.g., gzip) and decode (e.g., Base64) all data where necessary and possible (see also Section 4.3.1).
In this work, we perform a stateful analysis, which means that the browser used has properties that a mock browser or a browser in its default state would typically not show (e.g., a browsing history or cookies). Analyzing the tracking capabilities of the software requires a stateful analysis, because resetting the state of the browser during the investigation of a sample might disable some mechanisms that are used for tracking (e.g., cookies). The clean installation state of our slaves, which is restored after each restart, has a browsing history, several cookies set, passwords in the browser’s password vault, and other properties that are usually set when using a browser. Note that most prior work performs a stateless analysis of ad injectors or browser extensions [2,18,30]. Only
To conduct a representative analysis, we need to learn the regular communication of a website to distinguish between requests regularly issued by the site and requests issued by an object injected by the adware, PUP, or extension. We collect the non-malicious regular communication of a website for our analysis by visiting all sites with an analysis slave – but without installed sample or browser extension.
Since websites tend to load dynamic content from various and often changing sources, each slave collects new reference values after analyzing two samples. All collected reference values are combined into one reference set.
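The reference set and the comparison against it can be sketched as follows. This is a minimal illustration under assumptions: requests are compared by hostname, each clean run is a list of request URLs, and `build_reference_set` and `unexplained_requests` are hypothetical names, not part of the described framework.

```python
from typing import Iterable, List, Set
from urllib.parse import urlsplit


def build_reference_set(reference_runs: Iterable[Iterable[str]]) -> Set[str]:
    """Union of all hostnames contacted during clean visits (no sample installed)."""
    hosts: Set[str] = set()
    for run in reference_runs:
        for url in run:
            host = urlsplit(url).hostname
            if host:
                hosts.add(host)
    return hosts


def unexplained_requests(sample_run: Iterable[str], reference: Set[str]) -> List[str]:
    """Requests to hosts never seen in any clean run: candidates for injected content."""
    return [url for url in sample_run if urlsplit(url).hostname not in reference]


# Two clean runs of the same site contact partly different hosts; their union
# is the reference, so only the tracker request remains unexplained.
reference = build_reference_set([
    ["https://www.google.com/", "https://ssl.gstatic.com/a.js"],
    ["https://www.google.com/", "https://apis.google.com/b.js"],
])
print(unexplained_requests(
    ["https://www.google.com/", "https://apis.google.com/b.js", "https://tracker.example/p.gif"],
    reference,
))  # ['https://tracker.example/p.gif']
```

Taking the union over several clean runs reduces false alarms from content that a site loads only occasionally, which is why refreshing the reference values during the measurement matters.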

Overview of our developed framework for the dynamic traffic analysis of adware, PUPs and browser extensions.
We used the global Alexa Top 100 [3] (as of 01/15/2017) as the basis for our set of websites which are visited by the analysis slaves. We restricted our analysis to unique hostnames from this list (e.g., we only analyze google.com even if google.co.uk is on the list as well) because we assume that the communication would be similar.
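Deduplicating hostnames such as google.com and google.co.uk requires knowing which trailing labels form a public suffix. A minimal sketch, assuming a tiny hard-coded suffix table in place of the full Public Suffix List, keeps the highest-ranked host per site label; the function names are hypothetical.

```python
from typing import List, Set

# Assumption: a tiny hand-written suffix table; real code would consult the
# full Public Suffix List instead.
PUBLIC_SUFFIXES = {"com", "de", "jp", "in", "co.uk", "co.jp", "co.in", "com.br"}


def site_label(host: str) -> str:
    """Return the label directly left of the longest known public suffix,
    e.g. 'google' for both google.com and google.co.uk."""
    labels = host.lower().split(".")
    for i in range(1, len(labels)):  # smallest i tests the longest candidate suffix
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return labels[i - 1]
    return host.lower()


def unique_hostnames(ranked_hosts: List[str]) -> List[str]:
    """Keep only the highest-ranked host per site label, preserving rank order."""
    seen: Set[str] = set()
    unique = []
    for host in ranked_hosts:
        label = site_label(host)
        if label not in seen:
            seen.add(label)
            unique.append(host)
    return unique


ranked = ["google.com", "youtube.com", "google.co.in", "google.co.jp",
          "amazon.com", "google.de", "amazon.co.jp"]
print(unique_hostnames(ranked))  # ['google.com', 'youtube.com', 'amazon.com']
```

Matching on the bare label is a heuristic: unrelated sites that happen to share a label under different suffixes would be merged, so the resulting list still benefits from a manual check.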
After filtering, the set consists of 57 domains. We added five popular e-commerce domains (e.g., bestbuy.com) because we expected adware and PUPs to be more active on e-commerce websites, which turned out to be true for PUPs but not necessarily for adware (see Section 5). For each of these domains, we chose two subsites, either randomly by visiting the domain and selecting two links or, if possible, by selecting the most popular subsites of the site (e.g., products).
A more detailed overview of the set can be found in Appendix A. In total, the analysis of each sample takes around 70 minutes (including booting, infection, visiting the 128 websites, waiting 30 secs., etc.). Previous work either visited a broad set of websites once to conduct their analysis (e.g., [11]), used some mock pages to analyze the injected content (e.g., [18]), or did not disclose how many sites they visit (e.g., [17]).
For our analysis, we used 8,536 distinct adware samples (referred to as

Distribution, on a logarithmic scale, of the analyzed malware sample families. One adware family (Dealply) is dominant in our set while the rest is more or less balanced – which allows us to generalize our results.
We used samples that were submitted to VirusTotal [33] between 01/01/2017 and 12/20/2017. VirusTotal shut down its API in August and has since provided a data set for researchers on Google Drive that is updated monthly. The used samples are identified as either a potentially unwanted program (PUP) or adware by the anti-malware engines used by VirusTotal. We used samples with these labels because we expect that those samples will primarily exfiltrate private data and inject content into websites. To better assess our findings regarding adware and PUPs and to make our work more comparable with previous work, we analyzed the top 5,500 Firefox extensions (
In the following, we focus on analyzing the communication of adware and PUPs. More specifically, we analyze the used tracking services, exfiltrated information, and tracked websites. Additionally, we compare these findings to privacy leakage of the browser extensions we analyzed and with results of previous work. A website can implement a Content Security Policy (CSP) as a defense mechanism to mitigate certain types of attacks like cross-site scripting or data injection attacks. During our analysis, we found that only 17 subsites use CSPs.
Exfiltrated personal information
In this work, we consider information to be private if it (1) can be used to identify the client (e.g., IP addresses), (2) can be used to create a user profile (e.g., visited URLs), or (3) contains sensitive data stored on the computer (e.g., passwords). We consider a website to be a tracker (or tracking service) if it gathers data that can be used to identify users or create profiles about them.
We identified the exfiltrated data by analyzing the transferred cookies and the data sent via the HTTP body. Individual headers can be used to gather personal information about the user (e.g., the user agent or the user’s preferred language), but these headers are commonly set by default. Hence, we cannot measure whether the analyzed sample utilizes these fields. Before analyzing the fields, we decompress (e.g., gzip/deflate) and decode (e.g., Base64) them if possible. We repeat this process in case fields are encoded or compressed multiple times, as observed by Starov and Nikiforakis [28] (e.g., base64_enc(base64_enc(url_enc(<data>)))).
After inflating and decoding, we perform keyword matching to determine whether a request is used to leak private information. We identified the keywords by manually inspecting several requests issued by the different analyzed samples. We used 13 keyword categories, which cover, on the one hand, information commonly used to identify or track users (e.g., screen resolution or installed fonts) and, on the other hand, information that is specific to our analysis setup (e.g., IP addresses or passwords). Some categories are identified by multiple keywords, others by just one (e.g., the password is the same on all machines at all times, while the user agent varies from sample to sample). We found 15,462 keywords in the analyzed requests. A manual inspection of a sample of these requests revealed a small number (fewer than ten requests) of false positives (e.g., a keyword in a seemingly random string – AR5
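The repeated decoding described above can be sketched as a loop that removes one layer at a time until nothing changes. This is a minimal illustration, not the framework’s code, and it rests on assumptions: Base64 output is accepted only if it yields printable UTF-8 text or compressed data (so that random-looking strings are not decoded into noise), compressed data must carry a gzip or zlib header (raw DEFLATE streams are not handled), and the number of layers is capped. All function names are hypothetical.

```python
import base64
import binascii
import zlib
from typing import Optional
from urllib.parse import unquote_to_bytes

GZIP_MAGIC = b"\x1f\x8b"


def _looks_compressed(data: bytes) -> bool:
    """Detect the gzip magic number or a valid zlib header (RFC 1950 check bits)."""
    if data[:2] == GZIP_MAGIC:
        return True
    return len(data) >= 2 and data[0] & 0x0F == 8 and ((data[0] << 8) | data[1]) % 31 == 0


def _looks_like_text(data: bytes) -> bool:
    try:
        return data.decode("utf-8").isprintable()
    except UnicodeDecodeError:
        return False


def _decode_once(data: bytes) -> Optional[bytes]:
    """Undo one layer of compression, percent-encoding, or Base64, if present."""
    if _looks_compressed(data):
        try:
            # MAX_WBITS | 32 lets zlib auto-detect a gzip or zlib header.
            return zlib.decompress(data, zlib.MAX_WBITS | 32)
        except zlib.error:
            pass
    unquoted = unquote_to_bytes(data)
    if unquoted != data:
        return unquoted
    try:
        decoded = base64.b64decode(data, validate=True)
    except binascii.Error:
        return None
    # Accept Base64 only if it yields text or compressed data, not noise.
    if decoded and (_looks_like_text(decoded) or _looks_compressed(decoded)):
        return decoded
    return None


def decode_fully(data: bytes, max_layers: int = 8) -> bytes:
    """Peel off encoding layers until none is left (bounded to avoid endless loops)."""
    for _ in range(max_layers):
        decoded = _decode_once(data)
        if decoded is None:
            break
        data = decoded
    return data


# base64_enc(base64_enc(url_enc(<data>))), as in the nesting described above.
layered = base64.b64encode(base64.b64encode(b"http%3A//example.com/%3Fq%3Da%20b"))
print(decode_fully(layered))  # b'http://example.com/?q=a b'
```

Trying decompression first and Base64 last is a design choice: compressed data and percent-escapes have recognizable markers, whereas many short strings happen to be valid Base64.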
Most commonly leaked personal information
To summarize, we consider a request to have negative privacy implication if and only if (1) it is part of
In this section, we provide an overview of the results of our analysis. Throughout this section, if not stated otherwise, we only consider requests used to track users or leak personal data to third parties.
In total, we analyzed 16,645 malicious software samples (8,536 adware samples and 8,109 PUPs) and 5,500 Firefox extensions. We analyzed about 850 GB (compressed JSON data) of generated adware/PUP traffic. 45% of the adware samples, 40% of the PUP samples, and 45% of the Firefox extensions inject content into a website that issued requests to domains not present in
We found that the adware and PUP samples issued 21,429 requests to domains not present in our reference dataset, an increase of 10%. In total, 61 adware samples changed the home page of the browser, and 221 changed the browser’s default search engine or redirected search queries. In contrast, only 6 PUPs changed the home page, but still, 180 replaced the default search engine. Due to Firefox policies, Firefox extensions cannot change these attributes.
Privacy aspects
In this subsection, we present the results of the analysis of the HTTP(S) traffic emerging from the browser. Recall that our framework (1) analyzes all traffic in plain text, whether or not HTTPS was used, and (2) tries to decompress and decode all data (e.g., HTTP GET parameters) before the analysis.
Websites that were actively tracked by the analyzed samples (Alexa ranks as of 11/30/2017)
Table 2 displays the top websites to which visits were actively tracked by the analyzed samples. We consider a website to be tracked if the analyzed sample injects content that can be used for tracking (e.g., a web beacon), or if an observed outgoing request contains any personal information. In our set of websites, each site is tracked by at least 1.5% of the adware and PUP samples. These samples circumvent the CSPs used by websites.
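The criterion for a tracked website, and the per-site share of samples that track it, can be expressed as a small aggregation. In the minimal sketch below, the `Observation` record is a hypothetical representation of one sample’s behavior on one website (an assumed format, not the framework’s actual output), and the numbers in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Observation:
    """Hypothetical record of one sample's behavior while one website was open."""
    sample_id: str
    website: str
    injected_tracking_object: bool  # e.g., a web beacon was injected into the page
    leaked_personal_data: bool      # an outgoing request carried personal data


def tracking_share(observations: Iterable[Observation], total_samples: int) -> Dict[str, float]:
    """Percentage of all samples that tracked visits to each website."""
    trackers: Dict[str, Set[str]] = {}
    for obs in observations:
        # A site counts as tracked if either criterion holds.
        if obs.injected_tracking_object or obs.leaked_personal_data:
            trackers.setdefault(obs.website, set()).add(obs.sample_id)  # each sample once
    return {site: 100 * len(ids) / total_samples for site, ids in trackers.items()}


observations = [
    Observation("s1", "youtube.com", True, False),
    Observation("s2", "youtube.com", False, True),
    Observation("s3", "youtube.com", True, True),
    Observation("s4", "youtube.com", False, False),  # neither criterion: not tracking
    Observation("s1", "amazon.com", False, True),
    Observation("s1", "youtube.com", False, True),   # repeated sample is counted once
]
print(tracking_share(observations, total_samples=200))
# {'youtube.com': 1.5, 'amazon.com': 0.5}
```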
It is notable that extensions and adware focus on popular websites (e.g., YouTube or Instagram) from different categories, while PUPs predominantly focus on shopping sites. This indicates that PUPs try to understand what a user plans to buy, while adware gathers information that gives a broader overview of the user’s habits, since it tracks general websites as well as shopping sites. Such information enables ads targeted at individual persons, which makes it valuable to ad companies. Overall, far fewer extensions exfiltrate personal information (31.64%) than adware and PUPs (46.41%).
Our results show that user tracking is a significant part of the malicious behavior of adware and PUPs. Almost 40% of the requests issued by the adware samples and 35% of the requests issued by PUPs contain personal information or may be used to track users (e.g., they include the visited URL: shady.com/?url=google.com%2Fiphone%2B6). In contrast, only 28% of the requests issued by extensions are used for these purposes.
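The example request above leaks the visited URL through a GET parameter. A minimal sketch of how such a leak could be detected, and how a leak of the full URL could be told apart from a leak of only the domain, is shown below. It assumes the request has already been decoded, inspects only query parameters, and uses the hypothetical function name `url_leak_level`.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def url_leak_level(request_url: str, visited_url: str) -> Optional[str]:
    """Return 'url' if a query parameter carries the visited host and path,
    'domain' if it carries only the host, and None if neither is present."""
    visited = urlsplit(visited_url)
    host = visited.hostname or ""
    host_and_path = host + visited.path
    level = None
    # parse_qsl percent-decodes values, so %2F and %2B become '/' and '+'.
    for _, value in parse_qsl(urlsplit(request_url).query):
        if visited.path not in ("", "/") and host_and_path in value:
            return "url"
        if host and host in value:
            level = "domain"
    return level


visited = "https://google.com/iphone+6"
print(url_leak_level("https://shady.com/?url=google.com%2Fiphone%2B6", visited))  # url
print(url_leak_level("https://shady.com/?d=google.com", visited))                 # domain
print(url_leak_level("https://cdn.example/lib.js?v=2", visited))                  # None
```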
Leaked personal information
To measure the privacy impact, we first identify the personal information transferred as a result of the tested samples. We analyze the transferred cookies and the data sent in HTTP request bodies. Furthermore, we inspect whether a response contains JavaScript that is used for stateless tracking or whether it includes a web beacon.
As described in Section 4.3.1, after decompressing and decoding, we perform keyword matching to determine whether a request leaks personal information usable for tracking. Table 1 shows the results of that matching. Table 5 displays the third parties receiving the personal information. Note that if a request contains multiple keywords, we count it once for each keyword.
In general, compared to PUPs, extensions and adware focus on meta information (e.g., language, time, IP address). The visited domain is exfiltrated by all analyzed software types alike (∼32%), while PUPs and adware predominantly exfiltrate the full request URL (domain and GET parameters). However, one can argue that some extensions transfer this information as part of their service (e.g., an extension that checks whether the user visits a malicious website will naturally send the current URL to a third party). In contrast, adware and PUPs leak personal data in a malicious manner or because the ad services they use require the current URL. Either way, the user’s privacy is undermined unnoticed and without the user’s consent. Table 1 shows that PUPs and adware, in contrast to extensions, focus on the user’s clickstream (i.e., browsing history). This is a more significant threat to user privacy because of the detailed information it leaks about users’ personal lives (e.g., habits).
We cannot identify any privacy-related information in about 6.9% of the requests issued by adware and PUPs (e.g., cdn.gigya.com/JS/gigya.js?apiKey=3_GL3L[...]), and 56% of the requests do not contain any data we analyzed (e.g., code.jquery.com/jquery-2.2.4.min.js).
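The keyword matching and the counting rule can be sketched as follows. This is a minimal illustration under assumptions: the keyword table is hypothetical (its categories mirror examples from the text, but the values are placeholders for one analysis slave’s setup), matching is plain case-insensitive substring search on already decoded requests, and `match_categories` and `tally` are illustrative names.

```python
from collections import Counter
from typing import Iterable, Set

# Hypothetical keyword table; values stand in for one analysis slave's setup.
KEYWORDS = {
    "password": ["Sup3rS3cret!"],
    "ip_address": ["192.0.2.17"],
    "screen_resolution": ["1920x1080"],
    "language": ["en-US"],
}


def match_categories(decoded_request: str) -> Set[str]:
    """Return every category with at least one keyword in the decoded request."""
    text = decoded_request.lower()
    return {
        category
        for category, keywords in KEYWORDS.items()
        if any(keyword.lower() in text for keyword in keywords)
    }


def tally(decoded_requests: Iterable[str]) -> Counter:
    """Count requests per category; a request matching k categories is counted k times."""
    counts: Counter = Counter()
    for request in decoded_requests:
        counts.update(match_categories(request))
    return counts


requests = [
    "https://collector.example/c?res=1920x1080&lang=en-US",
    "https://stats.example/p?ip=192.0.2.17",
    "https://cdn.example/lib.js",
]
counts = tally(requests)
print(sorted(counts.items()))
# [('ip_address', 1), ('language', 1), ('screen_resolution', 1)]
```

Only two of the three requests leak anything, yet the category counts sum to three; this is the effect of counting a request once per keyword. Plain substring matching also explains how false positives arise when a keyword happens to appear inside an unrelated random string.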
To the best of our knowledge, there has not been any report on privacy breaches of adware and PUPs. Our measurements show that a significant part, more than

Stolen data by adware / PUP family.

Top tracking services used by the analyzed adware (A) and PUP (P) families.
Figure 4 shows which personal data is predominately collected by the most prominent malware families in our data set, along with the scaled amounts of samples leaking such data. To increase readability, we only listed services used at least seven times by any family and the top 16 malware families individually and combined all other families to Other. All families exfiltrate clickstream data (i.e.,
Figure 5 displays the tracking services used by the different malware families, along with the scaled numbers of appearances. To increase readability, we only list services used at least nine times by any family and, again, only the top 16 malware families. Agent and Dealply, the most common adware families in our dataset, and InstallCore, the most common PUP family in our dataset, use a broad variety of tracking services. One can see that Taobao and MMStat are overall the most common services used to track users. taobao.com is operated by Zhejiang Taobao Network Ltd., while mmstat.com is operated by Alibaba Co., Ltd.; both are big Chinese players in the Internet landscape. The third most common observed tracker, GoogleVideo, is a content delivery network, and a known tracker, used to host video or sound files.
Table 3 displays the most common services to which privacy-related information is leaked or which provide tracking tools (e.g., web beacons). Only one service gathers information about the client’s system beyond the domain. All but one of the tracking services are operated by “big players” based in China. The analyzed extensions tend to use tracking services operated by American companies (e.g., Google or Facebook). Our results show that the services used by Firefox extensions are comparable to those used by Google Chrome extensions [28].
Top tracking services used by the analyzed adware and PUP samples, leaked information, and domain owners
Top tracking services used by the analyzed adware and PUP samples, leaked information, and domain owners
In total, only 151 different trackers were used while 60 trackers where used by only three or fewer samples. This hints that adware and PUP authors tend to rely on existing infrastructure rather than setting up their own (in contrast to C&C communication structures of botnets). Among the observed tracking services, there is no indication for any preferred service. The top 20 services are used on average by 7.48% (
Consistent with the finding that ad injection targets users in South Asia and South East Asia [30], our results show that adware also uses services based in Asia. The use of these services is understandable, because big American tracking services (e.g., Facebook or Google) are blocked in China and other Asian countries [14].
Tracking techniques used by the analyzed adware and extensions. The vast majority of samples track users in a different way (e.g., by leaking the URL to a third party).
Table 4 presents the tracking techniques used by the analyzed samples; only requests that implement a specific tracking technique are listed. Previous work shows that stateless tracking is becoming more common on popular websites [11]. However, the analyzed adware samples and PUPs do not use stateless tracking techniques. This behavior is understandable: since the samples can manipulate every website the user visits, they can inject a stateful tracking object into each site. Thus, they do not have to rely on more complex and error-prone stateless tracking techniques.
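Stateful tracking leaves an identifier in the browser, typically a cookie. One simple way to spot such a stateful tracking object in captured traffic is to flag responses from a third-party site that set a cookie. The sketch below is illustrative and not the paper's detection logic; it approximates a "site" by the last two host labels, which a careful implementation would replace with a Public Suffix List lookup. Function names are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Approximate the registrable domain by the last two host labels.

    Simplification: this mishandles suffixes such as .co.uk; a faithful
    version would consult the Public Suffix List.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def sets_third_party_cookie(page_url, request_url, response_headers):
    """True if a response from a third-party site tries to set a cookie.

    response_headers: list of (name, value) pairs, since Set-Cookie may repeat.
    """
    if site_of(page_url) == site_of(request_url):
        return False  # first-party response
    return any(name.lower() == "set-cookie" for name, _ in response_headers)
```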
Our analysis shows that web beacons are the most common tracking method among all analyzed samples (adware, PUPs, and browser extensions). This result is reasonable, since web beacons are easy to implement and harder to block than third-party cookies. Notably, extensions use web beacons less often and rely more on third-party cookies.
The results indicate that user tracking is less important to adware and PUP authors than exfiltrating personal data, although one can argue that exfiltrating the visited URL or domain is also a form of tracking. Requests that contain personal information but do not follow a specific tracking scheme are not considered (e.g., a request that contains personal information but loads an image larger than a typical web beacon is not counted). The vast majority (around 88%) of requests that impact the users' privacy leak personal information.
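The distinction between web beacons and plain data leaks can be expressed as a small classifier. This sketch assumes each captured request is annotated with whether it is third-party, whether it carries personal data, and, for images, the measured dimensions; the 1x1 pixel threshold and all names are hypothetical. A request carrying personal data that loads a larger image falls through to the leak category, mirroring the exclusion rule above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold: web beacons are typically 1x1 (sometimes 0x0) pixel images.
MAX_BEACON_PIXELS = 1


@dataclass
class CapturedRequest:
    third_party: bool
    contains_personal_data: bool
    mime_type: str
    image_width: Optional[int] = None   # measured dimensions, if available
    image_height: Optional[int] = None


def is_web_beacon(req):
    """A tiny third-party image is treated as a web beacon."""
    return (req.third_party
            and req.mime_type.startswith("image/")
            and req.image_width is not None
            and req.image_height is not None
            and req.image_width <= MAX_BEACON_PIXELS
            and req.image_height <= MAX_BEACON_PIXELS)


def classify(req):
    if is_web_beacon(req):
        return "web beacon"
    if req.contains_personal_data:
        return "leak"  # personal data, but no specific tracking scheme
    return "other"
```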
Non-browser emitted communication
Since the full communication of the extensions is captured on the browser level, this section only considers the adware and PUP samples. The analysis in this section includes all (adware/PUP) samples, even if they did not insert any object into a website.
Similar to the analysis of the traffic emitted by the browser, we used the communication of
Domain analysis
In this section, we present the results of our domain analysis for the domains in
Only one domain is flagged as malicious by the Google Safe Browsing API [13]. The domain was used by a pop-up window and is flagged for social engineering attacks. Additionally, we used the Web-Of-Trust (WOT) API [21] to assess these domains. Five domains were blacklisted and six were rated as malicious by the WOT community. In total, 11% of the domains received a “negative” or “questionable” rating. Only slightly more than 8% of all domains are flagged as being used for tracking. This indicates that the services used by the adware are not commonly known for their tracking capabilities. Our findings show that the used services are not outright malicious and mostly serve legitimate purposes. However, the adware uses these services for purposes other than the intended one (e.g., data exfiltration). Thus, these domains (e.g., an Amazon bucket) are not flagged as being used for tracking. Therefore, it is not always possible to decide unambiguously, on the domain level, whether a domain is used for tracking.
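For reference, a domain check like the one above can be issued through the Safe Browsing Lookup API (v4), which accepts a JSON body posted to `https://safebrowsing.googleapis.com/v4/threatMatches:find?key=API_KEY`. The text does not say which API version or threat types were queried, so the threat types, client identifiers, and helper names below are assumptions; the sketch only builds the request body and parses a response, leaving the HTTP call out.

```python
import json


def safe_browsing_body(urls, client_id="adware-study", client_version="0.1"):
    """Build a v4 threatMatches:find request body (client values are placeholders)."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING",
                            "UNWANTED_SOFTWARE", "POTENTIALLY_HARMFUL_APPLICATION"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def flagged(response):
    """Map each matched URL to its threat types; an empty response means no match."""
    result = {}
    for match in response.get("matches", []):
        result.setdefault(match["threat"]["url"], []).append(match["threatType"])
    return result


payload = json.dumps(safe_browsing_body(["http://popup.example/"]))
```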
Furthermore, we analyzed the registrars and organizations of these domains. GoDaddy is the most prominent registrar for the inspected domains and is used by both malware and extensions. In terms of registered domains, GoDaddy is the world's leading registrar [9]. MarkMonitor Inc. is the second most common registrar in our data set; it focuses on enterprises that want to protect their brands online. As for the organizations, most companies did not state their name or used a proxy company for the registration (e.g., Domains By Proxy or Perfect Privacy). This applies to domains used by extensions as well as to domains used by adware and PUPs.
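Registrar statistics of this kind can be derived from raw WHOIS records. The sketch below assumes records in the common `Registrar: NAME` line format used by many gTLD registries; formats vary across registries, so a real pipeline would need a more robust parser. The helper names are hypothetical.

```python
from collections import Counter


def registrar_of(whois_text):
    """Return the registrar named in a raw WHOIS record, or None if absent."""
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the "Registrar" key exactly, so "Registrar URL" etc. are skipped.
        if sep and key.strip().lower() == "registrar":
            return value.strip() or None
    return None


def registrar_counts(records):
    """Count registrars over many WHOIS records, ignoring records without one."""
    return Counter(r for r in map(registrar_of, records) if r)
```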
Further communication
After analyzing the requests used to track users or to leak private information, we now analyze the requests used for other purposes (e.g., ad-injection).
Attacked websites
Table 5 lists the top websites into which objects without direct privacy implications were injected (only objects that issued requests are considered). Examples of objects that might not issue requests are inline JavaScript and images embedded in Base64 encoding. The distribution of the responses' content types can be found in Section 5.3. Our results indicate that the adware and PUPs circumvent the Content Security Policies (CSPs) used by some websites.
In contrast to the tracked websites, the top attacked websites cover a broader range of categories. More than half (60%) of the websites into which the adware or PUPs injected content are hosted in Asia, indicating that malware authors tend to target that market. In contrast, only 30% of the top websites into which extensions injected content are hosted in Asia. Previous work also observed that users affected by ad injection often live in South America, South Asia, and South East Asia [30].
Different adware samples and extensions seem to target different websites (the number of samples injecting objects into a specific website is quite low), while PUP samples seem to target similar websites (the number of samples targeting a website is quite high). In contrast to adware, the extensions focus on American and European websites, probably due to the low popularity of the Firefox browser in Asia [29]. The site mall.360.com was superseded by i360mall.com, so its ranking dropped significantly during the course of our analysis.
Top websites into which objects – that were not used to track the user or used to leak data – were injected
The distribution of the observed MIME types of adware and PUP communication that did not contain privacy-related information is shown in Fig. 6 (bar chart), along with the sizes of the responses (according to the Content-Length HTTP header field) and how often these sizes were observed (violin plot). Adware predominantly loads JavaScript code (e.g., third-party libraries), HTML code (e.g., websites displayed in an <iframe> tag), or other textual content (e.g., JSON objects). We used simple heuristics (e.g., checking whether the text starts with an <html> tag) to determine whether textual content contains script or HTML code and, if so, counted it towards the respective category. For HTML code, the response size is either (almost) zero (e.g., an empty frame) or between 1 kB and 100 kB.
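The content heuristic described above might look roughly like this. Only the `<html>` prefix check comes from the text; the list of textual MIME types, the JavaScript rule, the size buckets (using 1 kB = 1024 bytes), and the function names are assumptions chosen to mirror the categories in Fig. 6.

```python
TEXTUAL_PREFIXES = ("text/", "application/javascript",
                    "application/x-javascript", "application/json")


def content_category(mime_type, body):
    """Assign a response to a coarse content category (heuristic sketch)."""
    mime = mime_type.split(";")[0].strip().lower()  # drop parameters such as charset
    if not mime.startswith(TEXTUAL_PREFIXES):
        return mime  # non-textual content keeps its declared type
    head = body.lstrip()[:32].lower()
    if mime == "text/html" or head.startswith(("<!doctype html", "<html")):
        return "html"
    if "javascript" in mime or head.startswith("<script"):
        return "script"
    return "text"


def size_bucket(headers):
    """Bucket a response by its Content-Length header (assumed exact header name)."""
    value = headers.get("Content-Length")
    if value is None or not value.strip().isdigit():
        return "unknown"
    size = int(value)
    if size < 1024:
        return "<1 kB"
    if size <= 100 * 1024:
        return "1-100 kB"
    return ">100 kB"
```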
In contrast, extensions and PUPs load very little textual content but load a large number of style sheets and fonts. Furthermore, PUPs and extensions seem to inject less visible content (i.e., images and HTML objects) into websites. This indicates that PUPs prioritize user tracking over content injection (e.g., ad injection).

Number of observed response MIME types of requests that did not contain private information (bar charts) and sizes of the responses with the corresponding distribution (violin plot).
In the following, we discuss ethical considerations and limitations of our work.
Ethical considerations
Running live malware samples always raises ethical issues. On the one hand, one wants to understand how malware works in a realistic environment; on the other hand, running malware might harm individuals not involved in the analysis process (e.g., via credit card fraud). Since we ran malware that generates revenue by displaying ads and stealing private information, we likely generated some income for the malware authors during our analysis. We implemented measures to decrease the potential harm a sample can cause.
To do so, we block some well-known ports that are not necessary for our analysis (e.g., the IMAP port 143). Furthermore, we limit the upload bandwidth of each analysis slave, which decreases its participation in a possible DDoS attack. These measures cannot prevent all possible attacks, which is probably impossible anyway. Based on network traffic statistics, we saw no indication that our analysis bots took part in a DDoS attack. During the course of our analysis, we received only one security alert, raised when a malware sample scanned our internal firewall.
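The text does not describe how the upload bandwidth was limited. A common mechanism is a token bucket (on Linux, the `tbf` queueing discipline of `tc` implements this idea); the following sketch shows the principle together with a port block-list. All names are hypothetical, and the IMAP port from the text is the only blocked entry.

```python
import time


class TokenBucket:
    """Token-bucket limiter for outgoing bytes (illustrative sketch)."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, nbytes):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should delay or drop the packet


BLOCKED_PORTS = {143}  # IMAP, as in the text; extend as needed


def egress_allowed(dst_port, nbytes, bucket):
    """Block listed ports outright; otherwise enforce the bandwidth budget."""
    return dst_port not in BLOCKED_PORTS and bucket.allow(nbytes)
```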
Limitations
Our framework allows the dynamic analysis of software that tampers with the users' browser session. However, like most dynamic approaches, it has some limitations. Using a predefined set of websites leaves the risk that the analyzed software does not become active on the visited websites (e.g., banking malware might only become active on specific subpages of a particular banking site). However, previous work has shown that the top-ranked pages trigger many malware samples and extensions [2,18,28,30]. Also, some samples might only inject content into websites if certain search words appear, as shown in [30]. Since we use a predefined set of websites, and therefore predefined keywords, we will not observe injections related to other keywords.
Currently, our analysis slaves do not interact with the websites in the way a real user might (e.g., scrolling or clicking on links). Some malware samples might only become active when such an event occurs; if a sample only triggers once the user interacts with a website, we miss this kind of behavior.
Since we are using a virtual environment to execute the malware, some samples might recognize that they are being analyzed. We took several measures to hide that the malware is executed on a virtual machine (e.g., changing CPU information and some registry keys). However, a malware sample might still detect that it is being analyzed and show a different behavior.
Future work
In this work, we only focused on two types of malicious software: adware and PUPs. Future work should measure how other types of malware collect and use personal data. For example, there are reports of ransomware that threatens to publish users' personal data unless they pay the ransom [8].
Our work only considers desktop applications (i.e., the browser). However, as more and more users use mobile and IoT devices, the privacy implications of malware tailored for such devices should be addressed in future work.
Conclusion
Our results show that not only websites and browser extensions but also, on a massive scale, adware and PUPs negatively impact the user's privacy. We analyzed over 16,000 adware and PUP samples with respect to their privacy implications for the user. Our results illustrate that these kinds of software track users and excessively leak private data (e.g., IP addresses or clickstream data). More than 37% of all requests issued by adware or PUPs serve one of these two purposes. Adware and PUPs mainly focus on the user's clickstream, which contains sensitive personal information and can reveal details of the user's life, ranging from habits and personal preferences to political views. Thus, adware is a non-negligible threat to the user's privacy, especially because the leakage happens without the user's consent or knowledge. Regarding their tracking behavior, PUPs and adware are quite similar and, since they heavily focus on the users' clickstream, pose a far worse threat to the users' privacy than extensions do.
We showed that, regarding their privacy impact, there are similarities but also apparent differences between extensions and adware/PUPs. Adware and PUPs mainly focus on the users' clickstream and can therefore create comprehensive user profiles, which are valuable to different companies (e.g., ad networks). Furthermore, our results show that adware and PUPs do not adopt state-of-the-art tracking techniques.
Acknowledgments
This work was partially supported by the Ministry of Culture and Science of the German State of North Rhine-Westphalia (MKW grant 005-1703-0021 “MEwM”) and partially supported by the German Federal Ministry of Education and Research (BMBF grants 16KIS0395 “secUnity” and 01IS14009B “BD-Sec”). We would like to thank the anonymous reviewers for their valuable feedback.
Set of websites
The websites used in our analysis are listed in Table 6. We used the Alexa top 100 as the basis for the set, as described in detail in Section 4.2.
The set consists of ten search engines, 20 social media sites, 11 online shops, five domains hosting adult content, and 16 domains that do not fit in any of these categories (e.g., github.com or cnn.com). Of these domains, 34 are hosted in the USA, 14 in China, four in Russia, three in the Netherlands, two in Ireland, and five in other Asian countries (ROK, SVR, JPN, HKG, and TWN).
Recorded communication
Listing 1 is an example of a request and response pair captured by our framework. The information is saved in JSON format to simplify the evaluation. We record the HTTP headers, HTTP method, HTTP body, response status, and cookies, and we measure the size of an image (if possible), among other things. If the response contained text (the example shown contains an image), it would be recorded as well.
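A record with the attributes listed above could be modeled and serialized roughly as follows. This is a minimal sketch: the class and field names are hypothetical and only mirror the attributes enumerated in the text; they are not the framework's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RecordedExchange:
    # Hypothetical field names; they mirror the recorded attributes, not a real schema.
    url: str
    method: str
    request_headers: dict
    request_body: Optional[str]
    status: int
    response_headers: dict
    cookies: list = field(default_factory=list)
    image_size: Optional[tuple] = None   # (width, height) if measurable
    response_text: Optional[str] = None  # only stored for textual responses

    def to_json(self):
        # Tuples are serialized as JSON arrays; sort_keys gives stable output.
        return json.dumps(asdict(self), sort_keys=True)
```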
