Abstract
Detecting encrypted malware traffic promptly to halt the further propagation of an attack is critical. Machine learning has become a key technique for extracting encrypted malware traffic patterns. However, due to the dynamic nature of network environments and the frequent updates of malware, current methods struggle to detect unknown malware traffic in open-world environments. To address this issue, we introduce MVDet, a novel method that employs machine learning to mine the behavioral features of malware traffic based on multi-view analysis. Unlike traditional methods, MVDet characterizes the behavioral features of malware traffic at the granularity of 4-tuple flows from four views: the statistical view, DNS view, TLS view, and business view. This yields a more stable feature representation capable of handling complex network environments and malware updates. Additionally, we construct behavioral features from a short time window, significantly reducing the time cost of feature extraction and malware detection. As a result, we can detect malware behavior promptly at an early stage. Our evaluation demonstrates that MVDet can detect a wide variety of known malware traffic and exhibits efficient and robust detection in both open-world and unknown malware scenarios. MVDet outperforms state-of-the-art methods in closed-world known malware detection, open-world known malware detection, and open-world unknown malware detection.
Introduction
With the increasing awareness of user privacy and data security, a wide range of applications use encryption protocols such as TLS for network communication. Encryption protocols protect data from interception and modification by encapsulating and encrypting data packets. Among these, the usage of the TLS protocol has exceeded 80% in various industries, and this trend is expected to continue in the long term [38]. Unfortunately, cybercriminals are also using encryption protocols. Attackers use encryption protocols to hide various malicious network activities such as command-and-control (C&C) communications. Many attacks evade inspection by firewalls and intrusion detection systems (IDS) through the use of encryption techniques, posing a challenge to network security. To meet this challenge, it is critical to detect encrypted malware traffic so that attack activities are discovered in a timely manner and further propagation is prevented.
There is a significant body of work on encrypted malware traffic detection [13,24,25]. Early studies extract fingerprints from payloads to detect malware traffic. However, these methods have difficulty dealing with encrypted attacks and zero-day attacks. First, encryption randomizes the payload, making it difficult to extract fingerprints. Second, fingerprint-based methods can only extract fingerprints from known attacks, while zero-day attacks change the original fingerprint information to evade detection. Recent research shows that machine learning can effectively detect encrypted malware traffic and discover zero-day attacks by statistically analyzing traffic and fitting the boundaries between benign and malware traffic. Therefore, machine learning analysis has become a widely adopted approach. Based on the granularity of the traffic, we categorize these approaches into packet-based, single-flow-based, and multi-flow-based methods. (1) Packet-based [12,23,32,37]. These methods detect individual packets by analyzing the statistical and content information of the packets. (2) Single-flow-based [11,14,28]. These methods use the five-tuple flow as the detection object and build statistical models or fingerprints to distinguish malware traffic from benign traffic. (3) Multi-flow-based [7,19,29]. These methods use 4-tuple traffic or host traffic, which includes multiple flows, as the detection object. As with the above methods, machine learning is used to fit the boundaries between malware and benign traffic. Because these methods fuse multiple flows, they can extract more robust features than packet-based and single-flow-based methods.
Despite the progress made, there remain some limitations.
To address the aforementioned challenges, in this paper, we introduce a multi-view encrypted malware traffic detection approach. It integrates multiple views, including the statistical, DNS, TLS, and business views, at the granularity of 4-tuple flows, aiming to comprehensively characterize malware traffic behavior. Unlike existing works that generally focus on feature extraction at the level of individual flows or packets, we propose to fuse multiple flows at the granularity of 4-tuple flows. This approach allows for the exploration of behavioral relationships between internal flows and offers a more comprehensive, multi-view representation of network traffic, which exhibits stronger generalizability.
To tackle the first challenge, we construct behavioral features and depict malware network behavior using the 4-tuple flow as the granularity of traffic. As shown in previous studies [4], mining the internal behavioral links and attack intent of malware traffic across multiple flows is effective, particularly during malware updates. Compared with traditional granularities such as the packet and the single flow, the 4-tuple flow can incorporate the context information of flows and represent the internal relationships between flows. This representation is stable even in the face of frequent updates and variants of malware.
To address the second challenge, we extract only the first ten minutes of each 4-tuple flow to construct behavioral features, in contrast to methods that extract features from the entire flow. Over a complete attack process, the infected host may maintain communication with a specific host for several days. However, in the key stages of an attack, malware often generates large volumes of repetitive network activity with a specific host. For example, it might continuously send data packets in an attempt to compromise the host system. Therefore, by analyzing the relationship between connections within a ten-minute interval, a timely and effective characterization of malware traffic behavior can be achieved.
To handle the third challenge, we extract multi-view features, including statistical, DNS, TLS, and business views, providing a comprehensive and rich representation for encrypted traffic. To the best of our knowledge, this is the first method to characterize malware traffic by mining the business type of the upper application layer. While benign traffic can be found across various businesses, malicious behavior exhibits distinct business tendencies. Therefore, it is beneficial to identify the business types of traffic for encrypted malware traffic detection.
Lastly, we evaluate the performance of the proposed MVDet on real-world data, with particular attention to open-world settings. To this end, we construct an open-world traffic dataset, CTU-Real, that includes long-term real traffic collected at the gateway and open-source long-term malware traffic generated by various malware in the Malware Capture Facility Project. Specifically, we evaluate the performance of MVDet with various types of malware traffic such as Zeus and Trickbot. Moreover, we evaluate the performance of MVDet in an open-world environment that encompasses a wider range of normal network behavior and unknown malware traffic.
The main contributions of this paper are summarized as follows:
We introduce a multi-view-based method for encrypted malware traffic detection, named MVDet. To the best of our knowledge, MVDet is the first approach that combines multi-view traffic content at 4-tuple granularity for encrypted malware traffic detection.
We are the first to propose that mining the upper-layer business types of encrypted traffic aids malware traffic detection, as the targets of attacks often exhibit a clear business tendency. Specifically, malware traffic is commonly associated with web and download services.
We present a method for constructing short-time behavioral features. This approach requires only the 4-tuple flow within a ten-minute interval to build behavioral features, ensuring both robust feature representation and rapid detection.
We evaluate MVDet on the CSE-CIC-IDS2018 [27] dataset and on CTU-Real, our collected real-world, large-scale malware traffic dataset. Experiments show that the AUC improves by 5.47% in the closed-world setting. In the open-world setting, the AUC improves by 5.8% for known malware detection and by 13% for unknown malware detection.
The remainder of this paper is structured as follows. Section 2 summarizes the related work. Section 3 describes the preliminary analysis. Section 4 presents the method. Section 5 shows the experiments and evaluation. Section 6 presents the conclusion.
Related work
In this section, we review related works on malware traffic detection. View analysis is closely related to traffic granularity, and specific granularity will limit the choice of views. According to the traffic granularity, we summarize these studies into packet-based detection, single-flow-based detection, and multi-flow-based detection.
Packet-based detection
Packet-based detection is performed on individual packets. Wang et al. [37] and Hwang et al. [12] propose that packet header field information can effectively distinguish malware traffic from benign traffic. For example, in DDoS attacks, the attacker keeps sending massive ACK requests without waiting for the server response, which causes the ACK value in the packet header to remain constant. Therefore, they use neural networks such as CNN and LSTM to learn the spatial and temporal features of the packet header (the first 54 bytes of the packet) to achieve malware traffic detection. In addition, Stergiopoulos et al. [32] and Mirsky [23] detect packets that behave abnormally by mining the differences in statistical features. They extract statistical features such as packet size, payload size and implied contextual features such as ratio to previous packet size, inter-packet delays, and then they use decision trees or unsupervised models to detect malware traffic.
Packet-based detection performs feature extraction and traffic detection at the packet level without aggregating packets to the flow level. Thus, these methods can achieve high-throughput online detection. However, they treat individual packets as the detection object, whereas DNS and TLS information is usually obtained at the flow level. Therefore, these methods are constrained to statistical view analysis, making it challenging to incorporate content from other views such as DNS and TLS. Furthermore, individual packets are highly unstable under traffic shifts. Alterations in the network environment, application updates, and malware variants can cause obvious differences in packet statistical features, thereby reducing the effectiveness of packet-based detection methods.
Single-flow-based detection
Single-flow-based detection is one of the most common approaches. It treats the flow aggregated by five-tuple as the detection object and constructs detectors by analyzing the statistical features and field features of the flow. Sharafaldin et al. [28] and Gohari et al. [11] extract statistical features such as duration and payload size based on CIC-Flowmeter [17]. Then they use machine learning or neural networks to distinguish malware traffic from benign traffic. In addition, Lee et al. [18] propose that TLS fields are strongly correlated with malware, so they construct detection fingerprints of malware traffic based on fields such as the TLS version and cipher suites. Anderson et al. [1–3] fuse flow statistical features and TLS fingerprints to enhance the detection capability. Fu et al. [8] also introduce frequency domain analysis into single-flow detection, which can effectively improve throughput and increase detection speed.
Although many single-flow-based methods extend their features to encompass multiple views for characterizing malware traffic, they still use single flows as their analysis object. As a result, some flows, such as recovered flows, are missing TLS or DNS information [19]. Meanwhile, these methods usually extract statistical features from complete flows, a process that can be significantly time-consuming, especially with long-duration flows. Moreover, single-flow-based detection suffers from poor robustness. Given the dynamic nature of the network environment, maintaining a stable distribution of the flow's statistical features becomes challenging, which in turn diminishes the performance of detectors. For example, [1] combines multi-view information such as HTTP and DNS, but because it performs feature mining at the five-tuple granularity, it has difficulty maintaining robustness in open environments.
Multi-flow-based detection
Multi-flow-based detection is performed at a coarser granularity by gathering many flows as a whole, such as at 4-tuple flow granularity and host granularity. Since the recovered SSL/TLS connections don’t have a complete handshake process, the certificate features of these connections cannot be extracted. Thus, Li et al. [19] and Shekhawat et al. [29] extract and calculate the statistical features and TLS fingerprint features at the 4-tuple flow granularity, providing more complete information for traffic detection. Despite employing a granularity level of 4-tuple for characterizing flows, they extract features like transport size based on the entirety of these 4-tuple flows. However, they overlook the internal relationships and interactions of 5-tuple flows within the 4-tuple, resulting in a neglect of certain behavioral representations that could potentially be critical for malware traffic detection. To characterize the behavior of malware traffic, Dong et al. [7] mine behavioral features by constructing malicious IAT behavior fingerprints based on packet length sequences at the host granularity, and the behavioral features are more robust than traditional statistical features. However, this method is a fingerprint-based method that cannot detect unknown malware traffic. Moreover, it involves constructing behavioral features based on the entire flows, also resulting in a high time cost.
Multi-flow-based detection methods mine behavioral features from malware traffic, achieving a more robust and comprehensive characterization. However, these methods primarily concentrate on extracting features from a statistical view, thereby neglecting other significant views, such as the DNS view, TLS view, and business view. This oversight limits the effectiveness of their detection capabilities. Moreover, such approaches suffer from long detection delays, as they must wait for multiple flows to complete before initiating feature construction. Deploying these methods in real-world environments also poses challenges. First, they require substantial storage capacity owing to the complete multi-flow aggregation. Second, in order to prevent the spread of malicious attacks, managers need to ensure timely detection and mitigation. Consequently, multi-flow-based detection still needs to account for prompt response when adapting to real-world environments.
In summary, previous methods largely focus on packet and single-flow granularity for malware traffic detection. However, these methods limit the potential to mine behavioral correlations between flows and pose challenges in maintaining high performance in open-world scenarios. Although more comprehensive traffic characterization methods based on multi-flow have been developed, they come with increased time cost and remain limited to the statistical view, neglecting the TLS, DNS, and business views. To address these limitations, this paper presents MVDet, a novel approach that combines multiple views to analyze behavior and relationships between flows at 4-tuple granularity. It presents a richer characterization of traffic behavior, ensuring robustness and accuracy, even in open-world scenarios that encompass both novel benign network behavior and unknown malware traffic.
Preliminary analysis
To understand the traffic patterns and mine related features that can be used to characterize and detect malware traffic, we perform a preliminary analysis of the multi-view on a small dataset, including statistical view analysis, DNS view analysis, TLS view analysis, and business view analysis.
Dataset
In order to perform the analysis and evaluation of MVDet, we collect real-world traffic as our dataset. Specifically, we collect real traffic at the gateway of an enterprise network over a period of eight months as benign traffic, and we store the mirrored traffic of some randomly selected devices in a database. In addition, we divide the benign traffic into two datasets: the benign-closed dataset is captured during the first four months, and the benign-open dataset during the final four months. To ensure the benign nature of the captured traffic, we deploy an IDS to filter out any abnormal traffic.
Moreover, we use the Malware Capture Facility Project [33], which captured malware traffic over a long period (2013–2018), as the source of malware traffic. This dataset is widely used in malware traffic detection. It involves various malware types such as Zeus, Cridex, Emotet, and Trickbot, and many attacks, including DDoS, brute force guessing, and C&C communication. To evaluate the capability of MVDet in unknown malware traffic detection, we divide the malware traffic into two datasets: the malware-known dataset and the malware-unknown dataset. The malware-known dataset contains the traffic generated by six types of malware, and the malware-unknown dataset contains the malware traffic generated by other malware. Finally, we construct our collected dataset CTU-Real, including benign-closed (capture size is 28.3 GB), benign-open (capture size is 45.6 GB), malware-known (capture size is 25.9 GB), and malware-unknown (capture size is 95.6 GB).
For the preliminary analysis, we randomly select a small subset of the benign-closed and malware-known datasets to avoid bias in the final evaluation.
Statistical view analysis

The statistical view between the top 8 active C&C servers and the infected host from Zeus [9].
In this section, we use Zeus [9] as an example to analyze traffic behavior and summarize the statistical view of malware traffic. A complete network attack process includes various operations such as website visits, executable file downloads, and C&C communication, all of which are likely to be encrypted. Figure 1 presents the statistical view between the top eight active C&C servers and the Zeus-infected host. We use the client's packet number to denote the statistical behavior of each connection. Our analysis reveals that communication between the infected host and a specific C&C server can last several days or even a month. Furthermore, the communication with a specific C&C server contains a large number of similar connections, indicating a strong similarity in network behavior between flows. For instance, during communication with 194.28.87.64, the infected host consistently sends seven or eight packets to the server until the connection expires. Although there are a few dissimilar connections, they merely represent discrete points at the 4-tuple granularity and do not affect the overall internal similarity of the 4-tuple flow. Note that other statistical features such as flow size, duration, and inter-arrival time also exhibit flow similarity, even though we only display the client's packet number.
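The internal similarity described above can be made concrete with a minimal sketch. It assumes flows have already been summarized as records; the `Flow` fields, the client address `10.0.0.5`, port 443, and the `dominant_share` helper are hypothetical and chosen only for illustration. The grouping key follows the 4-tuple used in this paper (source IP, destination IP, destination port, transport protocol).

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical 5-tuple flow record; field names are illustrative.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    client_packets: int  # packets sent by the client in this connection


def group_by_four_tuple(flows):
    """Group 5-tuple flows that share source IP, destination IP,
    destination port, and protocol (the source port is ignored)."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f.src_ip, f.dst_ip, f.dst_port, f.proto)].append(f)
    return dict(groups)


def dominant_share(group):
    """Fraction of flows in a 4-tuple group that carry the most common
    client packet count; values near 1 indicate high internal similarity."""
    counts = Counter(f.client_packets for f in group)
    _, freq = counts.most_common(1)[0]
    return freq / len(group)


# Toy data: eight connections from one host to one server, mostly 7 packets.
flows = [
    Flow("10.0.0.5", 49152 + i, "194.28.87.64", 443, "tcp", n)
    for i, n in enumerate([7, 7, 8, 7, 7, 23, 7, 7])
]
groups = group_by_four_tuple(flows)
key = ("10.0.0.5", "194.28.87.64", 443, "tcp")
share = dominant_share(groups[key])
print(len(groups), share)
```

A single outlier (the 23-packet connection) lowers the share only slightly, which mirrors the observation that a few dissimilar connections do not change the overall similarity of the 4-tuple flow.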
We observe that malware traffic presents two special features in the statistical view. One is that the external malicious IPs can remain valid for several days. The other is that the communication between the malicious server and the infected host is periodic and internally similar. Therefore, instead of individual flows, we employ the 4-tuple flow as the object of malware traffic detection, which groups the multiple connections between the infected host and a specific malicious server as a whole, enabling us to summarize the behavioral features of attack activities.
DNS view analysis

The DNS domain analysis for both malware and benign traffic.
In this section, we explore the differences between malware traffic and benign traffic with regard to DNS queries and responses. We correlate the 4-tuple flow with the corresponding DNS query and response information. Figure 2(a) displays the DNS query statistics for both malware traffic and benign traffic. We find that 99% of the malicious 4-tuple flows do not perform a DNS query before connection. This is because malicious IP addresses are often hard-coded during malware creation, allowing infected hosts to directly connect to a specified IP list for malicious activities. Moreover, as illustrated in Fig. 2(b), we notice that the repetition rate of query domains for benign traffic and malware traffic does not exceed 1%. We also extract a blacklist of parent domains from malware traffic and match it with the parent domains of unknown malware traffic. We find that more than 10% of the blacklisted parent domains are still active during unknown malware detection. Therefore, the DNS domain can serve as a more stable short-term blacklist compared to dynamic IP addresses.
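The correlation step above can be sketched as follows, under stated assumptions. The input format (`dns_answers` keyed by client IP and resolved IP), the function names, and the example addresses and domains are all hypothetical. The parent domain is approximated by the last two labels, which is a simplification that ignores multi-label public suffixes.

```python
def parent_domain(domain: str) -> str:
    """Naive parent domain: the last two labels of the name.
    Illustrative only; multi-label suffixes (e.g. co.uk) are not handled."""
    labels = domain.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def correlate(four_tuples, dns_answers):
    """Attach the queried domain (or None) to each 4-tuple.

    dns_answers maps (client_ip, resolved_ip) -> queried domain and is
    assumed to be built from DNS responses seen before the connection."""
    return {ft: dns_answers.get((ft[0], ft[1])) for ft in four_tuples}


def no_query_ratio(correlated):
    """Share of 4-tuple flows that were not preceded by a DNS query."""
    missing = sum(1 for domain in correlated.values() if domain is None)
    return missing / len(correlated)


# Toy data: only one of four 4-tuples has a matching DNS answer.
four_tuples = [
    ("10.0.0.5", "194.28.87.64", 443, "tcp"),
    ("10.0.0.5", "198.51.100.7", 80, "tcp"),
    ("10.0.0.5", "203.0.113.9", 443, "tcp"),
    ("10.0.0.6", "203.0.113.9", 443, "tcp"),
]
dns_answers = {("10.0.0.5", "203.0.113.9"): "cdn.example.com."}

correlated = correlate(four_tuples, dns_answers)
ratio = no_query_ratio(correlated)

# Match resolved domains against a parent-domain blacklist.
blacklist = {"example.com"}
hits = [d for d in correlated.values() if d and parent_domain(d) in blacklist]
print(ratio, hits)
```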
In addition to domain analysis, we investigate the TTL value, which is the survival time of DNS records on DNS servers. As shown in Fig. 3, we find a significant difference in the distribution of TTL settings between malware traffic and benign traffic. Specifically, attackers usually set smaller TTL values in order to quickly change the C&C servers. We observe that 299 and 300 are the malware’s two most frequent TTL values. Moreover, advanced malware also sets higher TTL values because they don’t change the IP address for weeks.
From the DNS view, we find that malware often communicates directly with the C&C server through specified IPs, resulting in numerous malicious connections without a DNS query. Furthermore, malware traffic differs significantly from benign traffic in parameter settings such as domain and TTL. Thus, we suggest that DNS features can effectively distinguish malware traffic from benign traffic.

The distribution of TTL settings among malware traffic and benign traffic.
TLS view analysis
In this section, we analyze the TLS fingerprints of encrypted communication by malware. During the TLS handshake, the Client Hello and Server Hello messages provide initialization attributes such as SSL version, cipher suite, and extensions. Because developers build client applications using different packages and methods, the Client Hello attributes generated by different applications are unique and identifiable [16,35]. Thus, the Client Hello message can serve as a fingerprint for the client application. Similarly, Server Hello messages can also be employed as valid fingerprints for server application identification. Currently, JA3/JA3S fingerprints, which construct TLS fingerprints by extracting the attributes of Client Hello and Server Hello messages, are commonly used in application identification. Like benign applications, malware and C&C servers also have identifiable TLS fingerprints. Therefore, even if the traffic is encrypted and the IP address and domain are unavailable, we can still detect malware traffic by matching the JA3/JA3S fingerprints of the malware.

The distribution of JA3/JA3S among malware traffic and benign traffic.
However, a specific JA3/JA3S fingerprint is not uniquely related to an application; that is, the same JA3/JA3S fingerprint can match multiple applications. Therefore, matching malicious traffic with JA3/JA3S fingerprints commonly causes false alarms. In addition, JA3/JA3S fingerprints will no longer be valid when new malware emerges or attackers modify malware TLS configuration attributes. We collect JA3/JA3S fingerprints of some known malware and compare them with JA3/JA3S fingerprints of benign traffic and other unknown malware, as shown in Fig. 4. As mentioned above, the JA3/JA3S fingerprints of malware overlap with those of benign applications. In our survey, 23.5% of JA3 fingerprints and 39.2% of JA3S fingerprints are not unique to malware, which will cause false alarms in malware traffic detection. Furthermore, the collected known-malware JA3 fingerprints identify 6% of new malware, and the collected known-malware JA3S fingerprints identify 18% of new malware.
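The two kinds of percentages reported above can be computed with simple set arithmetic. The sketch below is illustrative only: the function names and placeholder fingerprints are hypothetical, and it assumes that the unit for "new malware" is an individual observation (for example, one flow), which the text does not specify.

```python
def shared_ratio(malware_fps, benign_fps):
    """Fraction of distinct malware fingerprints that also occur in
    benign traffic (these are the ones that can cause false alarms)."""
    malware = set(malware_fps)
    if not malware:
        return 0.0
    return len(malware & set(benign_fps)) / len(malware)


def coverage(known_fps, new_observations):
    """Fraction of new-malware observations whose fingerprint is already
    in the known-malware set."""
    known = set(known_fps)
    if not new_observations:
        return 0.0
    matched = sum(1 for fp in new_observations if fp in known)
    return matched / len(new_observations)


# Placeholder fingerprints; real JA3/JA3S values are 32-character MD5 digests.
known_malware = {"fp_a", "fp_b", "fp_c", "fp_d"}
benign = {"fp_c", "fp_x", "fp_y"}
new_malware = ["fp_a", "fp_z", "fp_z", "fp_q", "fp_b"]

overlap = shared_ratio(known_malware, benign)
covered = coverage(known_malware, new_malware)
print(overlap, covered)
```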
From the TLS view, we suggest that JA3/JA3S fingerprints can identify known malware traffic and partially unknown malware traffic. However, due to the high false alarm rate, JA3/JA3S fingerprints should be used to characterize malware traffic together with other view features.
Business view analysis
In this section, we analyze the differences between malware traffic and benign traffic from a business view. The business view refers to the identification of the services (such as browsing, email, streaming) and applications (such as Google, Facebook, Twitter) to which the traffic is related. Based on our investigation, we find that different malicious behaviors show a clear tendency towards specific businesses. For example, DDoS attacks are commonly found in online games and ISPs, while ransomware attacks predominantly target financial banking [6,10]. Furthermore, email and file services are susceptible to encrypted phishing attacks, and attackers often transmit malicious content through file sharing. Thus, while encryption techniques are utilized in various attacks, the targets of these attacks show a distinct business tendency.
We count the services associated with malware traffic as shown in Table 1. The malware traffic is primarily concentrated on web, download, cloud, and system services. Conversely, traffic associated with services such as media, chat, and shopping tends to be benign. Additionally, we count the applications to which the malware and benign traffic belong, as shown in Fig. 5. There is a difference in the distribution of malware and benign traffic in terms of associated applications. Specifically, nearly 50% of the traffic is still linked to benign applications. Also, while 78 applications are counted in the malware traffic, more than 75% of the malware traffic still cannot be identified in terms of the applications they belong to. This indicates that a larger proportion of malware traffic belongs to unknown application traffic.
The top 5 services among malware traffic

The distribution of applications among malware traffic and benign traffic.
The business types are not directly extracted from flow content, but there are many studies in encrypted traffic business classification, with methods such as nDPI [5], FlowPrint [36], and ET-BERT [20] achieving high performance. These methods aim to classify various service types and applications by analyzing traffic fingerprints and deep patterns. In this paper, we utilize the open-source nDPI tool to classify traffic business types. nDPI is an effective real-time traffic classification tool that can identify many service types and applications, making it a popular choice in network management.
From the business view, malware traffic has a clear tendency towards specific services, being mainly found on the web and download platforms. Additionally, application identification tools are unable to detect a large proportion of malware traffic, implying that traffic from unknown applications presents a significant concern. Therefore, the business view can provide effective features for traffic detection to characterize malware behavior.
In this section, we introduce the design of MVDet. First, we describe the overview. Then, we describe the process of MVDet, including traffic collection and aggregation, 5-tuple feature extraction, 4-tuple feature generation, and malware traffic detection.
Overview

The framework of MVDet, including 1. Traffic collection and aggregation, 2. 5-tuple features extraction, 3. 4-tuple features generation and 4. Malware traffic detection.
In this paper, we propose a multi-view encrypted malware traffic detection method, MVDet. The framework is shown in Fig. 6. It characterizes malware traffic behavior from four types of views: statistical view, DNS view, TLS view, and business view, thereby achieving accurate and robust malware traffic detection. Unlike traditional methods, we propose feature mining at the granularity of 4-tuple flows to comprehensively characterize malicious behavior.
To extract traffic features, we first collect and aggregate online traffic. Specifically, we design a 4-tuple table that is updated daily, storing 4-tuple flows with duration up to ten minutes as offline traffic. We extract only the first ten minutes of each 4-tuple flow rather than the entire 4-tuple flow, so that malicious activities can be detected in time and their further spread can be stopped. Subsequently, we execute 5-tuple flow feature extraction on the offline traffic. The extraction of 5-tuple features enables us to gather detailed information about individual network flows, providing an overview of a connection. We extract statistical, DNS, TLS, and business features of 5-tuple flows and aggregate the 5-tuple features associated with the same 4-tuple flow to generate 4-tuple features, including both abstract and extension features. This process enables us to capture critical network interactions and behavioral features.
In 4-tuple features generation, we generate extension features in two steps. Initially, we arrange these 5-tuple features into sequences based on the start time of the flow, such as packet length sequences and port sequences. Next, we categorize these features into numerical and string features. For numerical features, we compute the maximum, minimum, mean, and variance of the feature sequence. For string features, we calculate the number of items, the most frequent item, and its ratio. Alternating between 4-tuple and 5-tuple feature extraction is an innovative approach to capture both granular flow information and aggregate behavior, thereby enhancing the robustness and comprehensiveness of our detector. Ultimately, we use machine learning to model the 4-tuple features for malware traffic detection.
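The two-step generation of extension features can be sketched as below. This is a minimal illustration, not the exact implementation: the helper names and toy values are hypothetical, "number of items" is interpreted as the number of distinct items, and the variance is taken as the population variance. Both interpretations are assumptions.

```python
from collections import Counter
from statistics import mean, pvariance


def numeric_summary(seq):
    """Maximum, minimum, mean, and variance of a numerical feature sequence.
    Population variance is an assumption; the estimator is not specified."""
    return {
        "max": max(seq),
        "min": min(seq),
        "mean": mean(seq),
        "var": pvariance(seq),
    }


def string_summary(seq):
    """Number of distinct items, the most frequent item, and its ratio."""
    counts = Counter(seq)
    item, freq = counts.most_common(1)[0]
    return {"n_items": len(counts), "top_item": item, "top_ratio": freq / len(seq)}


# Toy 5-tuple flows of one 4-tuple: (start_time, packet_length, tls_fingerprint).
five_tuple_flows = [
    (30.0, 520, "fp_a"),
    (0.0, 500, "fp_a"),
    (60.0, 540, "fp_b"),
    (90.0, 520, "fp_a"),
]

# Step 1: order the 5-tuple features by flow start time to form sequences.
ordered = sorted(five_tuple_flows, key=lambda flow: flow[0])
length_seq = [flow[1] for flow in ordered]
fp_seq = [flow[2] for flow in ordered]

# Step 2: summarize numerical and string sequences separately.
length_features = numeric_summary(length_seq)
fp_features = string_summary(fp_seq)
print(length_features, fp_features)
```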
In conclusion, MVDet captures behavioral features from multiple views at the granularity of 4-tuple flows. It offers a more comprehensive characterization of malware behavior and is less affected by the environment, thus ensuring more stable detection in open-world environments.
Traffic collection and aggregation
In this paper, we construct multi-view features using the 4-tuple flow within the first ten minutes of each 24-hour epoch. The first step is traffic collection and aggregation. Since the validity of the server varies from a few days to several months, we consider the 4-tuple traffic within 24 hours to have the same label. Thus, we create and maintain a 24-hour epoch table to record the meta information and timestamp of each 4-tuple. Specifically, this table stores the 4-tuple information (source IP, destination IP, destination port, and transport layer protocol) and the start timestamp.
The 4-tuple flow within the first ten minutes refers to the time frame starting from when the first packet of a specific 4-tuple is recorded, and it includes all packets belonging to that 4-tuple within the subsequent ten-minute interval. Therefore, when an online packet arrives in the 24-hour epoch, we first search the 4-tuple table. If the 4-tuple meta information of this packet is not recorded in the table, its meta information and arrival timestamp are recorded in the table. If the meta information of the packet exists in the table, we save the packet if it arrives within ten minutes of the recorded timestamp and discard it otherwise. The saved packets are the direct object of malware traffic detection. When 24 hours have elapsed, the 4-tuple table is reset to an empty table.
DNS, TLS, and business information are directly associated with 5-tuple flows. Meanwhile, the internal similarities within a 4-tuple flow are represented through the relationships between the 5-tuple flows within the same 4-tuple. Therefore, in order to extract multi-view features, we first perform 5-tuple feature extraction. In this section, we describe the extraction of statistical view features, DNS view features, TLS view features, and business view features.
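The table logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, packets are assumed to arrive in timestamp order, the ten-minute boundary is treated as inclusive, and epochs are assumed to be aligned to multiples of 24 hours from the first epoch start.

```python
WINDOW_SECONDS = 10 * 60        # ten-minute capture window per 4-tuple
EPOCH_SECONDS = 24 * 60 * 60    # the table is reset every 24 hours


class FourTupleTable:
    """Sketch of the epoch table: first-seen timestamp per 4-tuple."""

    def __init__(self, epoch_start: float):
        self.epoch_start = epoch_start
        self.first_seen = {}

    def observe(self, four_tuple, timestamp: float) -> bool:
        """Return True if the packet should be saved for feature extraction."""
        elapsed = timestamp - self.epoch_start
        if elapsed >= EPOCH_SECONDS:
            # The 24-hour epoch has elapsed: reset to an empty table and
            # advance the epoch start (alignment is an assumption).
            self.first_seen.clear()
            self.epoch_start += (elapsed // EPOCH_SECONDS) * EPOCH_SECONDS
        # Record the arrival time if this 4-tuple is new in the epoch.
        start = self.first_seen.setdefault(four_tuple, timestamp)
        # Keep only packets inside the ten-minute window.
        return timestamp - start <= WINDOW_SECONDS


table = FourTupleTable(epoch_start=0.0)
ft = ("10.0.0.5", "194.28.87.64", 443, "tcp")  # illustrative 4-tuple
kept = [table.observe(ft, t) for t in (100.0, 650.0, 701.0, 86405.0)]
print(kept)
```

The last packet in the example arrives after the 24-hour reset, so the 4-tuple is recorded again and its ten-minute window restarts.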
5-tuple feature extraction
Statistical view features
There tends to be an inherent similarity in flows between the victim host and the attacking host. For instance, Zeus exploits a controlled host to conduct DDoS attacks on a specific server, leading to the generation of numerous flows exhibiting similar behavior. At the traffic level, it is observed that the statistical features of the 5-tuple flows within the same 4-tuple display similarity. Thus, we extract the 5-tuple statistical features to generalize and characterize flow behavior. Finally, we select the 11 most significant statistical features from the 5-tuple flows, as presented in Table 2.
DNS view features
Numerous malicious connections access servers directly via specified IP addresses without issuing a DNS query, which distinguishes them from benign traffic in the distribution of DNS queries. Moreover, for malicious connections that do carry DNS information, the queried domains remain stable enough in the short term to maintain a blacklist. In addition, DNS settings such as the TTL differ between malware traffic and benign traffic. We therefore collect the DNS information of both malware and benign traffic and extract five features from domains and DNS settings, as listed in Table 3.
DNS view features
In TLS-encrypted communication, key negotiation must be completed during the handshake stage. In particular, the Client Hello and Server Hello messages carry initialization attributes for the encrypted communication. Since developers build client applications using different packages and methods, the Client Hello and Server Hello attributes generated by different applications are somewhat unique and recognizable. It is therefore useful to generate application or malware fingerprints from TLS attributes. Currently, JA3/JA3S are the most popular TLS fingerprints, and many studies have employed them for application identification and malware detection. JA3 consists of the following fields in the Client Hello message: SSL version, ciphers, list of extensions, elliptic curves, and elliptic curve point formats. JA3S consists of the following fields in the Server Hello message: SSL version, cipher, and SSL extensions. We therefore extract the JA3/JA3S fingerprints as the TLS features, as shown in Table 4.
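To illustrate how such a fingerprint is formed, the sketch below follows the publicly documented JA3 convention: the decimal values of each field are joined with "-", the fields are joined with ",", and the resulting string is hashed with MD5. The function names are hypothetical, parsing of the handshake itself is omitted, and the filtering of GREASE values performed by real JA3 implementations is left out for brevity.

```python
import hashlib
from typing import Iterable


def _join(values: Iterable[int]) -> str:
    """Join decimal field values with '-', as in the JA3 string format."""
    return "-".join(str(v) for v in values)


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the JA3 string from already-parsed Client Hello fields."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Return the 32-character MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


def ja3s_string(version: int, cipher: int, extensions: Iterable[int]) -> str:
    """Build the JA3S string from already-parsed Server Hello fields."""
    return ",".join([str(version), str(cipher), _join(extensions)])


def ja3s_hash(version: int, cipher: int, extensions: Iterable[int]) -> str:
    """Return the 32-character MD5 hex digest of the JA3S string."""
    return hashlib.md5(ja3s_string(version, cipher, extensions).encode("ascii")).hexdigest()
```

Because the fingerprint is a hash of categorical handshake attributes, it is naturally treated as a string feature rather than a numeric one when 5-tuple features are later summarized per 4-tuple.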
TLS view features
Business view features
In our analysis, malicious activities are widely present in web and download services, and a large amount of malware traffic belongs to unknown applications, which leads to a large difference in service and application distribution between malware traffic and benign traffic. Meanwhile, there exists a large number of works that can effectively identify the service and application of encrypted traffic [21,31,39]. Therefore, we partly determine whether the traffic is malicious by identifying the service and application as the business features. In this paper, we use the nDPI tool to provide business features for malware detection, as shown in Table 5. Note that while we effectively distinguish malware traffic from benign traffic using nDPI, more effective results can be achieved with the help of more advanced business identification tools.
In this paper, we summarize and characterize network behavior at the granularity of 4-tuple flows, which are more comprehensive and robust than 5-tuple flows. We use the extracted multi-view 5-tuple features to generate 4-tuple features. Specifically, we aggregate the 5-tuple features of the same 4-tuple into feature sequences. We then derive the 4-tuple abstract features from the 5-tuple features, including the 5-tuple flow count, client data size, server data size, and total data size. In addition, we generate extension features according to the type of each 5-tuple feature, dividing the features into numeric features and string features. For numeric features, we calculate the maximum, minimum, mean, and variance. For string features, we calculate the number of items, the most frequent item, and its ratio. After this process, each 4-tuple flow is characterized by multi-view features, which are the direct input to the malware traffic detector.
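The extension features can be sketched as follows. This is an illustrative sketch under stated assumptions: the function and key names are hypothetical, "number of items" is interpreted as the number of distinct values, and the variance is taken as the population variance, neither of which is specified in the text.

```python
from collections import Counter
from statistics import fmean, pvariance
from typing import Dict, List, Sequence, Union

Value = Union[int, float, str]


def numeric_extension(values: Sequence[float]) -> Dict[str, float]:
    """Maximum, minimum, mean, and (population) variance of a numeric sequence."""
    return {"max": max(values), "min": min(values),
            "mean": fmean(values), "var": pvariance(values)}


def string_extension(values: Sequence[str]) -> Dict[str, Value]:
    """Distinct-item count, most frequent item, and its ratio in the sequence."""
    counts = Counter(values)
    top_item, top_count = counts.most_common(1)[0]
    return {"n_items": len(counts), "top": top_item,
            "top_ratio": top_count / len(values)}


def aggregate_four_tuple(flows: List[Dict[str, Value]]) -> Dict[str, Value]:
    """Turn the 5-tuple feature dicts of one 4-tuple into one 4-tuple feature dict."""
    features: Dict[str, Value] = {"flow_count": len(flows)}
    for name in flows[0]:
        column = [flow[name] for flow in flows]
        if all(isinstance(v, (int, float)) for v in column):
            stats = numeric_extension(column)
        else:
            stats = string_extension([str(v) for v in column])
        for stat_name, stat_value in stats.items():
            features[f"{name}_{stat_name}"] = stat_value
    return features
```

For example, two 5-tuple flows with `server_bytes` of 100 and 300 and the same JA3 value yield `server_bytes_mean = 200.0` and `ja3_top_ratio = 1.0` for their 4-tuple.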
Malware traffic detection
Recent studies show that machine learning detects malware traffic effectively [25,30]. After comparing various machine learning algorithms, we fit the boundary between malware traffic and benign traffic using LightGBM [15], which proved the most effective. LightGBM supports efficient parallel training and offers fast training speed, low memory usage, and good accuracy. It is widely used in industrial practice [26,34] because of the following advantages. First, LightGBM adopts the histogram algorithm, which buckets continuous feature values into discrete bins and greatly reduces the time complexity. Second, LightGBM adopts gradient-based one-side sampling, which filters out the samples with small gradients during training and saves considerable computation. Third, LightGBM grows trees with a leaf-wise strategy, which avoids unnecessary computation. We therefore argue that LightGBM is well suited to traffic detection, which involves massive traffic volumes and requires fast response. It is worth noting that our detector is only used to distinguish malware traffic from benign traffic and does not classify malware traffic into specific types.
Experiment and evaluation
In this section, we perform experiments and evaluations to verify the performance of MVDet. First, we describe the datasets, metrics, and baseline methods. Second, we validate and analyze the proposed method in closed-world known detection, open-world known detection, and open-world unknown detection. Finally, we analyze the selected features and the time window.
Dataset
Datasets used in each detection scenario. Legend (several marker symbols are missing from the source text): one marker denotes a portion of the traffic in the dataset (for benign traffic, randomly sampled traffic; for malware traffic, the traffic of a particular type of malware); one marker denotes all the traffic in the dataset; ◑ and its paired marker denote the random division of the dataset into two disjoint parts.
We conduct the evaluation of MVDet using a combination of the publicly available CSE-CIC-IDS2018 [27] dataset and our own collected dataset CTU-Real. The CSE-CIC-IDS2018 dataset is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC), serving as a common benchmark for malware traffic detection. It encompasses a variety of typical attack scenarios such as Brute-force, Botnet, DoS, DDoS, SQL Injection, and internal network infiltration. However, while the dataset captures 10 days of both malware and benign traffic within an implemented network topology, it remains somewhat limited in the diversity of attack tools, lacking various mutations and updates. Consequently, we restrict our evaluation of MVDet to closed-world known detection scenarios using solely the CSE-CIC-IDS2018 dataset.
To assess the performance of MVDet under more realistic conditions, we also use CTU-Real, which includes the benign-closed, benign-open, malware-known, and malware-unknown datasets. The benign-closed dataset is captured during the first four months and the benign-open dataset during the final four months at the gateway of an enterprise network. The malware-known dataset covers six types of malware traffic, and the malware-unknown dataset covers other malware traffic from the Malware Capture Facility Project. More details are given in Section 3.1. We use different combinations of these datasets to evaluate the performance in different scenarios, as shown in Table 6.
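We evaluate the proposed method using the true positive rate (TPR), the false positive rate (FPR), and the area under the ROC curve (AUC). TPR and FPR are defined as follows, where TP, FN, FP, and TN denote the numbers of true positives, false negatives, false positives, and true negatives, respectively:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$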
AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is the probability that the model ranks a random positive example more highly than a random negative example.
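As a small worked example of these metrics, the sketch below computes TPR and FPR at a given threshold using the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), and computes AUC directly from its ranking interpretation by counting positive-negative pairs. The function names are hypothetical, labels are assumed to be 1 for malware and 0 for benign, and ties between scores are counted as half a win, which is the usual convention.

```python
from typing import Sequence, Tuple


def tpr_fpr(labels: Sequence[int], scores: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """TPR and FPR when samples with score >= threshold are flagged as malware."""
    tp = fn = fp = tn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), fp / (fp + tn)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(tpr_fpr(labels, scores, 0.5))  # (0.5, 0.5)
print(tpr_fpr(labels, scores, 0.6))  # (0.5, 0.0)
print(auc(labels, scores))           # 0.75
```

The pairwise count is quadratic in the number of samples and is meant only to make the definition concrete. Note also that in this toy example raising the threshold from 0.5 to 0.6 lowers the FPR without changing the AUC, since AUC is threshold-independent.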
Baseline methods
In this section, we describe the baseline methods used to evaluate MVDet. We select Kitsune and Joy as baseline methods considering traffic granularity, real-time capability, and feature dimensionality.
Kitsune [23]: It is a popular packet-based method. Kitsune builds a feature vector for each incoming packet by tracking each network channel and uses unsupervised autoencoders to distinguish malware traffic from benign traffic.
Joy [2]: It is a popular single-flow-based method. Joy builds a classifier for malware traffic detection using random forests based on the flow meta information, byte distribution matrix, Markov chain of packet length and packet time, and TLS Client Hello cipher suite and extension features.
The result and comparison on closed-world known malware detection (CSE-CIC-IDS2018 dataset)
In this section, we evaluate MVDet's ability in closed-world known detection on the CSE-CIC-IDS2018 and CTU-Real datasets. We apply ten-fold cross-validation to obtain stable results: the dataset is randomly divided into ten equal subsets, and the process is repeated ten times, each time using a different subset as the test set and the remaining nine subsets as the training set. This ensures that every data point is used for both training and testing, providing a comprehensive and robust evaluation of the model. To understand the contribution of MVDet, we compare it with Kitsune and Joy, and we evaluate the detection ability by TPR, FPR, and AUC. Note that for Joy and Kitsune, all traffic of the same 4-tuple is assigned entirely to either the training set or the test set, which more closely reflects real-world scenarios.
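The constraint that traffic of one 4-tuple never straddles the training and test sets can be sketched as a grouped ten-fold split. This is an illustration only: the function name `grouped_k_fold` and the round-robin assignment of shuffled 4-tuples to folds are assumptions, and the folds are balanced by number of 4-tuples rather than by number of samples.

```python
import random
from typing import Hashable, Iterator, List, Sequence, Tuple


def grouped_k_fold(groups: Sequence[Hashable], k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) so that each group is in one side only.

    groups[i] is the 4-tuple key of sample i (a flow or a packet).
    """
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    # Assign shuffled groups to folds in round-robin order.
    fold_of = {group: index % k for index, group in enumerate(unique)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test
```

Each sample appears in exactly one test fold, and no 4-tuple contributes samples to both sides of any split.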
Table 7 shows the result of closed-world known detection in CSE-CIC-IDS2018 dataset. In this case, Kitsune needs to set the threshold to detect malware traffic manually. For comparison with MVDet, we set the threshold to satisfy the value of
Table 8 shows the results of closed-world known detection on the CTU-Real dataset. The TPR of MVDet exceeds 0.96 for all malware except Dridex, and the FPR is below 0.01 for all malware except Miuref, which indicates outstanding detection capability. MVDet also outperforms both Kitsune and Joy: its average AUC value is 1.93% higher than that of the best baseline method. Thus, the experimental results confirm the effectiveness of MVDet under closed-world known detection.
The result and comparison on closed-world known malware detection (CTU-real dataset)
In this section, we evaluate the capability of MVDet in open-world known detection using the benign-closed, benign-open, and malware-closed datasets. In this scenario, we divide the dataset differently from the traditional random division. Specifically, for the training and validation sets, we use the benign-closed dataset as benign traffic and a randomly selected 50% of the 4-tuple flows from malware-closed as malware traffic. For the test set, we use the benign-open dataset as benign traffic and the remaining 50% of the malware-closed traffic as malware traffic. In addition, since Kitsune is an unsupervised detector and is therefore better suited to the open-world unknown detection evaluated below, we use only Joy as the baseline method in this section. Since cross-validation cannot be applied to this split, we randomly sample the test set 20 times to obtain more reliable detection results.
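The split described above can be sketched as follows. The function names, the label encoding (0 for benign, 1 for malware), and in particular the `fraction` of the test set drawn in each of the 20 rounds are assumptions for illustration, since the sampling size is not stated in the text.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[object, int]  # (4-tuple flow, label) with 0 = benign, 1 = malware


def open_world_known_split(benign_closed: Sequence[object],
                           benign_open: Sequence[object],
                           malware_closed: Sequence[object],
                           seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Train on benign-closed plus half of the malware; test on benign-open plus the rest."""
    shuffled = list(malware_closed)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train = [(x, 0) for x in benign_closed] + [(x, 1) for x in shuffled[:half]]
    test = [(x, 0) for x in benign_open] + [(x, 1) for x in shuffled[half:]]
    return train, test


def sample_test_rounds(test: Sequence[Sample], rounds: int = 20,
                       fraction: float = 0.8, seed: int = 0) -> List[List[Sample]]:
    """Draw `rounds` random subsets of the test set (fraction is an assumed value)."""
    rng = random.Random(seed)
    size = max(1, int(len(test) * fraction))
    return [rng.sample(list(test), size) for _ in range(rounds)]
```

Because the benign test traffic comes from a different capture period than the benign training traffic, this split exposes the detector to a shifted benign distribution while the malware families remain the same.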
The experimental results are shown in Fig. 7. MVDet still maintains an average TPR of more than 95%, an FPR of less than 10%, and an AUC value of 97.25% in open-world known detection. This indicates that the multi-view features mined by MVDet at the 4-tuple granularity are more robust in characterizing malicious behavior. Compared with Joy, MVDet achieves a higher TPR for all malware except Dridex, and its average TPR is 2.12% higher. MVDet also achieves a higher AUC for all malware, and its average AUC is 5.8% higher. In terms of FPR, however, MVDet is higher than Joy for four types of malware, and its average FPR is slightly higher. This indicates that MVDet specializes in mining malware behavioral features, whereas the behavior of benign traffic is irregular. In other words, benign traffic generated by different applications or different actions differs greatly in behavior, so it is difficult for MVDet to summarize benign behavior. In practice, the FPR can therefore be reduced by raising the detection threshold. Overall, the experimental results confirm the effectiveness of MVDet in open-world known detection.

The result and comparison on open-world known detection.

The result and comparison on open-world unknown detection.
In this section, we evaluate the capability of MVDet in open-world unknown detection. For the training and validation sets, we employ the benign-closed dataset as benign traffic and the malware-closed dataset as malware traffic. For the test set, we employ the benign-open dataset as benign traffic and the malware-unknown dataset as malware traffic. Since cross-validation cannot be applied to this split, we randomly sample the test set 20 times to obtain more reliable detection results.
We report the performance of the different methods in open-world unknown detection using AUC values, as shown in Fig. 8. MVDet achieves an average AUC value of 80%, while Kitsune achieves only 47% and Joy only 67%. Thus, the AUC value of MVDet is still 13 percentage points higher than that of the best baseline method. In addition, the detection results of all methods vary with the network environment. The variation of MVDet is within 2%, which is the smallest, whereas the variation of Kitsune exceeds 44% and that of Joy exceeds 10%. MVDet therefore obtains the best results in open-world unknown detection, with high robustness and little influence from the environment. These experimental results confirm the effectiveness of MVDet in open-world unknown detection.
In this section, we analyze the selected features. We present the top 10 features for known malware detection and analyze the contribution of the multiple views to unknown malware detection.
The top 10 features

The top 10 features on known malware detection.
In this section, we analyze the importance of features in the detection of the six known malware types, as shown in Fig. 9. Different malware attacks in different ways, resulting in different behavior patterns at the traffic level. Consequently, the relative importance of each view varies with the malware. For example, in Zeus detection, the most frequent item of the application in the business view has the highest importance, whereas in Emotet detection, the maximum of the server bytes sequence in the statistical view has the highest importance. As a whole, the top 10 features involve the statistical view, DNS view, TLS view, and business view, which indicates that all of the proposed views are effective for malware traffic detection. In addition, among these views, the statistical view has the highest feature importance.

The AUC value on different view features. S denotes the statistical view features, D denotes the DNS view features, T denotes the TLS view features, and B denotes the business view features.
In this section, we analyze the impact of the proposed multi-view features on unknown malware detection. In the previous section, we found that the four views differ in importance for different malware. We therefore compare the AUC values of different view combinations, as shown in Fig. 10. The combination of all views improves the AUC by 6.9% over the statistical view alone, and dropping any view degrades the detection results. This shows that all four proposed views are effective for malware traffic detection and that combining the four views achieves the best detection results.

The detection result of analysis on time window.
In this section, we analyze the impact of the time window on detection. In this paper, we construct multi-view features by extracting only the 4-tuple flows within the first ten minutes of a specific 24-hour epoch, which allows malicious activities to be detected in time, before they grow into a larger threat. Ten minutes is the optimal parameter for our proposed time window. In Fig. 11, we compare multiple time windows and find that when the time window is set to ten minutes, MVDet can satisfy both
In this paper, we propose MVDet, a multi-view method for encrypted malware traffic detection. MVDet constructs behavioral features from the statistical view, DNS view, TLS view, and business view based on the 4-tuple flow, which comprehensively and robustly characterizes malware traffic behavior. It shows outstanding detection results in both open-world and unknown malware detection. In addition, by setting a time window, we reduce the time of feature extraction so that malicious activities can be detected promptly and their propagation curbed at an early stage.
However, in an open environment, benign traffic has variable behavior under the influence of applications and network actions. It is difficult for MVDet to summarize benign traffic comprehensively. Therefore, in future work, we plan to extract more comprehensive and robust benign traffic features to optimize the capability of open-world detection.
Acknowledgments
This research is supported by the Strategic Priority Research Program of Chinese Academy of Sciences (No.XDC02040100), and the National Key Research and Development Program of China (2021YFB3101400). This work is partially supported by NSFC (No.61902376), and the National Engineering Research Center of Classified Protection and Safeguard Technology for Cybersecurity (C21640-3). This work is also supported by the Program of Key Laboratory of Network Assessment Technology, the Chinese Academy of Sciences, Program of Beijing Key Laboratory of Network Security and Protection Technology.
