Abstract
The Darknet is a section of the internet that is encrypted and untraceable, making it a popular location for illicit and illegal activities. However, the anonymity and encryption provided by the network also make identifying and classifying network traffic significantly more difficult. The objective of this study was to provide a comprehensive review of the latest advancements in methods used for classifying darknet network traffic. The authors explored various techniques and methods used to classify traffic, along with the challenges and limitations faced by researchers and practitioners in this field. The study found that current methods for traffic classification in the Darknet have an average classification error rate of around 20%, due to the high level of anonymity and encryption present in the Darknet, which makes it difficult to extract features for classification. The authors analysed several quantitative values, including accuracy rates ranging from 60% to 97%, simplicity of execution ranging from 1 to 9 steps, real-time implementation ranging from less than 1 second to over 60 seconds, unknown traffic identification ranging from 30% to 95%, encrypted traffic classification ranging from 30% to 95%, and time and space complexity ranging from $O(1)$ to $O(2^n)$. The study examined various approaches used to classify traffic in the Darknet, including machine learning, deep learning, and hybrid methods. The authors found that deep learning algorithms were effective in accurately classifying traffic on the Darknet, but the lack of labelled data and the dynamic nature of the Darknet limited their use. Despite these challenges, the study concluded that proper traffic classification is crucial for identifying malicious activity and improving the security of the Darknet. Overall, the study suggests that, although significant challenges remain, there is potential for further development and improvement of network traffic classification in the Darknet.
Keywords
Introduction
The process of classifying the different kinds of data that are sent across a network is referred to as network traffic classification [1]. This process is important for a variety of reasons, including security, performance monitoring, and resource allocation. One particular area of interest in network traffic classification is the dark net, also known as the dark web [2]. It is frequently linked to illegal activities, including the trade of illicit products and the distribution of illegal drugs, but it is also used for legitimate purposes such as anonymous communication and privacy protection [3, 4]. Due to the nature of the dark net, it is difficult to accurately classify the traffic that is transmitted over it. This is because the data is encrypted, and the sources and destinations of the traffic are often hidden. There have been several approaches to classifying dark net traffic, including machine learning algorithms, statistical analysis, and packet inspection. These methods have had varying degrees of success, but there are still challenges in accurately identifying the types of data that are transmitted over the dark net. A systematic approach to evaluating the efficacy of these methods is to perform a literature review that encompasses all the available studies on the subject of network traffic classification in the dark net [5]. This involves collecting and analysing research studies that have been published on the topic and evaluating their strengths and limitations. A review of this nature can help identify the most effective methods for classifying dark net traffic as well as areas where further research is needed. It can also provide insights into the current state of knowledge on this topic and highlight potential areas for future research. Figure 1 portrays the network traffic classification methods.

Figure 1. Network traffic classification methods.
Classification of network traffic is a vital aspect of network management and security, as it allows for the identification and management of different types of network traffic. The dark net, also known as the dark web, is a collection of hidden networks and websites that are not indexed by search engines and are only accessible through specific software or configurations. The classification of network traffic in the dark net is particularly challenging due to the anonymity and encryption used by users. Port-based classification involves identifying the port number used by a network packet to determine its type. This method is simple and efficient, but it may not be able to accurately classify encrypted or tunnelled traffic. Payload-based classification involves analysing the contents of a network packet to determine its type. This method is more accurate than port-based classification, but it can be computationally expensive and may not be able to classify encrypted traffic. Statistical-based classification involves analysing statistical features of network traffic, such as inter-arrival times and packet sizes, to determine its type. This method is efficient and can be used to classify encrypted traffic, but it may not be as accurate as payload-based classification. Machine learning and deep learning are advanced methods for classifying network traffic. These methods involve training a model on a dataset of labelled network traffic to identify patterns and classify new traffic. These algorithms are quite accurate and can be used to categorise encrypted communication, but they need a substantial amount of labelled information and computational resources.
The scientific novelty of this research can be summarised as follows:
Comprehensive summary of the latest advancements in darknet traffic classification methods: This study provides an extensive assessment of the state-of-the-art techniques employed to recognise and analyse data transmitted over the Darknet network, including machine learning, deep learning, and hybrid methods. This comprehensive summary of the latest advancements is an important contribution to the field.
Evaluation of challenges and limitations faced by researchers and practitioners: The study examines the various challenges and limitations faced by researchers and practitioners in this field, which is an important contribution to the field as it can guide future research and development.
Experimental evaluation of Darknet traffic classification methods: The study presents experimental results that show the limitations of current methods for traffic classification in the Darknet, with an average classification error rate of around 20%. The study also shows that deep learning algorithms were effective in accurately classifying traffic on the Darknet, despite the challenges faced by researchers and practitioners.
Identification of the importance of proper traffic classification for identifying malicious activity and improving the security of the Darknet: The study highlights the importance of proper traffic classification for identifying malicious activity and improving the security of the Darknet, which is an important contribution to the field as it can guide future research and development.
Overall, this research contributes to the advancement of darknet traffic classification methods and provides important insights into the challenges and limitations faced by researchers and practitioners in this field.
In conclusion, classifying network traffic on the dark web can be quite demanding due to the secrecy and encryption employed by its users. Several approaches, including port-based, payload-based, statistical-based, machine learning, and deep learning, have been suggested to address this problem, each with its own advantages and limitations. To achieve the best performance, a combination of different methods may be used.
This paper’s remaining sections are structured as follows. Section 2 provides a comprehensive scientific examination concentrating on contemporary challenges and resolutions in network traffic planning and models. Section 3 discusses some important concepts and approaches related to traffic monitoring. Section 4 introduces an accessible database designed for traffic applications. Section 5 explores the measures that can be utilized to evaluate the efficiency of various methods. Section 6 examines the problems and future work.
The Dark Net, also known as the Dark Web, is a vast and largely unregulated online network that is accessed through encrypted channels. It is often associated with illegal activities such as the sale of drugs and weapons, human trafficking, and cybercrime. Despite the negative connotations, the Dark Net also serves a variety of legitimate purposes, such as anonymous communication and the sharing of sensitive information. In recent years, the challenge of categorising darknet traffic has received a lot of attention. Network traffic classification is the process of identifying the type of data being sent over a network and determining its purpose. This is important because it allows network administrators to monitor and control the flow of traffic, ensuring the security and efficiency of the network. Table 1 presents the related work on machine learning-based traffic classification schemes.
Previously, identifying traffic and determining its purpose was a straightforward task. As network traffic patterns became more complex and applications began to use non-standard port numbers or to reuse well-known ones, a method based on analysing the payload was developed to identify application traffic. However, this strategy also had its shortcomings. Modern approaches therefore employ statistical or behaviour-based techniques to alleviate the limitations of traditional techniques. The remaining sections of this study examine the merits and disadvantages of various traffic classification methods.
Table 1. Related work in machine learning-based traffic classification schemes
Network traffic can be identified and sorted into distinct categories using a port-number-based classification system [6]. This method is typically used in firewalls, routers, and other network devices to control and manage the flow of traffic on a network. Port-based traffic classification begins with identifying the port number associated with a packet. This can be done by examining the packet header, which contains the IP addresses as well as the port numbers. Once the port number has been identified, the traffic is classified according to a predefined set of rules. These rules can be based on a variety of factors, such as the IP addresses and the port number itself. For example, traffic that is sent to or received from port 80, which is typically associated with web traffic, may be classified as "web traffic" and given a higher priority than other types of traffic. Similarly, traffic that is sent to or received from port 22, which is typically associated with SSH traffic, may be classified as "secure traffic" and given a higher priority than other types of traffic. Once the traffic has been classified, it can be managed and controlled using techniques such as filtering, blocking, or prioritising. This allows network administrators to control the flow of traffic on their network and ensure that critical applications and services are given the resources they need to function properly [7]. Overall, port-based traffic classification is a powerful tool for managing and controlling network traffic and is widely used in a variety of network devices and applications, so understanding and applying it properly is important for good network performance. Algorithm 1 iterates through each packet in the network and extracts the destination port number from the packet header. It then checks whether the destination port number is in a predefined list of web ports, SSH ports, or email ports. If it matches one of these lists, the packet is classified as web traffic, secure traffic, or email traffic, respectively. If it does not match any of these lists, it is classified as general traffic.
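The port-lookup procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Algorithm 1: the port lists, the `dst_port` field name, and the function names are assumptions chosen for the example.

```python
# Hypothetical port lists; a real deployment would tune these to local policy.
WEB_PORTS = {80, 443, 8080}
SSH_PORTS = {22}
EMAIL_PORTS = {25, 110, 143, 465, 587, 993, 995}


def classify_by_port(dst_port: int) -> str:
    """Map a destination port to a coarse traffic class."""
    if dst_port in WEB_PORTS:
        return "web"
    if dst_port in SSH_PORTS:
        return "secure"
    if dst_port in EMAIL_PORTS:
        return "email"
    return "general"  # no list matched


def classify_packets(packets):
    """Classify each packet (a dict with an assumed 'dst_port' key)."""
    return [classify_by_port(p["dst_port"]) for p in packets]


packets = [{"dst_port": 80}, {"dst_port": 22}, {"dst_port": 25}, {"dst_port": 9001}]
labels = classify_packets(packets)
```

Any port that is absent from the lists falls through to "general", which illustrates why traffic on non-standard ports is hard to classify with this method alone.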
Payload-based traffic classification
The technique of payload-based traffic classification involves the identification and classification of network traffic by analysing the packet or frame payload’s content [9], as shown in Algorithm 2. This method is typically used in network devices such as firewalls, intrusion detection systems, and deep packet inspection (DPI) devices to control and manage the flow of traffic on a network [10]. The process of payload-based traffic classification begins by examining the content of the packet or frame payload. This can be done by using a variety of techniques, such as regular expressions, string matching, or machine learning algorithms. These techniques are used to extract features from the payload and identify patterns or characteristics that can be used to classify the traffic. For example, if the payload contains certain keywords or phrases that are commonly associated with a specific type of traffic, such as "HTTP" or "FTP," the traffic can be classified as web or file transfer traffic [11]. Similarly, if the payload contains certain patterns or characteristics that are commonly associated with a specific type of traffic, such as encryption or compression, the traffic can be classified as secure or compressed traffic. Once the traffic has been classified, it can then be managed and controlled using a variety of techniques, such as filtering, blocking, or prioritising [12]. Network administrators can regulate the traffic flow on their networks and guarantee that essential applications and services receive the necessary resources to operate smoothly. Payload-based traffic classification is an effective technique for monitoring and managing network traffic, especially in identifying and preventing harmful traffic like phishing, malware, and spam. However, it requires more advanced and sophisticated tools and resources and is more difficult to implement and maintain than port-based traffic classification.
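As a hedged illustration of the string-matching variant described above, the sketch below checks the first bytes of a payload against a small signature table. The table, the labels, and the function name are assumptions made for this example, not the signatures of any particular DPI product.

```python
import re

# Hypothetical signature table: (label, pattern applied to the raw payload bytes).
SIGNATURES = [
    ("web", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) \S+ HTTP/1\.[01]")),
    ("web", re.compile(rb"^HTTP/1\.[01] \d{3}")),
    ("file_transfer", re.compile(rb"^220[ -].*FTP", re.IGNORECASE)),
    ("secure", re.compile(rb"^SSH-2\.0-")),
    # TLS record header: handshake type 0x16, major version 0x03.
    ("secure", re.compile(rb"^\x16\x03[\x00-\x04]")),
]


def classify_payload(payload: bytes) -> str:
    """Return the label of the first matching signature, else 'unknown'."""
    for label, pattern in SIGNATURES:
        if pattern.search(payload):
            return label
    return "unknown"
```

Once a session is encrypted, only the handshake header remains visible to rules of this kind, so the application carried inside the tunnel cannot be identified this way.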
User-behaviour-based traffic classification
User-behaviour-based traffic classification is a technique that uses information about user behaviour to classify network traffic, as shown in Algorithm. This approach focuses on analysing the behaviour of individual users and their interactions with the network rather than just looking at the technical characteristics of the traffic itself. The application of machine learning techniques [13] is one of the most important components of user-behaviour-based traffic classification. These models are trained on massive amounts of information acquired from the network, including information about user behaviour, network usage patterns, and other relevant metrics. The methods are developed to identify trends and patterns in the data that can be utilised to classify traffic into various groups. One of the advantages of user-behaviour-based traffic classification is that it is able to identify and classify new types of traffic that may not have been seen before. This is important in today’s rapidly changing digital environment, where new types of traffic and applications are constantly emerging. Additionally, this approach is less dependent on the technical characteristics of the traffic, which can change over time or be modified by attackers. Another advantage of user-behaviour-based traffic classification is that it can be used to detect and prevent malicious activity on the network [14]. For example, by analysing the behaviour of individual users, it is possible to identify unusual or suspicious activity that may indicate a cyberattack. This can include things like repeated login attempts, unusual traffic patterns, or the use of known malicious IP addresses. In conclusion, user-behaviour-based traffic classification is a powerful technique that can be utilised to analyse and classify different kinds of network traffic based on user behaviour. 
By using machine learning algorithms and analysing large amounts of data, this approach is able to identify new types of traffic and detect malicious activity on the network [15]. This can help organisations better understand and manage their network traffic and improve the security and performance of their networks.
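One of the behavioural signals mentioned above, repeated login attempts, can be illustrated with a simple sliding-window rule. This is a rule-based heuristic rather than a trained machine learning model, and the event format, thresholds, and function name are assumptions made for the example.

```python
from collections import defaultdict, deque


def flag_repeated_logins(events, max_failures=3, window_seconds=60):
    """Return users with more than max_failures failed logins inside one window.

    events: iterable of (timestamp_seconds, user, success) tuples sorted by time.
    """
    recent = defaultdict(deque)  # user -> timestamps of recent failed logins
    flagged = set()
    for ts, user, success in events:
        if success:
            continue
        failures = recent[user]
        failures.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while failures and ts - failures[0] > window_seconds:
            failures.popleft()
        if len(failures) > max_failures:
            flagged.add(user)
    return flagged


events = [
    (0, "alice", False), (5, "alice", True),
    (10, "bob", False), (12, "bob", False), (15, "bob", False), (20, "bob", False),
    (200, "alice", False),
]
suspicious = flag_repeated_logins(events)
```

In a learning-based system, counts such as these would typically become input features for a model instead of being compared against a fixed threshold.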
Statistical flow feature-based approach
The statistical flow feature-based method is a classification method for network traffic that analyses the statistical properties of a traffic flow to categorise it into various groups [16], as shown in Algorithm 4. This approach typically involves calculating various statistical features of the traffic flow, such as the mean, variance, skewness, kurtosis, and entropy, and using these features to differentiate between different types of traffic. One equation commonly used in the statistical flow feature-based approach is the Shannon entropy equation, which calculates the entropy of a traffic flow based on the probability of each packet in the flow:
$$H = -\sum_{i} p(i) \log_2 p(i)$$
where $H$ is the Shannon entropy, $p(i)$ is the probability of packet $i$, and $\sum$ denotes summation over all packets in the flow. Using this formula, we can determine how random or unpredictable the traffic flow is, which helps us classify it into various categories. For example, a traffic flow with high entropy may be classified as random or unstructured, while a traffic flow with low entropy may be classified as structured or predictable.
The statistical flow feature-based approach is useful for accurately classifying various forms of network traffic because it considers the statistical properties of the traffic flow and can differentiate between different forms of traffic based on these characteristics [17, 18].
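As a minimal sketch of the Shannon entropy feature described above, the function below estimates the entropy (in bits) of a flow from the empirical distribution of an observed per-packet value. Using packet size as that value, and base-2 logarithms, are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(values) -> float:
    """Entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) is the same quantity as -p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


uniform_sizes = [64, 128, 256, 512]          # four equally likely packet sizes
constant_sizes = [1500, 1500, 1500, 1500]    # a single repeated packet size

high = shannon_entropy(uniform_sizes)
low = shannon_entropy(constant_sizes)
```

A flow whose packets all have the same size has zero entropy, while four equally likely sizes give the maximum for four symbols, so `high` is larger than `low`.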
Unsupervised machine learning is a technique for training a model to recognise patterns and group data without labelled training data, as shown in Algorithm 3. In network traffic classification, it is applied to recognise patterns in network traffic and categorise it into various types, including legitimate traffic, spam traffic, and harmful traffic [31, 34]. One common unsupervised approach used in network traffic classification is clustering, which groups similar data points according to their features. Clustering algorithms group similar network traffic flows together based on features such as IP address, port number, and packet size [35]. The equation for the ascent of network traffic classification in the Dark Net using an unsupervised machine learning approach can be represented as:
Supervised machine learning approaches can be used to address this challenge by utilising a labelled dataset to train the algorithm to recognise different types of network traffic [32, 36], as shown in Algorithm 4. The equation for this supervised machine learning approach can be represented as follows:
Semi-supervised machine learning is a useful approach for network traffic classification, especially when labelled data is limited. In this approach, the classifier is trained on both labelled and unlabeled data to improve the accuracy of the classification, as shown in Algorithm 5.
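One common way to combine labelled and unlabelled data is self-training, in which a classifier fitted on the labelled flows assigns pseudo-labels to the unlabelled flows it is confident about. The sketch below is an illustration under stated assumptions, not the authors' Algorithm 5: it uses a nearest-centroid classifier, a hypothetical confidence margin, two unscaled features per flow (mean packet size in bytes, mean inter-arrival time in milliseconds), and it needs at least two labelled classes.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def self_train(labelled, unlabelled, rounds=3, margin=2.0):
    """Nearest-centroid self-training.

    labelled: list of (features, label); unlabelled: list of features.
    A flow is pseudo-labelled only when its second-nearest centroid is at
    least `margin` times farther away than its nearest centroid.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        by_class = {}
        for x, y in labelled:
            by_class.setdefault(y, []).append(x)
        cents = {y: centroid(xs) for y, xs in by_class.items()}
        remaining = []
        for x in pool:
            ranked = sorted((math.dist(x, c), y) for y, c in cents.items())
            (d1, y1), (d2, _) = ranked[0], ranked[1]
            if d2 >= margin * d1:
                labelled.append((x, y1))  # confident: adopt the pseudo-label
            else:
                remaining.append(x)       # ambiguous: keep it unlabelled
        if len(remaining) == len(pool):
            break                         # no progress in this round
        pool = remaining
    return labelled, pool


seed = [((100, 10), "chat"), ((1400, 1), "streaming")]
unknown = [(120, 12), (1350, 2), (750, 5.5)]
final_labelled, still_unlabelled = self_train(seed, unknown)
```

In this toy run the two flows close to a seed example receive pseudo-labels, while the flow lying midway between the classes stays unlabelled, which is the conservative behaviour one would usually want when labelled data is scarce.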
The following equation describes the semi-supervised machine learning technique utilised in the development of network traffic identification in the Darknet.
As the darknet is a part of the internet that is not accessible through traditional search engines, it is challenging to identify the network traffic in this region. However, the use of hybrid machine learning has made it possible to effectively classify network traffic in the dark net. The accuracy of network traffic classification can be improved by using a hybrid machine learning approach that combines several machine learning methods. This method uses a decision tree (D-Tree), an artificial neural network (ANN), and a support vector machine (SVM) to categorise traffic on the dark web [38]. The following is a representation of the hybrid machine learning approach used to classify dark web network traffic.
The hybrid machine learning approach can improve the accuracy of network traffic classification on the dark web by combining the outputs of multiple machine learning algorithms. Network traffic classification could be greatly aided by the implementation of hybrid machine learning, which would also aid in the improvement of online privacy and security. The ascent of network traffic classification in the dark net can be modelled using a hybrid machine learning approach with the following equation:
The use of deep learning approaches in network traffic classification has gained popularity in recent years, including for classifying traffic in the dark net [39]. This is due to the ability of deep learning algorithms to automatically learn complex and abstract features from large amounts of data, leading to improved accuracy in classification compared to traditional methods, as shown in Algorithm 6. One common equation used in deep learning is the feedforward neural network (FNN) equation:
Experimental assessment measurements and analysis in network traffic classification involve the collection and analysis of actual network traffic data to evaluate and enhance the effectiveness of traffic classification algorithms. Table 2 shows the datasets for traffic identification.
Table 2. Datasets, methodology, evaluation metrics, and work for traffic identification
The collection of data sets is a crucial stage in the classification of network traffic. The quality and diversity of the data used to train and test the classifier will directly impact its accuracy and effectiveness. There are several ways to collect data for network traffic classification, including:
Capturing live traffic: This involves collecting real-time traffic data as it flows through the network. This can be done using tools such as network taps, network probes, or packet capture software.
Using public datasets: There are several public datasets available that contain network traffic data, such as the ISCX dataset, the CAIDA dataset, and the UNB Anomaly Detection dataset. These datasets can be used for research and experimentation.
Using synthetic data: In some cases, it may be necessary to generate synthetic data that simulates different types of traffic. This can be useful for testing and evaluating the performance of the classifier.
Labelling the data: Once the data has been collected, it must be labelled with the correct traffic type. This can be done manually or through the use of automated labelling tools.
It is important to ensure that the data used for network traffic classification is diverse and representative of the types of traffic that will be encountered in the real world. This will help to ensure that the classifier is able to accurately classify a wide range of traffic types. Several sources of datasets for network traffic classification are shown in Table 2.
ISCX (Internet Traffic Classification Dataset)
The ISCX dataset is a comprehensive collection of network traffic data obtained from various sources, such as university, corporate, and residential networks, designed for research and experimentation in the field of traffic classification [40]. It comprises more than 400,000 traffic flows, including packet size, protocol, and destination port, along with ground truth labels indicating the type of traffic. This dataset is valuable because it contains a wide range of traffic types, including normal and malicious traffic, making it an excellent resource for studying both legitimate and malicious activity on networks. Overall, the ISCX dataset is a valuable tool for researchers and practitioners in the field of network security and traffic classification.
The UNB Anomaly Detection Dataset
The UNB Anomaly Detection Dataset is a popular resource for research in network traffic classification and anomaly detection [41]. It features 49 types of traffic, including normal and anomalous traffic, that has been pre-processed and cleaned. The dataset includes a test set of 31,000 network flows and a training set of 25,000 and contains various types of attacks such as DoS, DDoS, and worm attacks. The dataset includes several variables for categorising network traffic, such as IP addresses, ports, protocol, packet length, and count, and includes a label for normal or anomalous traffic. The dataset is widely used in research to evaluate the effectiveness of network traffic identification and anomaly detection algorithms.
The CAIDA (Cooperative Association for Internet Data Analysis) Dataset
The CAIDA dataset is a vast collection of network traffic data from different sources, making it useful for various classification tasks [42]. It includes normal and malicious traffic and different types of attacks, making it ideal for training and evaluating network classification algorithms. The dataset features various elements for network traffic classification, including IP addresses, protocols, packet lengths, and counts. It also includes a label that distinguishes between normal and anomalous traffic. Researchers can use this dataset for different classification techniques, including supervised and unsupervised methods. The CAIDA dataset is a benchmark for network classification algorithms due to its accuracy, large size, and usage in several studies and publications. It is suitable for research on network classification and intrusion detection systems.
The CTU-13 dataset
The CTU-13 dataset is a widely used dataset for research on network traffic classification and anomaly detection [43]. It contains both normal and anomalous traffic, the latter covering various types of attacks. Its features include source and destination IP addresses, source and destination ports, protocol, packet length, number of packets, and a label indicating whether the traffic is normal or anomalous. The preprocessed data is divided into two sets for training and testing, containing around 80,000 and 40,000 network flows, respectively. It is a realistic dataset captured in a real-world environment and is considered a benchmark for network traffic identification and anomaly detection algorithms.
The DARPA IDS dataset
The DARPA IDS dataset is commonly used for internet traffic classification and anomaly detection research. It contains both normal and malicious traffic, including DoS, DDoS, and worm attacks [44]. The dataset has various features for network traffic classification and is divided into a training set and a test set. It has been widely used in research on network traffic classification and intrusion detection and is considered a benchmark dataset. There are also many other datasets available for network traffic classification tasks, depending on specific needs and goals [45].
Network traffic feature set
The feature set for network traffic classification includes various parameters and characteristics that can be used to differentiate types of traffic, as shown in Table 3. These features can be used to identify patterns and trends in network traffic and to classify it into categories such as web traffic, email traffic, and peer-to-peer traffic. Some classification methodologies, such as machine learning-based ones, combine multiple features, which can improve the accuracy of the classification [46, 47]. The classification of network traffic is an essential task for managing and securing networks. By identifying and categorising different types of traffic, network administrators can prioritise certain types of traffic, enforce network policies, and detect and prevent malicious or unauthorised activities.
The specific feature set used will depend on the requirements of the network or organisation that is performing the classification. Some examples of feature sets for different purposes are:
A company that wants to prioritise web traffic and email traffic over other types of traffic may use the protocol and port number as the primary features for classification. Web traffic typically uses the HTTP protocol on port 80, while email traffic uses the SMTP protocol on port 25.
A network administrator who wants to monitor and prevent peer-to-peer file sharing may use the packet payload as a primary feature for classification. Peer-to-peer file sharing applications often use specific patterns or keywords in the packet payload that can be used to identify the traffic.
A network administrator who wants to detect and prevent DDoS attacks or other malicious activities may use flow rate, retransmissions, out-of-order packets, and error packets as primary features for classification. DDoS attacks often generate a large number of packets with a high flow rate, retransmissions, out-of-order packets, and error packets.
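To make the flow-level features in the last example concrete, the sketch below derives a packet rate and simple retransmission and out-of-order counts from a list of packets belonging to one flow. The `Packet` fields and the counting rules are simplifying assumptions made for this illustration (a repeated sequence number counts as a retransmission, and a new sequence number below the highest one seen counts as out of order); production tools use more careful logic. The function assumes a non-empty flow.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    seq: int          # sequence number (hypothetical simplified field)
    size: int         # packet length in bytes


def flow_features(packets):
    """Summarise one non-empty flow as a small feature dictionary."""
    seen = set()
    retransmissions = 0
    out_of_order = 0
    highest_seq = None
    for p in packets:
        if p.seq in seen:
            retransmissions += 1      # same sequence number observed again
        elif highest_seq is not None and p.seq < highest_seq:
            out_of_order += 1         # new data arriving behind later data
        seen.add(p.seq)
        highest_seq = p.seq if highest_seq is None else max(highest_seq, p.seq)
    duration = packets[-1].timestamp - packets[0].timestamp
    rate = len(packets) / duration if duration > 0 else float(len(packets))
    return {
        "packets_per_second": rate,
        "retransmissions": retransmissions,
        "out_of_order": out_of_order,
        "total_bytes": sum(p.size for p in packets),
    }


flow = [
    Packet(0.0, 1, 100), Packet(0.5, 2, 100), Packet(1.0, 2, 100),
    Packet(1.5, 4, 100), Packet(2.0, 3, 100),
]
features = flow_features(flow)
```

Feature dictionaries of this kind can then be passed to whichever classifier is in use, alongside protocol and port information.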
Table 3. Key features of network traffic identification
In general, the feature set used for network traffic classification will depend on the specific requirements of the network or organisation and the available tools and resources for monitoring and analysing network traffic. It is important for network administrators to continuously monitor and update their classification methods in order to adapt to new types of traffic and evolving network threats [48, 49].
There are several evaluation metrics, including accuracy, precision, recall, and F1 score, that can be employed to assess the performance of a network traffic classification system. The accuracy rate is the number of correctly labelled packets divided by the total number of packets in the dataset, expressed as a percentage. It is a key measure for evaluating the efficiency of a network traffic classification system and provides a useful indication of the system’s effectiveness. To obtain a more comprehensive view of the system’s performance, it is often suggested to use additional metrics alongside the accuracy rate [50, 51]. The accuracy rate can be determined with the following formula:
$$\text{Accuracy rate} = \frac{\text{number of correctly labelled packets}}{\text{total number of packets}} \times 100\%$$
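As a minimal sketch, the four metrics named above can be computed from lists of true and predicted labels as follows. The function name, the label values, and the choice of one class as the "positive" class for precision and recall are assumptions made for the example; results are returned as fractions, so multiply by 100 for percentages.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy over all classes; precision, recall, F1 for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = ["tor", "tor", "web", "web", "tor"]
y_pred = ["tor", "web", "web", "tor", "tor"]
metrics = classification_metrics(y_true, y_pred, positive="tor")
```

Here three of the five packets are labelled correctly, so the accuracy is 0.6, while precision and recall for the "tor" class are both 2/3.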
The ease of execution can be determined by comparing the number of classification processes to the total number of potential processes. The simplicity of execution metric is used to evaluate how efficient a classification system is in terms of computational complexity. The fewer steps required for classification, the more efficient the system is considered to be. However, this metric may not take into account other factors that can affect the overall efficiency of the system, such as the complexity of the algorithms used, the hardware resources available, and the size and complexity of the dataset. Therefore, it is recommended to use other metrics in conjunction with the simplicity of execution metric to obtain a more complete evaluation of the system’s efficiency [52].
The term "real-time implementation" pertains to the capacity of a classification system to provide results within a certain timeframe, typically in real time. The time required for classification relative to the total time available for classification can be used as a metric to evaluate the real-time implementation of a classification system. This metric is useful for evaluating how well a classification system can perform in a real-time environment where timely and accurate results are critical [53]. The formula is:
$$\text{Real-time implementation} = \frac{\text{time required for classification}}{\text{total time available for classification}}$$
Encrypted traffic classification is the process of identifying the type of traffic that is encrypted. The encrypted traffic classification metric measures how well the classification system can correctly identify encrypted traffic. The higher the value, the better the system is considered to be at classifying encrypted traffic [54]. The formula is:
Time complexity is another important aspect of network traffic classification in the dark net. It refers to the amount of time required for the classification process to complete, and it is assessed against the total time available for classification. The time required for classification depends on several factors, such as the complexity of the classification algorithm, the number of features used for classification, the size of the dataset, and the available computational resources. If the algorithm is simple and the dataset is small, classification completes quickly; if the algorithm is complex and the dataset is large, the time required is much higher. The total time available for classification is determined by the real-time nature of the traffic flow: in the dark net, traffic can be highly variable, and the classification process needs to keep up with the real-time data flow [55]. The total time available therefore depends on the network bandwidth, processing power, and memory available for the classification process.
Table 4 presents a comparison of different techniques for traffic identification in terms of several key performance metrics. The techniques are categorised into five groups based on the underlying methodology used: port-based, payload-based, machine learning-based, hybrid machine learning-based, and deep learning-based. The metrics used for comparison include accuracy rate, simplicity of execution, real-time implementation, unknown traffic identification, encrypted traffic classification, and time and space complexity.
The accuracy rate of each technique is presented in the first column of the table. Port-based identification achieves a modest accuracy rate of 60%, while payload-based techniques perform slightly better with a reasonable accuracy rate of 75%. Machine learning-based techniques achieve a strong accuracy rate of 85%, and hybrid machine learning-based techniques perform even better with a very strong accuracy rate of 95%. Deep learning-based techniques achieve the highest accuracy rate, with an outstanding rate of 97%.
The second metric, simplicity of execution, is presented in the second column. Port-based identification requires the least effort, with an effortless execution that involves only 1–2 steps. Payload-based techniques require slightly more effort, with a manageable execution that involves 3–4 steps. Machine learning-based techniques require an involved execution with 5–6 steps, while hybrid machine learning-based techniques require a complex execution with 7–8 steps. Deep learning-based techniques require the most effort, with a very complex execution that involves 9 or more steps.
The third metric, real-time implementation, is presented in the third column. Port-based identification is the fastest technique, with a swift implementation that takes less than 1 second. Payload-based techniques are slower, with an implementation that takes 1–5 seconds. Machine learning-based techniques have a steady implementation that takes 5–10 seconds, while hybrid machine learning-based techniques have a longer implementation time that takes 10–30 seconds. Deep learning-based techniques have the longest implementation times, taking more than 30 seconds.
The fourth and fifth metrics, unknown traffic identification and encrypted traffic classification, are presented in the fourth and fifth columns, respectively. Port-based and payload-based techniques have limited identification capabilities for unknown traffic and encrypted traffic classification, achieving only 30–50% accuracy rates. Machine learning-based techniques have adequate identification capabilities, achieving 50–70% accuracy rates. Hybrid machine learning-based techniques have proficient identification capabilities, achieving 70–85% accuracy rates. Deep learning-based techniques have exceptional identification capabilities, achieving accuracy rates of 85–95% for unknown traffic and encrypted traffic classification.
Finally, the time and space complexity of each technique is presented in the last column. Port-based identification has the simplest time and space complexity, with a constant time complexity of O(1). Payload-based techniques have manageable time and space complexity, with a logarithmic time complexity of O(log n). Machine learning-based techniques have a complex time and space complexity, with a linear time complexity of O(n). Hybrid machine learning-based techniques have a very complex time and space complexity, with a quadratic time complexity of O(n^2). Deep learning-based techniques have an extremely complex time and space complexity, with an exponential time complexity of O(2^n).
Comparison of Different Techniques for Traffic Identification
Overall, the table shows that deep learning-based techniques outperform all other techniques in terms of accuracy rate, while port-based techniques are the simplest and fastest to implement. The choice of technique depends on the specific requirements of the traffic identification task, such as accuracy, execution time, and identification capabilities for unknown and encrypted traffic.
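The trade-off described above can be sketched as a small lookup over the values reported for Table 4. The figures below are transcribed from the text; the `Technique` class, its field names, and the selection rule (minimum accuracy plus a worst-case latency bound) are assumptions made for illustration, not a procedure proposed in the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    name: str
    accuracy_pct: float        # accuracy rate reported in the text
    latency_upper_s: float     # upper end of the reported implementation time
    unknown_range_pct: tuple   # unknown/encrypted traffic identification range
    complexity: str            # reported time complexity


# Values as described for Table 4; "more than 30 seconds" has no stated
# upper bound, so it is modeled as infinity (an assumption).
TECHNIQUES = [
    Technique("port-based", 60, 1, (30, 50), "O(1)"),
    Technique("payload-based", 75, 5, (30, 50), "O(log n)"),
    Technique("machine learning-based", 85, 10, (50, 70), "O(n)"),
    Technique("hybrid machine learning-based", 95, 30, (70, 85), "O(n^2)"),
    Technique("deep learning-based", 97, float("inf"), (85, 95), "O(2^n)"),
]


def select_techniques(min_accuracy_pct, max_latency_s):
    """Return names of techniques meeting both requirements."""
    return [
        t.name
        for t in TECHNIQUES
        if t.accuracy_pct >= min_accuracy_pct
        and t.latency_upper_s <= max_latency_s
    ]


# At least 80% accuracy with results within 10 seconds.
print(select_techniques(min_accuracy_pct=80, max_latency_s=10))
```

Under these assumptions, tightening the latency bound removes the more accurate techniques first, which mirrors the accuracy-versus-speed trade-off noted in the text.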
The comparative evaluation of various machine and deep learning techniques for Darknet traffic classification revealed several limitations based on the performance analysis shown in Figure 2.

Performance metrics utilized in the analysis of network traffic.
The accuracy rates of the compared approaches varied considerably, ranging from 60% to 97%. This indicates that even the most accurate techniques are still subject to classification errors, which can have significant consequences for security and law enforcement. Additionally, some approaches, such as the Decision Tree and Random Forest, had relatively low accuracy rates, which suggests that they may not be suitable for practical use in real-world scenarios.
Another limitation of the compared approaches is related to their simplicity of execution, which ranged from 1 to 9 steps. While simpler techniques are generally preferred due to their ease of implementation and reduced computational complexity, they may not be as accurate as more complex approaches. Thus, the trade-off between simplicity and accuracy must be carefully considered when selecting a Darknet traffic classification technique.
Real-time implementation is another crucial factor to consider when evaluating the effectiveness of traffic classification techniques. The compared approaches showed considerable variation in this regard, with implementation times ranging from less than 1 second to over 60 seconds. This indicates that some techniques may not be suitable for use in real-time applications that require quick decision-making.
Another limitation of the compared approaches is their ability to identify unknown traffic, which ranged from 30% to 95%. This suggests that some approaches may struggle to classify previously unseen traffic, which is a common occurrence in the dynamic and ever-changing Darknet environment.
Finally, the time and space complexity of the compared approaches varied considerably, with some techniques having a complexity of O(1) and others having a complexity of O(2^n). This indicates that some techniques may require more computational resources than others, which can impact their practical usability.
In conclusion, the comparative evaluation of Darknet traffic classification techniques revealed several statistical limitations related to the accuracy, simplicity of execution, real-time implementation, unknown traffic identification, and time and space complexity. These limitations must be carefully considered when selecting an appropriate technique for real-world applications.
Network traffic classification in the dark net is a challenging task due to the anonymous nature of the communication and the use of encryption technologies. The following are some recent open research challenges for advancing network traffic classification in the dark net:
Encrypted traffic classification: Encrypted traffic poses a significant challenge for network traffic classification in the dark net. Developing accurate classification methods that can identify the type of encryption used and the purpose of the communication is a critical research challenge.
Multi-protocol classification: The use of multiple protocols in the dark net makes it difficult to classify network traffic. Developing techniques that can handle multiple protocols simultaneously is a significant research challenge.
Dynamic traffic classification: The traffic patterns in the dark net are continuously changing, making it challenging to develop a static classification method. Developing techniques that can adapt to the changing traffic patterns is an open research challenge.
Anomaly detection: Dark net traffic often involves malicious activity, making it important to develop methods that can detect anomalous traffic patterns. Developing effective anomaly detection techniques is a critical research challenge.
Large-scale traffic classification: The amount of traffic in the dark net is enormous, making it challenging to handle and classify. Developing techniques that can handle large-scale traffic classification is an open research challenge.
Adversarial attacks: Attackers can try to evade traffic classification methods by manipulating the traffic patterns. Developing methods that can detect and mitigate adversarial attacks is an open research challenge.
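To make the anomaly detection challenge listed above more concrete, the following sketch flags flows whose byte counts deviate strongly from the rest of a batch. It uses a generic z-score rule, which is a textbook baseline and not a method evaluated in this survey; the function name, the threshold of 2.0, and the sample byte counts are all illustrative assumptions.

```python
import statistics


def flag_anomalous_flows(flow_bytes, z_threshold=2.0):
    """Return indices of flows whose size is a z-score outlier.

    flow_bytes: byte count per flow (hypothetical feature).
    z_threshold: how many population standard deviations count as anomalous.
    """
    if len(flow_bytes) < 2:
        return []
    mean = statistics.fmean(flow_bytes)
    std = statistics.pstdev(flow_bytes)
    if std == 0:
        return []  # all flows identical, nothing stands out
    return [
        index
        for index, size in enumerate(flow_bytes)
        if abs(size - mean) / std > z_threshold
    ]


# Illustrative byte counts only: six similar flows and one very large one.
sizes = [100, 102, 98, 101, 99, 100, 5000]
print(flag_anomalous_flows(sizes))
```

A single-feature rule such as this one is easy to evade and sensitive to the batch it is computed on, which is part of why the text treats robust anomaly detection, and resistance to adversarial manipulation, as open problems.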
Conclusion
This research examines the present state of network traffic classification on the darknet. The research reveals that the task of classifying network traffic in the dark net is particularly difficult because of the significant degree of anonymity and encryption present in the network. The survey examined various approaches to classify traffic in the dark net, including feature-based, port-based, payload-based, machine learning-based, hybrid machine learning-based, and deep learning-based methods. The accuracy rate of these methods varies from low to very high, with machine learning-based and deep learning-based methods achieving the highest accuracy rates. However, these methods have a low to very high complexity in terms of time and space requirements, with feature-based methods being the simplest to execute and deep learning-based methods being the most complex. The real-time implementation of these methods also varies from high to low, with feature-based methods being the easiest to implement in real-time, while machine learning-based and deep learning-based methods have lower real-time implementation capabilities. The ability to identify unknown traffic and classify encrypted traffic also varies among the different methods, with machine learning-based, hybrid machine learning-based, and deep learning-based methods performing better than feature-based and port-based methods. Overall, the study highlights the need for further research to develop more robust and accurate methods for network traffic classification in the dark net. The development of such methods is crucial for identifying malicious activity and improving the security of the dark net.
