Abstract
With the rapid increase in the robustness and impact of cyber-attacks, a counter-evolution in defense efforts is essential to ensure a safer cyberspace. A critical aspect of cyber defense is the experts’ ability to understand, analyze, and share knowledge of attacks and vulnerabilities in a timely and intelligible manner that facilitates the detection and mitigation of emerging threats. Cyber threat intelligence (CTI) reports, and Common Vulnerabilities and Exposures (CVEs) are two primary sources of information that security analysts use to defend against cyber attacks. Analyzing the tactics, techniques, and procedures (TTPs) of attackers from these sources by mapping them to the ATT&CK framework provides valuable insights to defenders and aids them in countering various threats.
Unfortunately, due to the complexity of this mapping and the rapid growth of these frameworks, mapping CTI reports and CVEs to ATT&CK is a daunting and time-intensive undertaking. Multiple studies have proposed models that automatically achieve this mapping. However, due to their reliance on annotated datasets, these models exhibit limitations in quality and coverage. To overcome these challenges, we present SMET – a tool that automatically maps text to ATT&CK techniques based on textual similarity. SMET achieves this mapping by leveraging ATT&CK BERT, a model we trained using the SIAMESE network to learn semantic similarity among attack actions. In inference, SMET utilizes semantic extraction, ATT&CK BERT, and a logistic regression model to achieve ATT&CK mapping. As a result, SMET has demonstrated superior performance compared to other state-of-the-art models.
Introduction
The number, robustness, and impact of cyber attacks have increased significantly in the past few years. The cost of cybercrime was estimated to be around $8.4 trillion globally in 2022, and it is predicted to reach $23.84 trillion annually by 2027 [15]. As organizations and governments strive to combat cybercrime, their efforts are being met with increasingly complex attacks and hindered by the prevalence of software vulnerabilities. A notable example of such an attack is WannaCry, a malware that leveraged a vulnerability in the Windows system to spread globally and cause an estimated financial loss of $4 billion [11].

ATT&CK mapping importance.
The seamless sharing of threat knowledge is critical for security analysts’ timely and effective response. Two of the most valuable threat knowledge-sharing sources are cyber threat intelligence (CTI) reports and Common Vulnerabilities and Exposures (CVEs). CTI reports provide detailed information about cybersecurity threats, including methods and strategies employed by threat actors during various stages of a cyber attack and actionable insights that help organizations understand emerging threats and implement effective countermeasures. CVE is a list of publicly disclosed vulnerabilities introduced by MITRE to identify vulnerabilities in public software. Each entry of CVE consists of a unique ID number and a description of a discovered vulnerability alongside other critical information. CVE is leveraged by organizations to monitor newly discovered vulnerabilities and ensure the security of their systems and networks against them.
Multiple comprehensive knowledge bases and ontologies have been constructed by various organizations to aid security analysts in understanding and categorizing information regarding attackers’ behaviors and objectives. ATT&CK is a knowledge base developed by MITRE that categorizes attack techniques that adversaries have been observed using in the real world. ATT&CK groups attack techniques under tactics, where each tactic represents a goal that the attacker tries to achieve, such as initial access, execution, or defense evasion. A tactic contains multiple techniques that an attacker can use to achieve that goal; each technique includes a textual description of the attacker’s behaviors and examples of real-world uses of the technique by known malicious groups and software. Moreover, each technique contains detection and mitigation recommendations to aid security teams in countering the technique. ATT&CK’s high-level attack categorization and practical defense recommendations make it a valuable resource for security analysts to secure their systems and networks.
Mapping CTI reports and CVE to the ATT&CK framework provides a standardized and structured way to categorize and understand the tactics and techniques employed by threat actors. This mapping enriches CTI reports and CVEs by facilitating a contextual understanding of threats and vulnerabilities. It allows defenders to identify patterns and trends across various incidents and prioritize their efforts based on the most relevant and impactful threats. Moreover, ATT&CK mapping gives security analysts better insights into existing detection and mitigation measures, helping them set their defense measures accordingly, as shown in Fig. 1.
However, ATT&CK mapping is a difficult task that requires the efforts of experts with a thorough understanding of threat intelligence and the ATT&CK framework. Moreover, as the number of CTI reports and CVEs grows daily, manually mapping them to techniques becomes an infeasible task. Previous studies have tackled this problem and introduced machine learning models that automatically map CTI reports and CVEs to ATT&CK [6,8,17,19,22–25]. In these studies, researchers utilized an annotated dataset of ATT&CK mappings and trained a text classification model (e.g., deep learning models) to classify CTI reports and CVEs to a limited number of techniques using their textual description. However, the performance and coverage of these models are constrained by the annotated dataset that they rely on and cannot adapt to the dynamic nature of these frameworks. Moreover, most of these approaches lack explainability due to the black-box nature of their proposed deep-learning models.
This paper introduces an extension of SMET for mapping CVE to ATT&CK, which was presented in a previous conference paper [3]. Unlike previous studies, SMET achieves ATT&CK mapping without relying on any annotated dataset. SMET maps text to techniques by first leveraging a semantic role labeling (SRL) model to extract attack vectors from an input text. Attack vectors are textual descriptions of malicious actions that an attacker can perform. Some examples of attack vectors include “exploit a vulnerability to gain access to a network,” “execute code on a victim machine,” or “send data to a C2 server.” SMET then extracts the embedding of these attack vectors using our developed ATT&CK BERT model. ATT&CK BERT is a transformer model that we fine-tuned using the SIAMESE network over ATT&CK Matrix data. As a result, ATT&CK BERT extracts semantically meaningful embeddings of attack vectors. Thus, extracted embeddings of similar attack vectors are close in the embedding space. Finally, SMET uses a logistic regression model that we trained using the ATT&CK Matrix to estimate the probability of an attack vector belonging to each ATT&CK technique and rank techniques based on the estimated probability. SMET is publicly available on GitHub.
In this paper, we extend our previous work from various aspects. First, we extend SMET capabilities to map cyber threat intelligence (CTI) reports to ATT&CK. We achieve this mapping by adapting the attack vector extraction component to enable the extraction of attack vectors from CTI reports and leverage the ranking aggregation process to overcome noise in the input text. Second, we enhance the SMET training process by introducing a loss calculation formula that assigns weights to samples from classes based on their frequency in the training dataset, aiming to mitigate the data imbalance problem. Third, we further evaluate SMET on a public dataset introduced by MITRE that contains 4,816 CTI sentences mapped to ATT&CK. Fourth, we study the confidence scores generated by SMET and their correlation to its precision. Finally, we perform a deeper analysis of SMET by reporting the confusion matrix of its predictions and identifying techniques that SMET mislabels.
This paper is organized as follows. Section 2 presents the motivation and the problem statement. Section 3 presents previous related work. Section 4 introduces SMET. Section 5 evaluates SMET. Section 6 contains the conclusion and future work.
Motivation
Mapping CTI reports to ATT&CK is valuable to security analysts in various ways. First, it provides additional context to the CTI report by categorizing attack actions to already documented, studied, and analyzed techniques in ATT&CK. This additional context accelerates threat response and helps defenders deploy appropriate, well-studied countermeasures. Second, adopting ATT&CK as a common language for characterizing TTPs in CTI reports facilitates collaboration and information sharing among organizations, where they can share insights, compare findings, and collectively defend against emerging threats. Highlighting the significance of CTI report to ATT&CK mapping, the Cybersecurity and Infrastructure Security Agency (CISA) introduced Decider – a tool that helps analysts manually map adversary behavior to the ATT&CK Framework [2]. Moreover, CISA’s reports frequently contain references to ATT&CK within their textual descriptions, as shown in the technical description of the report with the alert code AA20-296A in Fig. 2.
Similarly, mapping CVEs to ATT&CK is valuable to security analysts in additional ways. Whenever a CVE entry for publicly used software is published, security analysts in organizations must be alerted and take appropriate measures to ensure the security of their systems against the newly discovered vulnerability. Although the best defense against a vulnerability is applying its patch, patches take time to implement, and the software remains vulnerable in the meantime, threatening organizations’ systems and networks. Moreover, some vulnerabilities are never patched due to cost and complexity. Therefore, security analysts must take other measures to counter the vulnerability. This is where ATT&CK proves helpful. When a vulnerability is linked to an ATT&CK technique, it can be studied from an attacker’s perspective, as ATT&CK sheds light on how an attacker can exploit the vulnerability, what an attacker can gain from the exploitation, and what mitigation and detection measures can be taken to counter the attacks. An example of mitigation measures recommended by ATT&CK for the exploitation for privilege escalation technique is shown in Fig. 3.

CISA alert code AA20-296A technical details.

The exploitation for privilege escalation technique mitigation from ATT&CK.

Text similarity between the user execution technique description and CVE-2020-4553 description.
Previous studies have proposed models that automatically map text to ATT&CK. In these studies, experts annotate a dataset by manually mapping CTI report sentences or CVE entries to ATT&CK techniques. They then train a supervised machine learning model to link text to techniques using the annotated data. The annotation process is both time-consuming and labor-intensive, leading to datasets with limited quality and coverage. Consequently, models trained on such datasets suffer from performance limitations and cannot accommodate the evolving nature of ATT&CK. In contrast, this paper tackles the problem of mapping a text to an ATT&CK technique in an unsupervised manner by leveraging the textual descriptions of ATT&CK techniques.
Our proposed approach achieves unsupervised mapping by leveraging text similarities between an input text and the descriptions of ATT&CK techniques. An input text can be a CTI report, a CVE entry, or any text that describes an attack. An example of text similarity can be seen in Fig. 4, where the CVE-2020-4553 description states that an attacker can exploit the vulnerability by persuading the victim to open a crafted file. This attack behavior is represented in the user execution technique from ATT&CK. Identifying sentences’ semantic similarities is a challenging and active area of research. Although researchers have introduced several deep learning models to extract sentence similarity, no research has investigated sentence similarity in cybersecurity. The complexity of text similarity in cybersecurity stems from the lack of annotated data and the semantic gap between low-level and high-level attack vectors. For example, the sentence “An attacker runs a script in a machine” is semantically similar to “A malicious actor executes code on a system,” as they represent a similar objective, although the two sentences have no content words in common. On the other hand, the two sentences “An attacker reads a file” and “An attacker deletes a file” share almost all words but are semantically different, as each attack vector corresponds to a different attack objective. Table 1 shows examples of semantically similar attack vectors.
Semantically similar attack vectors
The description of an ATT&CK technique contains several attack vectors that the attacker performs to achieve the technique. These attack vectors can range from low-level behaviors, such as “create file” or “read registry,” to high-level behaviors, such as “compromise system” or “steal information.” Each ATT&CK technique consists of mainly two attack vectors: an action that an attacker takes and the objective of that action. For example, in the exploitation for client execution technique, an attacker exploits software vulnerability in client applications (action) to execute code (objective). Actions are usually lower-level attack vectors, while objectives are higher-level attack goals and can represent the tactic of the attack. The action and objective can be identified for all techniques from the first sentence in the technique description.
ATT&CK techniques cover all stages of an attack life cycle, from reconnaissance and initial access to the compromising of confidentiality, integrity, and availability. Therefore, techniques can be linked to a CVE entry through various means. We investigated the CVE-ATT&CK association and identified two types of techniques for classifying a CVE entry that are inspired by [9]. First, a CVE entry can be mapped to techniques that describe an exploitation method, more specifically, a technique that an attacker needs to perform to exploit a vulnerability, such as exploiting a web browser vulnerability or tricking a user into performing an action. Second, CVEs can be mapped to techniques that describe the consequences of exploiting a vulnerability or the objectives that an attacker can accomplish after exploiting the vulnerability, such as code execution, privilege escalation, or data manipulation. In other words, one technique enables an attacker to exploit a vulnerability, and a vulnerability enables an attacker to achieve other techniques. We refer to these techniques as pre-exploit and post-exploit techniques, respectively.
We studied various ATT&CK techniques and identified techniques that correspond to each mapping type. Pre-exploit techniques are mostly part of the initial access tactic, such as drive-by compromise, exploit public-facing applications, user execution, and valid accounts. In the drive-by compromise technique, an attacker uses websites to exploit a vulnerability in users’ web browsers. In the exploit public-facing application technique, an attacker exploits a website or any public-facing application, such as databases or standard services that use crafted input. In the user execution technique, the attacker depends on an action by the user to exploit a vulnerability. Finally, in the valid accounts technique, an attacker needs to compromise an account first in order to exploit the vulnerability.
Post-exploit techniques can be part of any tactic. The execution, privilege escalation, and credential access tactics each include a technique in which an adversary exploits a vulnerability to achieve the tactic: exploitation for client execution, exploitation for privilege escalation, and exploitation for credential access, respectively. In addition, the lateral movement tactic contains a technique – exploitation of remote services – where an attacker exploits a vulnerability in remote services to gain access to systems. Moreover, the impact tactic contains various techniques where an attacker aims to compromise the system or data’s availability or integrity. One example is the application or system exploitation sub-technique, which describes the attack technique where an attacker exploits a vulnerability to compromise availability. CVEs can also be mapped to other techniques that are not limited to vulnerability exploitation but describe the attacker’s goals, such as data manipulation or system shutdown/reboot. In conclusion, the ATT&CK framework contains various techniques related to CVE. Some techniques focus specifically on vulnerability exploitation, while other general techniques can be achieved after exploitation.
Related work
Since the ATT&CK Matrix became a vital resource for security analysts to understand and counter cyber threats, researchers have proposed multiple algorithms to map various cyber information resources to ATT&CK techniques automatically. For example, multiple studies have proposed models to map CTI reports to ATT&CK techniques [8,19,23,24]. Other studies have proposed models to map CVE entries to ATT&CK techniques [6,17,22,25]. Moreover, researchers have proposed models to map malware behavior, Linux shell commands, and threat data from smart grid systems to ATT&CK [7,20,28]. Other studies have focused on enriching CVE by automatically mapping it to the Common Weaknesses Enumeration (CWE) or Common Attack Pattern Enumeration and Classification (CAPEC) frameworks [5,12]. Researchers have also proposed domain-specific language models for cybersecurity that can be applied to a wide range of downstream tasks [4,21].
In one study, researchers studied the performance of various traditional machine learning models (e.g., Naive Bayes and SVC) and advanced deep learning models (e.g., CNN, BERT, SciBERT, and SecBERT) in linking CVE entries to the ATT&CK matrix using supervised learning [17]. They also studied the impact of data augmentation methods that help enrich the training set. They used the dataset introduced by MITRE Engenuity [14] for training and testing and enriched it with their own labeled data. In their evaluation, they only considered mapping to 31 ATT&CK techniques.
In another study, researchers proposed a neural network architecture – Multi-Head Joint Embedding Neural Network – that automatically maps CVE to ATT&CK techniques [22]. To create a training dataset, they introduced an unsupervised labeling technique with which they extracted CVE information from publicly available threat reports. Their labeling technique was able to map CVE entries to 17 ATT&CK techniques. Moreover, researchers have proposed CVET, a transformer-based model that has mapped CVE entries to 10 ATT&CK tactics [6]. CVET was trained using the self-knowledge distillation approach over the BRON dataset [18].
The aforementioned models map CVE to a limited number of ATT&CK techniques due to their reliance on a manually annotated dataset. However, in CTI mapping research, a tool – AttacKG – has been introduced that maps CTI reports to all ATT&CK techniques [24]. AttacKG converts CTI reports and ATT&CK techniques into structured attack behavior graphs and maps reports by aligning their graphs to the techniques’ template graphs. In the alignment phase, AttacKG uses character-level similarity to align nodes instead of semantic similarity. Unfortunately, character-level similarity is limited and cannot align text with similar meanings but different wording.

SMET overview.
SMET overview
SMET consists of three components: an attack vector extraction component, an attack vector representation model (ATT&CK BERT), and an ATT&CK mapping model (logistic regression). First, the attack vector extraction component uses a semantic role labeling (SRL) model to extract attack vectors from an input text. Second, SMET utilizes an attack vector embedding model – ATT&CK BERT – that extracts a semantic vector representation of attack vectors. Using ATT&CK BERT extraction, the extracted embeddings of similar attack vectors are close in the embedding space. Thus, semantically similar attack vectors have a high cosine similarity, while unrelated attack vectors have a low cosine similarity. Finally, SMET uses a logistic regression model to map attack vectors to ATT&CK techniques and ranks all techniques by their corresponding logistic regression confidence score. SMET finally aggregates these rankings to form one overall ranking. Figure 5 shows an overview of the SMET architecture.
SMET can take any textual data as input, spanning from one short sentence to multiple paragraphs. When the input is short (e.g., one sentence or attack action), the attack vector extraction component can be bypassed, and the entire input can be treated as an attack vector. For inputs that consist of a few lines, such as a CVE entry or a paragraph from a CTI report, SMET extracts attack vectors, maps each vector to techniques, and aggregates all mappings into one ranking list. In cases where the input is long, like a full CTI report, the text must be segmented into multiple paragraphs or sentences that are processed separately.
Attack vector extraction
CTI reports and CVE entries contain various information; some of it is relevant to ATT&CK mapping, while the rest is less critical. For example, a CVE entry’s textual description contains attack-related information, such as actions that an attacker can take to exploit the vulnerability and objectives that an attacker can gain from the exploitation. On the other hand, it also includes information that is irrelevant to ATT&CK mapping, such as software name, affected versions, and vulnerability type.
Unfortunately, CTI reports and CVE entries do not follow a predefined structure, making it challenging to filter out irrelevant information. Moreover, providing long, noisy text to SMET’s logistic regression model can degrade its mapping performance. Therefore, the attack vector extraction component aims to automatically split the text into smaller, concise, and comprehensive pieces of information denoted as semantic frames.
A semantic frame consists of a verb linked to its neighboring words/phrases tagged by their semantic relationship to the verb, such as a subject, patient, location, manner, or purpose. We achieve semantic frames extraction by leveraging an AllenNLP SRL model, a state-of-the-art semantic extraction model that extracts semantic frames from unstructured text [16]. Figure 6 shows an example of semantic frames extracted from the CVE-2021-27032 description. For simplicity, we combined all roles other than subject and verb and assigned them an “objects” tag in the table.

Attack vector extraction example.
Semantic frames can represent various types of information. For example, the first semantic frame in Fig. 6 represents an attribute of the software using the verb “is.” The second semantic frame represents a causal relationship between “buffer overflow” (subject) and “improper bound checking” (objects) using the verb “caused.” The last semantic frame represents an attack action identified by the subject “local attacker.”
Semantic similarity between sentences is a multifaceted concept that requires a clear definition based on the research goals at hand. In our study, we observed that attack vectors can exhibit similarities in various aspects. For example, “create a file” and “create a registry key” are similar in the sense that both represent adding a storage unit in the system, while “create a file” and “delete a file” are similar in the sense that both apply an action to a file. Other dimensions of similarity can encompass the location of attack vectors (e.g., system or network) or privileges needed (e.g., administrator or user). Since we aim to map attack vectors to ATT&CK techniques, we consider two attack vectors similar only if they share the same objective and help achieve the same technique. For example, the three attack vectors “delete system file,” “delete anti-virus file,” and “delete log files” are different, although they all represent deleting a file from the system. “Delete system file” aims to interrupt the availability of the system; “delete anti-virus file” aims to disable defensive mechanisms; and “delete log files” aims to cover attack actions.
To extract semantic embeddings, we propose ATT&CK BERT. ATT&CK BERT is a transformer model that aims to represent attack vectors in a semantically meaningful embedding space, where the embeddings of attack vectors with similar meanings are close to each other and thus have high cosine similarity. Text embedding has been an active research challenge for a long time. Since the introduction of the transformer architecture, researchers have proposed multiple transformer-based sentence embedding models. One of the best-performing recent models is Sentence BERT (SBERT) [27]. SBERT introduced a method to train the BERT model using the SIAMESE network architecture by taking two sentences as input, extracting each sentence embedding using BERT, and then optimizing the network weights to maximize the similarity of the two embeddings if the sentences are semantically similar and minimize it otherwise.
Since the original SBERT is trained on a general entailment dataset, its performance on cybersecurity text is limited. However, preparing a dataset of pairs of sentences that cover all attack life cycle information and annotating them is an infeasible task. To overcome this challenge, we propose an approach to fine-tune SBERT using the ATT&CK framework as follows. First, we extract all attack vectors from each ATT&CK technique description and procedure examples using the AllenNLP SRL model. Second, we create two lists that we denote as positive and negative lists. The positive list contains pairs of attack vectors that are extracted from the same technique, while the negative list contains pairs of attack vectors where each is extracted from a different technique. Finally, we train SBERT to maximize the cosine similarity between pairs from the positive list and minimize the cosine similarity between pairs from the negative list. This approach is shown in Fig. 7.
Specifically, the dataset was prepared as follows. From each technique, we extracted all attack vectors and combined each two extracted attack vectors in a list of pairs. We randomly selected 40 pairs from the list if the list size was more than 40. All selected pairs from all techniques were combined in the positive list. For each technique, we paired its attack vectors with at most six attack vectors from each other technique. We randomly sampled 160 pairs from the resulting pairings and added them to the negative list. This resulted in 38,396 pairs – 7,356 positives and 31,040 negatives. We fine-tuned the pre-trained all-mpnet-base-v2 model using our dataset. We used the Adam optimizer with a 2e-05 learning rate and fine-tuned the model for one epoch.
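The pair construction described above can be illustrated with a minimal sketch. The function name `build_pairs`, its parameter names, and the toy attack vectors are hypothetical, and the reading of "at most six attack vectors from each other technique" (sample up to six vectors per other technique, pair them with every vector of the current technique, then cap at 160) is an assumption rather than a confirmed detail of the SMET implementation.

```python
import itertools
import random


def build_pairs(vectors_by_technique, max_pos=40, per_other=6, max_neg=160, seed=0):
    """Build positive (same technique) and negative (different technique) pairs."""
    rng = random.Random(seed)
    positives, negatives = [], []
    for tech, vectors in vectors_by_technique.items():
        # Positive pairs: every unordered pair within the technique, capped at max_pos.
        same = list(itertools.combinations(vectors, 2))
        if len(same) > max_pos:
            same = rng.sample(same, max_pos)
        positives.extend(same)

        # Negative candidates: this technique's vectors paired with up to
        # per_other vectors drawn from every other technique (assumed reading).
        candidates = []
        for other, other_vectors in vectors_by_technique.items():
            if other == tech:
                continue
            picked = rng.sample(other_vectors, min(per_other, len(other_vectors)))
            candidates.extend((a, b) for a in vectors for b in picked)
        if len(candidates) > max_neg:
            candidates = rng.sample(candidates, max_neg)
        negatives.extend(candidates)
    return positives, negatives


# Toy input: hypothetical attack vectors for two techniques.
toy = {
    "user execution": ["open crafted file", "click malicious link", "run attachment"],
    "data destruction": ["delete system file", "overwrite disk"],
}
positives, negatives = build_pairs(toy)
print(len(positives), len(negatives))
```

With real data, the caps of 40 positive and 160 negative pairs per technique keep techniques with many attack vectors from dominating the fine-tuning set.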

ATT&CK BERT fine-tuning.
Leveraging ATT&CK BERT, we trained a logistic regression model as follows. First, we collected all attack vectors from each technique and labeled each attack vector by its corresponding technique. We then used ATT&CK BERT to extract all attack vectors’ embeddings and trained a multinomial logistic regression model using this training set. This training approach is shown in Fig. 8. We used logistic regression instead of a more complex model to avoid overfitting due to our dataset’s high number of classes (185 techniques) and the small number of samples per class (a median of 52). Moreover, we used the class weights mechanism, where weights are adjusted in inverse proportion to class frequencies during training to overcome the class imbalance.
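As an illustration of weighting in inverse proportion to class frequency, the sketch below computes the standard balanced weights, w_y = N / (|Y| · n_y), which is the formula scikit-learn applies for `class_weight="balanced"`. It is not SMET's own weighting criterion; the function name and the toy label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Standard balanced weights: w_y = N / (|Y| * n_y)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {y: n_samples / (n_classes * n) for y, n in counts.items()}


# Toy, imbalanced label list (hypothetical counts per technique).
labels = ["user execution"] * 6 + ["valid accounts"] * 3 + ["drive-by compromise"] * 1
weights = balanced_class_weights(labels)

# Under balanced weighting, every class contributes the same total weight.
counts = Counter(labels)
totals = {y: weights[y] * counts[y] for y in counts}
print(weights)
print(totals)
```

The rarest class receives the largest per-sample weight, so a single sample from it counts as much as all six samples of the most frequent class combined.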

Logistic regression training.

SMET ATT&CK mapping.
The class weights mechanism aims to tackle the class imbalance challenge during training by scaling the training loss of samples belonging to a technique in inverse proportion to the number of samples in the training set that are classified under that technique. This loss scaling prevents the model from being biased towards classes with more samples over classes with limited samples. The most popular weighting criterion is balanced weighting, where the collective influence on the training loss of all samples from one class is equal across all classes. In highly imbalanced datasets, balanced weighting can have an adverse effect, as samples from low-data classes disproportionately influence the model, thus decreasing its performance, as we examine in the Evaluation section. To overcome this challenge, we introduce a modified class weighting criterion that SMET uses to limit the bias towards classes with more samples while maintaining the model’s performance. The balanced and our modified class weight formulas are shown below. Y represents the set of existing classes, and
When mapping an input text to techniques, SMET runs the trained logistic regression model over each attack vector extracted from the text. Each run returns a ranking of all techniques based on the logistic regression confidence scores. SMET then aggregates all rankings by setting the score of each technique to the maximum score across all attack vectors. For example, if SMET extracts three attack vectors from an input, AV1, AV2, and AV3, it runs logistic regression three times, once over each attack vector. Suppose a technique T receives scores of 0.2, 0.3, and 0.01 for AV1, AV2, and AV3, respectively. SMET sets the score of T to 0.3. It does the same for all techniques and then ranks them based on their final scores.
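The max-aggregation step can be sketched in a few lines. The scores for technique T follow the example above; the second technique U and its scores, as well as the function name `aggregate_max`, are hypothetical additions for illustration.

```python
def aggregate_max(per_vector_scores):
    """Combine per-attack-vector scores by keeping each technique's maximum."""
    combined = {}
    for scores in per_vector_scores:
        for technique, score in scores.items():
            combined[technique] = max(score, combined.get(technique, 0.0))
    # Rank techniques from highest to lowest aggregated score.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# One score dictionary per extracted attack vector (AV1, AV2, AV3).
av1 = {"T": 0.2, "U": 0.5}
av2 = {"T": 0.3, "U": 0.1}
av3 = {"T": 0.01, "U": 0.05}
ranking = aggregate_max([av1, av2, av3])
print(ranking)
```

Taking the maximum rather than the mean means that one attack vector with a confident match is enough to rank a technique highly, even when the other extracted vectors are noise with respect to that technique.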
As ATT&CK BERT extracts semantic embedding of attack vectors, SMET explains its mapping by identifying the most similar attack vectors from the mapped technique’s description or procedure examples. This explanation helps security analysts understand SMET’s decision and gain insight into its mapping criteria. Figure 9 shows an example of SMET’s mapping of CVE-2021-27032. The middle step presents a list of attack vectors extracted by SMET, and the last step shows SMET’s two highest-ranked techniques and the most similar attack vector from each technique to the CVE. Consequently, for techniques that include sub-techniques, SMET maps each attack vector to the specific sub-technique that contains the most similar attack vector.
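A minimal sketch of this explanation step follows, assuming the explanation is obtained by a nearest-neighbor search under cosine similarity over the mapped technique's attack vectors. The three-dimensional embeddings are hypothetical stand-ins for ATT&CK BERT output, and the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_vector(query_embedding, technique_vectors):
    """Return (text, similarity) of the technique attack vector closest to the query."""
    scored = [(text, cosine(query_embedding, emb)) for text, emb in technique_vectors]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical embeddings of attack vectors taken from one technique's description.
technique_vectors = [
    ("exploit software vulnerability", [0.7, 0.6, 0.1]),
    ("execute code", [0.9, 0.1, 0.2]),
]
query = [0.8, 0.2, 0.1]  # hypothetical embedding of an attack vector from the input text
explanation = most_similar_vector(query, technique_vectors)
print(explanation)
```

The returned text is what an analyst would see alongside the mapped technique, together with the similarity value that justifies the match.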

Histogram of CVE’s dataset techniques.
Dataset
We used the following two datasets in evaluating SMET. The first dataset was introduced by TRAM [10], an open-source platform developed by MITRE to map CTI reports to ATT&CK. This dataset contains 4,816 sentences extracted from CTI reports. Each sentence is mapped to a single technique from 50 pre-defined techniques. The second dataset contains CVE entries that we manually mapped to ATT&CK. Each entry is mapped to one or more techniques. We collected 1,813 CVE entries published from 2014 to 2022 and gathered by existing research [9,17]. We randomly picked 303 entries for manual mapping. We divided the mapping into two tasks to ensure accurate and justifiable mapping. First, we identified attack vectors from the entry description that can be mapped to ATT&CK. Second, we identified the most similar attack vector from ATT&CK for each extracted attack vector and mapped the entry to the corresponding technique. As each attack vector in an entry can be mapped to different techniques, an entry can be mapped to multiple techniques. Figure 4 shows an example of mapping CVE-2020-4553 to the user execution technique.
We mapped the 303 CVE entries to 41 ATT&CK Matrix (version 12.0) techniques. Figure 10 shows the number of CVE entries mapped to each technique. The exploitation for client execution technique had the most entries mapped to it, as many CVE entries in our dataset allow an attacker to execute code. Moreover, the four techniques with the most entry mappings focus on vulnerability exploitation, whereas other techniques correspond to various stages of the cyber kill chain.
Metrics and baselines
When evaluating a ranking model, the goal is to rank the ground truth labels as high as possible. We evaluated SMET’s ranking using four metrics that apply to both single-label and multi-label datasets: coverage error, label ranking loss, label ranking average precision (LRAP), and recall@k. LRAP measures, for each ground truth label, the fraction of labels ranked at or above it that are also true labels, averaged over all ground truth labels and samples. LRAP values range from 0 to 1, excluding 0, and the best LRAP value is 1. Coverage error is the average number of top-scored predictions that must be included so that the predictions cover all ground truth labels. Its best value equals the average number of ground truth labels per sample. Label ranking loss is the average fraction of label pairs that are incorrectly ordered, that is, pairs in which a true label receives a score no higher than a false label. Its best value is 0. Recall@k is the proportion of ground truth labels found in the top-k predictions.
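The four metrics can be computed directly from their verbal definitions. The sketch below is a minimal pure-Python illustration, not the evaluation code used in our experiments; the function names and the toy `y_true`/`y_score` values are made up for this example, and the sketch assumes every sample has at least one true and one false label. In practice, scikit-learn's `coverage_error`, `label_ranking_loss`, and `label_ranking_average_precision_score` provide reference implementations of the first three.

```python
def _rank(scores, j):
    """Number of labels scored at least as high as label j (ties count against j)."""
    return sum(1 for s in scores if s >= scores[j])


def _positives(labels):
    return [j for j, t in enumerate(labels) if t == 1]


def coverage_error(y_true, y_score):
    """Average ranking depth needed to cover every true label of a sample."""
    total = 0
    for y, s in zip(y_true, y_score):
        total += max(_rank(s, j) for j in _positives(y))
    return total / len(y_true)


def label_ranking_loss(y_true, y_score):
    """Average fraction of (true, false) label pairs that are wrongly ordered."""
    total = 0.0
    for y, s in zip(y_true, y_score):
        pos = _positives(y)
        neg = [j for j, t in enumerate(y) if t == 0]
        bad = sum(1 for k in pos for l in neg if s[k] <= s[l])
        total += bad / (len(pos) * len(neg))
    return total / len(y_true)


def lrap(y_true, y_score):
    """Label ranking average precision."""
    total = 0.0
    for y, s in zip(y_true, y_score):
        pos = _positives(y)
        per_label = 0.0
        for j in pos:
            true_at_or_above = sum(1 for k in pos if s[k] >= s[j])
            per_label += true_at_or_above / _rank(s, j)
        total += per_label / len(pos)
    return total / len(y_true)


def recall_at_k(y_true, y_score, k):
    """Average share of true labels that appear among the k top-scored labels."""
    total = 0.0
    for y, s in zip(y_true, y_score):
        top_k = sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:k]
        pos = set(_positives(y))
        total += len(pos & set(top_k)) / len(pos)
    return total / len(y_true)


# Toy data (hypothetical): 2 samples, 4 candidate techniques.
y_true = [[1, 0, 0, 0], [0, 1, 1, 0]]
y_score = [[0.9, 0.1, 0.3, 0.2], [0.2, 0.8, 0.1, 0.6]]
```

On this toy input the first sample is ranked perfectly, while the second has one true label at the top and one at the bottom, so every metric except recall@1 is penalized by the second sample only.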
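Below are the formulas of the first three metrics as defined by the scikit-learn Python library [1], where $y \in \{0,1\}^{n_{\text{samples}} \times n_{\text{labels}}}$ is the binary indicator matrix of the ground truth labels, $\hat{f} \in \mathbb{R}^{n_{\text{samples}} \times n_{\text{labels}}}$ is the score assigned to each label, $\text{rank}_{ij} = |\{k : \hat{f}_{ik} \geq \hat{f}_{ij}\}|$, and $\mathcal{L}_{ij} = \{k : y_{ik} = 1, \hat{f}_{ik} \geq \hat{f}_{ij}\}$:
\[ \text{coverage}(y, \hat{f}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \max_{j : y_{ij} = 1} \text{rank}_{ij} \]
\[ \text{ranking\_loss}(y, \hat{f}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \frac{1}{\|y_i\|_0 \, (n_{\text{labels}} - \|y_i\|_0)} \left| \{ (k, l) : \hat{f}_{ik} \leq \hat{f}_{il},\ y_{ik} = 1,\ y_{il} = 0 \} \right| \]
\[ \text{LRAP}(y, \hat{f}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \frac{1}{\|y_i\|_0} \sum_{j : y_{ij} = 1} \frac{|\mathcal{L}_{ij}|}{\text{rank}_{ij}} \]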
We compared SMET’s performance to two unsupervised text similarity models: SBERT and TF-IDF. SBERT is a sentence embedding model introduced in [27], which uses the SIAMESE network to train transformer models on natural language inference datasets. In our experiments, we used the pre-trained all-mpnet-base-v2 model, which was trained on over 1 billion training pairs and designed for general-purpose text similarity. The all-mpnet-base-v2 model achieves the highest average score on over 20 NLP datasets [26].
TF-IDF is a traditional information retrieval model that computes the similarity of documents based on a statistical analysis of word distribution across documents. Its robustness and simplicity make it a common first choice for text similarity tasks. For both baselines, we extracted the embeddings of the ATT&CK technique descriptions and of the input text and then ranked all techniques by their cosine similarity to the input text.
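The baseline ranking procedure can be illustrated with a small TF-IDF example. This is a minimal sketch under stated assumptions rather than our experimental code: the smoothed IDF variant, the tokenizer, and the function names are choices made for this illustration, and the three technique descriptions are short placeholders, not the actual ATT&CK text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency over the technique descriptions."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1 for tok, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; tokens unseen in the descriptions are dropped."""
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    return {tok: count * idf[tok] for tok, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_techniques(text, techniques):
    """Return (technique, similarity) pairs, most similar first."""
    idf = fit_idf(list(techniques.values()))
    query = tfidf_vector(text, idf)
    scored = [
        (name, cosine(query, tfidf_vector(description, idf)))
        for name, description in techniques.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder descriptions (hypothetical, NOT the real ATT&CK wording).
techniques = {
    "Indicator Removal": "adversaries delete or modify logs and files to remove evidence",
    "Process Injection": "adversaries inject code into running processes to evade defenses",
    "System Information Discovery": "adversaries gather information about the operating system and hardware",
}
ranking = rank_techniques(
    "It removes log files and temporary files from the root directory", techniques
)
```

The example also shows a known weakness of lexical similarity: "removes" and "log" do not match "remove" and "logs", so the ranking here depends largely on the shared word "files" rather than on meaning.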

Label ranking loss vs a.
We conducted four experiments to evaluate SMET. We used the CTI dataset for the first two experiments and our manually annotated CVE dataset for the third and fourth experiments. We only included the CTI dataset’s pre-defined 50 techniques in training and evaluating SMET in the first two experiments. Moreover, we bypassed the attack extraction component and treated each entire sentence as an attack vector since the sentences are short and comprehensive. In the third and fourth experiments, we included all ATT&CK techniques (185 techniques), retained the attack extraction component, and applied SMET to the full description of the CVE entry.
When evaluating SMET using the CTI dataset, we employed label ranking loss and three recall@k metrics, where we set k to 1, 3, and 5, representing the top 2%, 6%, and 10% of the predicted labels (50 techniques). Our emphasis in this experiment was on recall@k since each sample is associated with only one label. When evaluating SMET using the CVE dataset, we employed multi-label ranking metrics, including coverage error, label ranking loss, and LRAP, as a single entry can have multiple labels. We also used recall@k with k set to 5, corresponding to roughly the top 2.7% of the predicted labels (185 techniques).
SMET and baselines’ CTI dataset results
As mentioned in the Approach section, we employed a class weighting formula during training that incorporates the tunable parameter a. To determine the optimal value of a, we randomly selected 20% of the CTI dataset and used it to evaluate SMET across a values ranging from 0 to 1. Figure 11 shows the relationship between the label ranking loss and the parameter a. When a is set to 0, which corresponds to assigning the same weight to all samples, the ranking loss is about 0.0246. When a is set to 1, which corresponds to balanced class weights, the ranking loss is around 0.025. Within the range of 0 to 1, the loss varies noticeably, reaching its minimum of 0.023 when a is set to 0.5. Consequently, we chose to set a to 0.5 in both of our experiments.
We evaluated SMET’s mapping using the CTI dataset, excluding the 20% of samples we used to tune a. The results are shown in Table 2. SMET’s performance significantly surpassed that of SBERT and TF-IDF. We attribute the superiority of SMET to two primary reasons. First, SMET uses ATT&CK BERT, which we fine-tuned using the ATT&CK matrix and which can thus extract embeddings that better represent attack vectors. We further examine the importance of ATT&CK BERT by replacing it with existing embedding models later in this section. Second, instead of using cosine similarity to rank the techniques, SMET employs a logistic regression model that we trained using supervised learning to predict attack vector techniques. Logistic regression can model more complex relationships between an input text and techniques based on a training set, unlike cosine similarity, which merely reflects the distance between text embedding vectors.
SMET’s confidence analysis
Since SMET generates a confidence score for each technique, this subsection studies the precision of SMET in correlation with its confidence scores. This study holds significance for two main reasons. Firstly, not all attack vectors extracted by the attack vector extraction component are valuable or can be mapped to ATT&CK. Therefore, the ranking aggregation process must ensure that the top-ranked techniques correspond to valuable attack vectors. This alignment can only be achieved if the model’s confidence scores correlate with accurate mapping. For instance, consider two attack vectors, AV1 and AV2, extracted from an input text. AV1 represents the technique T1, while AV2 is irrelevant to ATT&CK. The logistic regression model must assign a high score to T1 in AV1 mapping while assigning low scores to all techniques in AV2 mapping, expressing a lack of confidence in any of them. As a result, T1 will be prioritized after ranking aggregation, while all techniques mapped to AV2 will receive lower rankings.
Secondly, confidence scores assist security analysts in placing trust in SMET’s mapping. High scores for top-ranked techniques indicate a higher level of confidence, prompting security analysts to rely on the mapping. Conversely, low scores for top-ranked techniques signal a need for further investigation, prompting security analysts to delve deeper when assessing techniques with lower confidence scores.
We designed the experiment as follows. We selected thresholds ranging from 0.1 to 0.9. For each threshold, we filtered SMET’s output to include only those techniques with scores surpassing the threshold. At the extremes, a threshold of 0 would keep all techniques, while a threshold of 1 would keep none. For each threshold, we report the precision of SMET, which is the proportion of correct predictions among all retained predictions. Figure 12 shows the relationship between precision and prediction threshold. The figure clearly illustrates the positive correlation between confidence levels and precision. For instance, among the techniques to which SMET assigns a score above 0.5, 91% of the mappings are correct. Consequently, we can infer that SMET’s confidence in its predictions aligns with the true labels.
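The threshold sweep described above amounts to a few lines of code. The following sketch is illustrative only: the function name and the list of (score, correct) pairs are hypothetical and do not reproduce the values behind Figure 12.

```python
def precision_at_threshold(predictions, threshold):
    """Precision over the predictions whose confidence exceeds the threshold.

    predictions: iterable of (confidence_score, is_correct) pairs.
    Returns None when no prediction clears the threshold.
    """
    kept = [is_correct for score, is_correct in predictions if score > threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical (score, correct?) pairs, one per predicted technique.
predictions = [
    (0.95, True), (0.80, True), (0.60, True), (0.55, False),
    (0.30, False), (0.20, True), (0.15, False),
]

# Sweep thresholds 0.1, 0.2, ..., 0.9 as in the experiment design.
thresholds = [round(0.1 * step, 1) for step in range(1, 10)]
curve = {t: precision_at_threshold(predictions, t) for t in thresholds}
```

Returning `None` instead of 0 when nothing clears the threshold keeps an empty bucket from being mistaken for a bucket of wrong predictions.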

SMET’s precision vs prediction threshold.
To investigate SMET’s performance further, we generated the confusion matrix of SMET’s predictions for samples where the top-predicted technique has a confidence score of 0.5 or higher. Figure 13 shows the confusion matrix, where rows represent the true labels and columns represent the predicted labels. As observed in the matrix, SMET exhibits mapping confusion between specific techniques. For example, SMET mapped 19 samples from the obfuscated files and information technique to the indicator removal technique and 12 to the deobfuscate/decode files or information technique. Moreover, SMET mapped 12 samples from the native API technique to the system information discovery, system network configuration discovery, and process injection techniques, four samples each. SMET also mapped six samples from the ingress tool transfer technique to the command and scripting interpreter technique.
We investigated these inaccuracies and discovered that, in most cases, SMET’s mappings are more accurate than the ground truth. For example, of the 19 samples of the obfuscated files and information technique that SMET mapped to the indicator removal technique, 17 actually represent the indicator removal technique. Examples of these samples are: “delete itself from the target systems after infection” and “It removes log files and temporary files from the root directory.” In the case of the native API technique samples, although the true labels are accurate, as all samples contain native APIs, SMET’s mapping was more specific because it mapped to the techniques that the APIs achieve. For example, SMET mapped “NetServerGetInfo to retrieve the current configuration” to system network configuration discovery, “GetComputerName API” to system information discovery, and “use API calls such as VirtualAlloc to load and execute malicious components into memory” to process injection. We investigated other mismatches between the ground truth and SMET’s predictions and discovered similar findings. Therefore, we uploaded SMET’s predictions for this dataset to the project’s GitHub page so that cyber threat intelligence experts can compare the mappings and provide insightful feedback.
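The confusion analysis above can be reproduced by counting (true label, predicted label) pairs among the confident predictions and then inspecting the largest off-diagonal cells. The sketch below is a minimal illustration; the function names and the six samples are hypothetical and are not drawn from the CTI dataset.

```python
from collections import Counter


def confident_confusions(samples, threshold=0.5):
    """Count (true, predicted) pairs for predictions at or above the threshold.

    samples: iterable of (true_label, predicted_label, confidence) triples.
    """
    counts = Counter()
    for true_label, predicted_label, confidence in samples:
        if confidence >= threshold:
            counts[(true_label, predicted_label)] += 1
    return counts


def top_confusions(counts, n=3):
    """Largest off-diagonal cells, i.e. the most frequent disagreements."""
    off_diagonal = {pair: c for pair, c in counts.items() if pair[0] != pair[1]}
    return sorted(off_diagonal.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical top-1 predictions with their confidence scores.
samples = [
    ("Obfuscated Files or Information", "Indicator Removal", 0.81),
    ("Obfuscated Files or Information", "Indicator Removal", 0.64),
    ("Obfuscated Files or Information", "Obfuscated Files or Information", 0.92),
    ("Native API", "Process Injection", 0.58),
    ("Native API", "System Information Discovery", 0.31),  # below the threshold
    ("Ingress Tool Transfer", "Ingress Tool Transfer", 0.77),
]
counts = confident_confusions(samples)
worst = top_confusions(counts, n=1)
```

Sorting the off-diagonal cells gives a short review list of disagreements, which is the starting point for checking whether the prediction or the ground truth label is at fault.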

Confusion matrix of CTI dataset confident predictions.
As shown in Table 3, our tool significantly outperforms all baselines in all metrics. In addition to the rationales provided in previous experiments, we attribute the superior performance of SMET in CVE mapping to utilizing an attack vector extraction component. Unlike other approaches that treat entries and technique descriptions as a single block, SMET extracts attack vectors as a key feature.
LLM chatbots. Recently, large language model (LLM) chatbots have been used to solve various NLP tasks. These chatbots have knowledge of the web and can answer complicated questions on various topics without any fine-tuning. We studied the performance of one of the most popular chatbots in mapping CVEs to ATT&CK by providing it with a CVE entry description and asking it to map that description to techniques. Due to government policy, we are unable to reveal the name of the chatbot. We conducted three experiments by providing the chatbot with the following three prompts:
Please identify MITRE ATT&CK techniques that can be associated with the CVE description below: “cve_des”. Please return a list of identified ATT&CK techniques IDs.
Please identify what MITRE ATT&CK techniques an attacker can perform using the CVE description below: “cve_des”. Please return a list of identified ATT&CK techniques IDs.
Please identify MITRE ATT&CK techniques that are mentioned in the CVE description below: “cve_des”. Please return a list of identified ATT&CK techniques IDs.
We used regular expressions to extract technique IDs from each response.
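As an illustration of this extraction step, the sketch below pulls ATT&CK identifiers out of a free-text response. The pattern is an assumption based on the public ATT&CK ID format (`T` followed by four digits, with an optional three-digit sub-technique suffix); it is not necessarily the expression used in our experiments, and the sample response is invented.

```python
import re

# Technique IDs look like T1190; sub-technique IDs look like T1059.001.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_technique_ids(response, keep_subtechniques=False):
    """Return the unique technique IDs in a response, in order of appearance.

    By default sub-technique IDs are collapsed to their parent technique.
    """
    ids = []
    for match in TECHNIQUE_ID.findall(response):
        technique_id = match if keep_subtechniques else match.split(".")[0]
        if technique_id not in ids:
            ids.append(technique_id)
    return ids


# Invented chatbot response used only to exercise the pattern.
response = (
    "1. T1190 - Exploit Public-Facing Application\n"
    "2. T1059.001 (PowerShell)\n"
    "3. The attacker may again rely on T1190."
)
parent_ids = extract_technique_ids(response)
full_ids = extract_technique_ids(response, keep_subtechniques=True)
```

The word boundaries keep the pattern from matching longer digit runs, and the optional suffix is skipped when a technique ID is simply followed by a sentence-ending period.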
SMET and baselines’ CVE dataset results
The chatbot showed a basic understanding of ATT&CK, but its mapping performance was inferior. Even though the prompts were semantically similar, the chatbot behaved inconsistently and responded with a different set of techniques for each prompt. For example, the chatbot returned a total of 976, 993, and 486 techniques across all responses for the first, second, and third prompts, respectively. All three prompts returned the same set of techniques in only six instances, and they agreed on at least one technique in 41 instances.
To compare the chatbot to SMET, we combined the predictions of the three prompts by letting each prompt vote for its predicted techniques and ranking the techniques by the number of votes. Techniques with the same number of votes were ranked randomly. The results are presented in Table 3. The chatbot’s performance was inferior in all metrics except R@5, where it performed slightly better than the TF-IDF and SBERT models. We attribute these results to the fact that the chatbot is not a ranking model. Even though we ranked techniques based on prompt voting, most techniques had no votes and were ranked randomly. However, R@5 is a classification metric that gives a more accurate indication of the chatbot’s performance.
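The voting scheme described above can be sketched as follows. This is a minimal illustration with hypothetical names and inputs; the fixed random seed is an assumption added only to make the example reproducible.

```python
import random
from collections import Counter


def rank_by_votes(prompt_predictions, all_techniques, seed=0):
    """Rank every candidate technique by how many prompts predicted it.

    prompt_predictions: one list of technique IDs per prompt.
    all_techniques: the full candidate set that must be ranked.
    Ties (including the many zero-vote techniques) are ordered randomly.
    """
    votes = Counter()
    for predicted in prompt_predictions:
        votes.update(set(predicted))  # at most one vote per prompt

    shuffled = list(all_techniques)
    random.Random(seed).shuffle(shuffled)  # random order used to break ties
    # sorted() is stable, so equal-vote techniques keep their shuffled order.
    return sorted(shuffled, key=lambda technique: votes[technique], reverse=True)


# Hypothetical predictions from the three prompts for one CVE entry.
prompt_predictions = [["T1190", "T1059"], ["T1190"], ["T1190", "T1203"]]
all_techniques = ["T1059", "T1190", "T1203", "T1027", "T1055"]
ranking = rank_by_votes(prompt_predictions, all_techniques)
```

With 185 candidate techniques and only a handful of votes per entry, almost the entire ranking is decided by the random tie-break, which is why rank-sensitive metrics penalize this approach heavily.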
SMET components analysis
We investigated the performance of SMET after replacing its components with existing alternatives from the literature. In the first experiment, we replaced the attack vector extraction model with an LLM chatbot by using the following prompt: “Please identify actions that can be done by an attacker from the following description inclusively: “cve_des.” Return each action along with all its objects in a line with the format subject–verb–objects.” We then used a regular expression to extract the attack vectors from the chatbot’s responses. In the second experiment, we replaced ATT&CK BERT with different transformer models introduced by researchers. We used
As shown in Table 4, SMET achieved the highest score across all metrics. On average,
When we used the LLM chatbot for extraction, the performance of SMET decreased. As discussed in the previous section, the chatbot output is inconsistent and sensitive to prompts. Moreover, unlike SRL models, the chatbot responds with unstructured text, which requires further processing using regular expressions to extract attack vectors. This unstructured response introduces noise that might affect the quality of extracted attack vectors. More investigation needs to be conducted to study the possibility of using LLM chatbots to accurately extract attack vectors from unstructured text. We leave this challenge for future work.
Conclusion and future work
In this paper, we introduced SMET – a tool that automatically maps text, such as cyber threat intelligence (CTI) reports and CVE entries, to ATT&CK to assist security analysts in understanding threats and vulnerabilities and gaining more insight into appropriate countermeasures. SMET utilizes a semantic extraction model to extract concise attack vectors and leverages ATT&CK BERT to extract semantically meaningful embeddings of unstructured attack description text. We evaluated SMET using an existing dataset of CTI reports mapped to ATT&CK and a dataset that we manually created based on our analysis of the CVE-ATT&CK association. SMET outperformed the existing state-of-the-art models we compared it against. Moreover, we studied SMET’s mapping confidence and used it to identify mislabels in a public dataset. We also studied the importance of SMET’s components by replacing them with existing alternatives from the literature. For future work, we aim to investigate the incorporation of extended context from the full input text for ATT&CK mapping. The current practice of extracting attack vectors prior to ATT&CK mapping can separate an attack vector from its context, which, in some cases, is essential for accurate mapping.
Disclaimer. Certain equipment, instruments, software, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement of any product or service by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
Acknowledgments
The research reported herein was supported in part by ARO award No. W911NF2110032, NIST Award No. 60NANB23D007, NSF awards DMS-1737978, DGE-2039542, OAC-1828467, OAC-1931541, and DGE-1906630, and ONR awards N00014-17-1-2995 and N00014-20-1-2738. This research was also supported in part by the National Center for Transportation Cybersecurity and Resiliency (TraCR) (a U.S. Department of Transportation National University Transportation Center) headquartered at Clemson University, Clemson, South Carolina, USA. Any opinions, findings, conclusions, and recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of TraCR, and the U.S. Government assumes no liability for the contents or use thereof.
