Abstract
In this paper we present a graph-based framework that, utilizing relations between groups of System-calls, detects whether an unknown software sample is malicious or benign, and classifies a malicious software to one of a set of known malware families. In our approach we propose a novel graph representation of dependency graphs by capturing their structural evolution over time constructing sequential graph instances, the so-called Temporal Graphs. The partitions of the temporal evolution of a graph defined by specific time-slots, results to different types of graphs representations based upon the information we capture across the capturing of its evolution. The proposed graph-based framework utilizes the proposed types of temporal graphs computing similarity metrics over various graph characteristics in order to conduct the malware detection and classification procedures. Finally, we evaluate the detection rates and the classification ability of our proposed graph-based framework conducting a series of experiments over a set of known malware samples pre-classified into malware families.
Introduction
Malware or malicious software is a software type intended to cause harm to end point computers, systems or networks [54]. In this work we design and propose a graph based model that develops an algorithmic technique for malware detection and classification. Our method is applied on unknown software samples in order to detect whether they are malicious or not, and further classify them to one of a set of known malware families (i.e., set of malicious malware samples with similar functionality), as they have been developed by various antivirus software vendors.
Protection against mutated malware
On the contrary part of our scientific field, malware authors, have developed and deployed various techniques in order to avoid the traditional byte-level signature based detection methods. Since such detection methods appear to be significantly fragile against even the least mutation of the initial subject (i.e., ancestor malware sample), they mutate their software products (malware) creating structurally different but functionally similar copies of them. Except from the mutation methods that leverage one, or more, levels of encryption, there also exist more advanced mutation methods. Some of the most applicable malware mutations are the oligomorphism which is achieved through obfuscation techniques, the polymorphism [48] where the code is modified through encryption techniques and the metamorphism, in which multiple structurally different copies of a malware sample are generated.
Hence, while the main functionality of a malware sample remains immutable during its mutations, malware samples can be merged into groups of malware samples with common functionality, the so called malware families. So, in this work we developed an algorithmic technique that not only detects if a program is malicious or not, but additionally, given a malicious software it can decide the malware family that it belongs to.
Since malicious software poses a major threat, several protection approaches have been proposed and implemented in order to eliminate such threats. The main corpus of the defence line is mainly developed over three axes, namely malware analysis, malware detection and malware classification. Malware analysis [54] is the process of determining the purpose and the functionality or, abstractly, the behaviour of a given malicious code. The term malware detection referrers to the process of determining whether a given program P is malicious or benign according to an a priori knowledge [35]. Finally, the term malware classification refers to the process of determining the malware family to which a particular malware sample, let M belongs to, in order to provide the ability to generalize detection signatures from sample level to family level.
Contribution
In our approach, we leverage the use of System-call Dependency Graphs (ScDG) to represent the software samples in order to distinguish if they are malicious or not and further classify them to a malware family. More precisely, in order to make the detection and classification procedures more resilient to known malware mutation techniques, we construct a directed edge-weighted graph, which we call Group Relation Graph – GrG [45], that is a generalized graph structure resulting from ScDG after grouping disjoint subsets of its vertices. Throughout these processes, over specific time intervals, we preserve instances of GrGs creating hence temporal graphs that depict their structural evolution over time. Given a ScDG graph that represents a known malware sample and a ScDG graph that represent an test sample, we utilize these instances (i.e., temporal graphs) representing the evolution of their corresponding GrG, produced both on each ScDG, in order to perform graph similarity towards the processes of malware detection and classification.
In this work, we propose, develop and present an integrated graph-based framework for distinguishing graph representations referencing malicious software samples and further classifying them in sets of known malware families. Firstly, we discuss our proposed graph abstractions that are based on the ScDG graphs representing the relations between system-call groups (i.e., GrG), and present another graph representation that describes the temporal evolution on the structure of the GrG graphs. In our approach we distinguish two types of temporal graphs according to the registration of the structural modifications and compute the graph similarity between such temporal graphs utilizing specific characteristics of graphs Finally, we present the potentials of our approach, by evaluating the detection and classification rates exhibited by our framework against malicious samples.
Related work
Road map
In Section 2 we present the prerequisite theoretical background in order for the graph-based techniques for detection and classification to be developed. Next, in Section 3 we demonstrate the key components of our model, in Section 4 we discuss extensively the main principles and design aspects over the development of our detection and classification scheme concerning the two processes, where in Section 5 we measure the detection and classification potentials of our integrated framework against malicious samples. Finally, in Section 6, we set our further research landmarks concerning the processes of malware detection and classification.
Conceptual framework
In this section we discuss the semantics behind the processes of malware analysis, detection and classification. We discuss the principles of the utilization of behaviour-based approaches towards the deployment of resilient detection and classification techniques. Firstly we present the major process preceding the development of detection and classification methods, that is malware analysis, and next we depict the state-of-the art behavioural approaches applied on malware detection and classification.
Analysing susceptible samples
The traditional signature-based malware detection, despite its fast real-time protection, is still not resilient against malware mutations. Robust detection techniques prerequisite the procedure of malware analysis, during which, the analyst collects all the required information, in order to be effective and efficient. The effectiveness of signature scanning, relying on pattern matching fails to detect new malware strains or mutated variants of existing ones [8]. The procedure of malware analysis is consisted by the collection of valuable information concerning either static artifacts or generally behavioural patterns, that could characterize the maliciousness, or not, of a sample, being categorized to two main categories namely static analysis and dynamic analysis, respectively [6,53,57].
Static malware analysis of software is performed over the programming artifacts and structural characteristics of a software sample [9], without the need of its execution. The information obtained during static malware analysis may refer to opcode sequences, control flow graphs, etc. and can be used at will for malware detection [8]. However, since the sample does not need to be executed can be surpasses by easily foiled by obfuscation and packing techniques hence its scalability consists one of its assets [31,53].
On the other hand, Dynamic Malware Analysis refers mostly to the extraction of behavioural features exhibited during the execution of a malicious software sample, mainly captured and depicted through API-calls sequences and the system-calls dependencies [8]. The scalability of dynamic malware analysis may be reduced due to the demand of real time execution [53]. Moreover, despite that obfuscation techniques can easily be defeated through dynamic analysis the time needed for analysis is disproportionate to the rate of birth of mutated malware samples [31].
A specific type of dynamic analysis, called dynamic taint analysis or DTA (stands for dynamic taint analysis) traces data flows in programs or systems during execution time. Briefly speaking, taint analysis distinguishes three elements namely taint sources, taint sinks, and propagation rules. Data flows are taint variables introduced by taint sources (i.e., the output parameters of system calls) and propagated according to the propagation rules to taint sinks (i.e., the input parameters of system calls) [3].
Detecting malicious behaviors
As we mentioned previously, malicious software samples are intended to compromise the privacy, the confidentiality or the integrity of a system, of data or any other cyber-source constituting hence an intrusion. To this end, Intrusion Detection Systems, or, for short IDS, are deployed in order to monitor the execution of applications, the traffic of networks or whole systems, aiming on spotting malicious activity patterns [2]. The system supervision through an IDS can be performed through the application of malware detection techniques, that reference file comparisons against signatures of malicious software [12], behaviour monitoring of malicious patterns and system supervision [2].
However, the increasing birth-rate of new or mutated malware samples has raised the need for efficient and elaborated malware detection techniques that can effectively detect new malware strains in reasonable amounts of time. The detection approaches are strongly connected to the features set provided through the previous stage of malware analysis, and are distinguished to static and dynamic features, respectively. Static features may include, statistical analysis on n-grams or opcodes, properties of control flow graphs, while dynamic features are obtained the execution time of a program and concern its general behaviour (i.e., interaction with the host-environment – O.S.), access events or any other interconnection patterns [18]. Malware detection approaches are divided into two main categories, namely signature-based malware detection and behaviour-based malware detection [1,5,8,22,55]. Next we briefly discuss these two methods and present some of the approaches deployed in each one.
Signature-based malware detection is the dominant technique deployed by antivirus software products due to its time efficacy that provides real-time protection against malicious threats [9]. A byte-level signature is a sequence (i.e., pattern) of bytes used to identify each newly discovered malware, using a scanning scheme of exact correlation and a repository of signatures in order to detect malicious software samples [8]. A signature may represent a byte-code sequence, a binary assembly instruction, an imported Dynamic Link Library (DLL), or function and system calls. [1,5]. However, signature-based detection techniques can easily be evaded through code obfuscation techniques (e.g., polymorphic malware samples)that even the least modification on the code sequence would lead to a completely different byte-sequence [8].
On the other hand, behaviour-based malware detection [24] mainly focuses on capturing the interaction (in terms of interconnection, relations or dependencies between system-elements i.e., system-calls or API calls) between the executed software and the system (i.e., Operating System) [3,5,50,52]. Behaviour-based detection systems as expected require the execution of the software sample in order to extract dynamically (i.e., through dynamic taint analysis) the exhibited behaviours. In order for these dynamic systems to perform the mining of the specified behaviours they utilize software and hardware virtualization technologies, alongside with imitation conditions [55], providing the test sample with an environment as close to reality in order to evade the sandbox-detection mechanisms deployed occasionally by malicious software samples, and letting them exhibit their intentions. Despite the fact that such techniques deploy quite elaborate algorithms on their implementations, the incident that malware families tend to evolve in order to avoid detection [1], results to the need of the development of more elastic and mutation resilient techniques like the one we propose in this work.
Classifying malware samples
Malware authors, in order to avoid traditional detection methods, produce new (mostly mutated) malware samples rapidly, utilizing existing ones in order for the new strains to preserve the functionality inherited from their ancestors. As referred in [5] mutated malware samples are generated from existing ones utilizing automated techniques [63] or integrated tools, generating new samples from libraries and code parts from code exchange networks. The term malware classification has been confused several times with malware detection. Distinguishing these procedures, malware detection is a binary classification, where a set of unknown samples is classified against a collection of malicious and benign samples, while malware classification refers to the indexing of an already detected malware sample belongs to a particular family or type [17] of similar samples. As described in [27], malicious software samples that belong to the same malware family tend to exhibit similar behavioural and structural profiles, and hence malware classification augments the analysis of new, or mutated, malicious samples where their signatures have not been constructed yet [46].
Another field of malware analysis applied in malware classification is malware phylogeny [38], which aims on inferring evolutionary relationships between instances of families. The major profit from creating a phylogeny model is the fact that newly developed elaborated detection systems that deploying such techniques can detect that a sample that has not been previously seen can be related to a malware family, when analysed along an evolution path [32]. Throughout this process the main target is to reveal similarities and relations among a set of specific malware samples coexist and are exhibited by all the members of the set (i.e., malware family) [58], distinguishing its type or family. Such approaches can be utilized to identify evolution trends in over a set of malware samples [5], constituting hence valuable tool for more generalized signatures or, in general, more elaborated detection-techniques. The models applied on phylogeny, using mostly phylogenetic networks, model evolutionary relations among malware families, describing temporal ordering among samples, defining ancestor-descendent relations, as also relationships between families, augmenting hence malware classification and unveiling evolutionary trends [33].
Analysing the evolution of temporal graphs
In this section, we discuss the properties of our initial graph representation i.e., the System-call Dependency Graph (i.e., ScDG) and the proposed structural components of our model, namely, the Group Relation Graph (i.e., GrG) and its corresponding graphs that depict its structural evolution through time (i.e., the temporal graphs). Given such graph representations, we present the construction of the key components of our proposed detection and classification model, i.e., their corresponding derivative graphs that depict the temporal evolution of a GrG graph.
Graph representation
The system-calls invoked during the execution of a program can be traced through taint analysis, and hereafter the behaviour of a program can be represented with a directed acyclic graph (dag), the so called System-call Dependency Graph see, Fig. 1(a). The vertex set of a ScDG is consisted by all the system-calls invoked during the execution of a program and its edge set represents the communication between system-calls as described in [3,39,44]. Recalling that the suspicious sample needs to be executed in a contained environment (i.e., a virtual machine), where during its execution time, dynamic taint analysis is performed in order to capture system-call traces, next we illustrate a simple example that includes the system-call traces obtained, constructing the ScDG of a program. In Fig. 1(a), it is easy to see that the vertex set of this graph is consisted from the system-calls invoked during the execution of the sample and its edge set is consisted by their in between data-flow dependencies, constructing a directed acyclic graph (DAG).

(a) the system-call dependency graph of a program; (b) the corresponding group relation graph of a program.
Moreover, as described in [39,44] given a graph representation of malware-behaviour such a ScDG, let G, a more abstract graph representation of a program’s behaviour can be constructed based on the fact that system-calls of similar functionality can be classified into the same group [45]. The produced graph representation is a directed weighted graph called Group Relation Graph, that we denote as for every pair for each directed edge
which consist the basic component upon which the core graph representation of our model is based.
Throughout the development of our research, we have noticed that, to the best of our knowledge, there does not exist any approach on the literature that utilizes the temporal evolution of a graph in malware detection and classification. Similarly to phylogeny that examines the temporal evolution of malware families, the key component of our proposed detection and classification model, leverage the temporal evolution of graphs (in our case GrG graphs) in order to depict the structural modifications performed on the graph and that could distinguish either a malware sample or to a further extent a malware family.
In our model, we define a type of graphs that depict the temporal evolution of our initial graph structures, namely the Group Relation Temporal Graphs or, for short, GrTG. In order to implement such graph structures we approach this modeling by creating instances of the initial GrG graphs during their construction. As we mentioned above, GrG graphs are constructed by the sum of the system-calls invoked interconnecting pairs of system call groups. Hence, since we are given the system-call dependencies in a series that depicts the time correlation among (i.e., an edge sequence of the ScDG that shows the system-call invocations during execution time), such constructions can be obtained by creating an instance of the produced GrG graphs at specific steps.
Formalizing our previous claim, we can define that for a set of time-partition, let
In our proposed model for the development of temporal graphs we define as quantum the elementary interaction between two system-calls in the initial ScDG graph, i.e. an edge, while the
Discrete and cumulative modification temporal graphs
The factor of time actually does not represent the actual quantum of run-time, but each time-quantum corresponds to the invocation of one system-call (i.e., a dependency) or, equivalently, a relation between two System-call Groups (i.e., an edge of the Group Relation Graph). Hence, the total time-line depicts the slots or time-partitions from the appearance of the first to the last group relation. To this point we should notice that the conceptual substance of Temporal Graphs is to depict the structural evolution of the GrG graphs through the time. However, the structural modification on the instances of the graph over the time can be described either as addition of edges over the exact previous graph instance, or as successive additions of edges performed on all the previous graph instances.
Through this aspect, we can transform a given ScDG into a temporal graph, by registering the modifications performed on the corresponding GrG over is evolution during time, taking into account the sequence (i.e., chronological order) the system-calls were invoked during the execution of the program. Next, we discuss the construction of the corresponding Temporal Graphs according to the two approaches.
In the first approach of our proposed scheme, the construction of the Temporal Graph, that represents the evolution of GrG graphs during time, constructs the induced sub-graph of GrG, respectively, including only the edges that where added on a specific

The temporal evolution of a GrG graph
In this type of Temporal Graphs, the evolution of the graph is represented as an additive procedure, since once an edge has been created at a given time, let i, between two system-call groups on the GrG graph, it will remain permanent on the ancestor Temporal Graphs (i.e., if

The temporal evolution of a GrG graph
In this section we present the key components of our detection and classification model, and describe the key insights that constitute the basis of the corresponding procedures. Discussing the design principles that rule the deployment of our model’s components, we present an overview of our detection and classification techniques.
Design principles
Malware detection and classification are two interconnected procedures. In malware detection the main target is to determine whether a given program is malicious or benign according to an a priori knowledge over what known to be malicious, where malware classification is the following procedure and its intent is to determine the malware family to which the sample, that has been detected as malicious, belongs to. It easily follows that, an a priori knowledge of characteristics of known maliciousness has to be stored in a knowledge database, as also that in order to compare two subjects, a similarity measure among them is required. Moreover, the proposed theoretical approach specifies the form of the subjects, where graph-based models interact with similarity metrics that measure specific types of characteristics that represent evolutionary commonalities between temporal graphs. Next, we present the architectural considerations, the design principles, the functionality, and the corresponding deployment of the key components of our proposed detection and classification model.
Knowledge database
The knowledge database is consisted by a set of known malicious samples that have been classified to malware families according to their functional, structural and mostly behavioural commonalities. More precisely, various anti-virus vendors have classified these samples to families based on their own heuristic rules concerning shared behavioural patterns and functionally similar execution profiles. Regarding the detection process, the knowledge database, except the known malicious samples, also includes benign samples in order to measure the false positive rates (i.e., benign samples that have been detected as malicious) evaluating the detection ability of our model. On the other hand, regarding the classification process, the benign samples are not needed, as in such procedures a classification model only has to decide the family in which a sample, already distinguished as malicious, belongs to.
Graph structures
The major target of our approach is to utilize the graphs that depict the temporal evolution of the produced GrG graphs (i.e., GrTG graphs) in order to measure the graph similarity among test sample and samples that have been already detected as malicious, leveraging their structural modification that take place during the execution time of the programs that they represent. In our work we have a theoretically stable intuition that the factor of time, regarding the structural evolution of a graph is a strong qualitative characteristic that could definitely distinguish the behaviour of a program and further be utilized to the development of more elaborated detection and classification techniques over unknown samples.
To this end, we ought to notice that regarding the time quantization procedure, where the time slots where the graph instances have to be retained, there could be applied several different approaches, that would affect the application results. In other words, the implementation of our proposed model on a fine-grained time quantization scheme, would be more precise against a more coarse-grained once, where on the other hand a trade-off between the precision on temporal structural modifications and the construction of more distinguishing patterns poses the basis of further tuning issues.
Measuring the similarity between temporal graphs
Next, we discuss the computation of similarity between the temporal graphs of a given test sample τ, let
Next, we present the similarity metrics deployment for the measurement of different structural characteristics, i.e., relational, qualitative, and quantitative characteristics, exhibited through graphs.
Next, in order to compute the similarity S between the temporal graphs of a test and a known malicious sample, let
Similarly, in order to compute the similarity between the temporal graphs of the test and the known sample as they have been formed over n
On the other hand, w.r.t the computation of similarity regarding the quantitative characteristics exhibited between the temporal graphs of the test and the known sample as they have been formed over n
Finally, the computation of the similarity concerning the qualitative characteristics between the temporal graphs of the test and the known sample as they have been formed over n
To this end, we ought to notice that the similarity measurements mentioned above can be computed utilizing either the discrete modification temporal graphs, or the cumulative modification temporal graphs, at their basis. Hence, for these cases the general formula for the similarity computation is developed as follows:
Malware detection
We implement our malware detection model by first performing a transformation to the initial ScDG graphs G, converting them to GrG graphs

Architecture of the detection model.
In the example of Fig. 4 we suppose we are given an unknown test sample τ that we do not know if it is malicious, and we are asked to decide if τ is malicious or benign. Having a database with the Temporal Graphs of known malware samples. Once the corresponding Temporal Graphs have been constructed, we compute the our similarity metrics between the Temporal Graph of τ and each Temporal Graph that represents a malware sample in our database. So, let M the total number of malware samples in our database, we result to M values in our measurements on our similarity metrics (one per pair
Our proposed method is based on application our proposed similarity metrics over the set of known malware families in order to classify on them an unclassified malware sample, let τ. More precisely, our method selects the family that is most similar to τ according to the similarity results exhibited by the measurements performed utilizing the similarity metrics. A malware family is called

Architecture of the classification model.
In Fig. 5 we show a representation of the procedure for classifying an unknown test sample τ to a known malware family utilizing the aforementioned methods. More formally, our classification technique proceeds as follows: given a set of known malware families
The deployment model of the proposed components is consisted by the graph structures utilized to represent the software’s behavioral characteristics (i.e., the Temporal Graphs that represent their temporal evolution through time), the knowledge base, that is a database storing Temporal Graphs representing known malware samples, and the similarity metrics developed to capture structural and qualitative commonalities among such behavioral graphs. Our proposed graph-based malware detection and classification model is partitioned into two phases. The first phase concerns the detection procedure, where an unknown sample, let τ, is needed to be detected as malicious or benign. Our model’s implementation utilizes the Temporal Graphs taken from a database of known malware samples and the Temporal Graph of test sample τ in order to compute their structural similarities across their temporal evolution. The second phase concerns the classification procedure, where an unknown sample, let τ, that has been already detected as malicious is needed to be classified to one of a set of known malware families. Our mode’s implementation utilizes the Temporal Graphs taken from a database of known malware samples already been classified to malware families and the Temporal Graph of test sample τ in order to compute their structural similarities across their temporal evolution and further classify τ to one malware family from our data-set. In Fig. 6, we represent an abstract overview of the deployment of our proposed graph-based model for malware detection and classification.

The deployment of malware detection and malware classification processes in our model.
In this section we present the experimental set-up utilized for the evaluation of our proposed graph-based technique for malware detection and classification utilizing the discrete modification temporal graphs and the cumulative modification temporal graphs, and discuss the exhibited results for the two procedures, namely malware detection and malware classification, presenting the potentials of our framework through a series of experiments over a data-set of malicious and benign samples.
Experimental set-up
Next, we present the methodology followed for the evaluation of our framework on detecting and classifying malicious samples, the data-set of the known and the test samples, the methodology we followed in order partition the data-set to train and test set and the structure of a series of evaluation experiments, and finally the experimental design regarding the utilization of different similarity metrics over the two distinct types of temporal graphs.
Data-set
In order to evaluate the first phase of our proposed framework against malicious samples, regarding its detection potentials we utilized a data-set of 2631 malicious samples, and a particularly for the evaluation of the false-positive rates of the detection procedure a set of 35 benign samples that cover a wide range of commodity software types (i.e., commodity software types including editors, office suites, media players, browsers, standardized procedures, etc.). To this end, we ought to notice that the diversity of the benign sample data-set covers a wide range of commodity software samples, compensating the number of malicious samples, regarding that in the benign ones there exist actually one sample of each kind, or in other words singleton (i.e., one member per family) benign families. For the second phase of our proposed framework regarding its ability to index the detected samples to known malware families, we utilized the grouping of the 2631 malicious samples into 48 distinct malware families including from 5 to 317 samples pre-classified into them through heuristic rules as described in [3]. Next, in Table 1 we list the set of the 48 malware families along with their sizes (i.e., number of members).
Our proposed framework operates on graph representations of software samples (malicious or benign), the so called System-call Dependency Graphs obtained through dynamic taint analysis over their execution in a contained environment. These graphs (i.e., ScDG graphs) containing System-call dependencies among the invoked System-calls are transformed to their corresponding GrG graphs by either adding an edge between the corresponding groups of the pairs of each invoked System-calls (if this edge does not already exist) or by increasing by one unit the weight of this edge (if this edge already exists). Hence the ScDG graphs that constitute our initial data-set, and thus the train/test sets utilized for the evaluation of our approach, are then transformed to their corresponding Group Relation Graphs, (i.e., GrG graphs) following the procedure described in Section 3.1. To this point it is quite important to refer that the initial data-set that contains the ScDG graphs from malicious sample pre-indexed into malware families and the set of benign samples, is constructed by behavioural graphs that their vertex-sets contain
Hence, having defined a number of
The set of the 48 malware families
provided by Babic et al., along with their sizes (i.e., number of members), as used in [3]
The set of the 48 malware families
In order to evaluate the detection and classification ability of our framework, we perform a five-fold cross validation procedure, partitioning the data-set into a 80% train-set and a 20% test-set, rotating the samples over each iteration. These samples refer to the graph objects that are consisted by the types of temporal graphs, i.e., CMTG and DMTG, that resulted by registering the modification over the evolution of a GrG graphs according to its corresponding ScDG graph, with respect to both the temporal graph type and the
Regarding the evaluation of our detection ability model is achieved by performing a series of five-fold cross validation experiments over different similarity metrics that measure the graph-similarity regarding different types of graph characteristics (i.e., relational, quantitative, qualitative) and various number of
Then, this procedure is iterated over various values of threshold λ in order to depict the behaviour of our model across this parameter. The procedure mentioned so far, iterates over a different number of
Similarly,in each case of experiment, except the maximum similarity exhibited between the sample under consideration, i.e., τ, and each of the known malicious samples, let s, having the labels of each known malicious sample, let
System performance
Throughout the evaluation of our framework we performed a five-fold cross validation series of experiments iterating over the 8 different series of

Execution time (in hours) required from the framework to perform a five-fold experiment for 8 series of
To this end, note that in the corresponding plot the x-axis represents the values the
In the next experimental study of our model, we utilize the data-set discussed in previous section in order to perform a five-fold cross validation utilizing in each case the
Detecting malicious samples utilizing DMTG
In this class of experiments we evaluate the detection potentials of our model, utilizing the discrete modification temporal graphs (DMTG) for the representation of software samples under consideration in order to represent the temporal evolution of its corresponding GrG graph. In the experiments discussed next, we measure the graph similarity of between any sample form a set of unknown samples against a set of known malicious sample. Particularly, there are examined four types of similarity measurements between DMTG graph representations, the one that concerns the measurement of similarity regarding the relational characteristics of graphs through the utilization of Jaccard Similarity, the next that concerns the measurement of similarity regarding the quantitative characteristics through the utilization of the Δ-Similarity metric, and finally the measurement of similarity regarding the qualitative characteristics through the utilization of Cosine similarity metric. In each case we perform 8 five-fold cross validation experiments, one for any number of
Detection using Relational Characteristics of DMTG
In Fig. 8, with purple continuous and dashed lines for TP-rates and FP-rates, repetitively, we cite the results exhibited by the application of Jaccard Similarity in order to measure the similarity regarding the relational characteristics appeared in common in the graphs of malicious and benign test samples against known malicious samples, all represented by their discrete modification temporal graphs (DMTG). The exhibited difference between TP and FP rates is maximized in the following cases: 2
From (a) to (h) we depict the detection rates (continuous line) and the corresponding false positive rates (dashed line) for various 
Detection using Qualitative Characteristics of DMTG
In Fig. 8, with orange/green continuous and dashed lines for TP-rates and FP-rates, repetitively, we present the results exhibited by the application of Δ-Similarity metric in order to measure the similarity regarding the qualitative characteristics appeared in common in the graphs of malicious and benign test samples against known malicious samples, all represented by their discrete modification temporal graphs (DMTG). Since the GrG graphs are directed weighted graphs, over the corresponding experiments we distinguished two cases where in order to leverage the valuable information depicted in the degree and the weight of each vertex, and respectively how they are evolved during time (i.e., modifications performed during a
Detection using Quantitative Characteristics of DMTG
In Fig. 8, with blue continuous and dashed lines for TP-rates and FP-rates, repetitively, we cite the results exhibited by the application of Cosine Similarity deployed in order to measure the similarity regarding the qualitative characteristics appeared in common in the graphs of malicious and benign test samples against known malicious samples, all represented by their discrete modification temporal graphs (DMTG). The exhibited difference between TP and FP rates is maximized for 2
As we can observe from the exhibited results depicted in Fig. 8, the increase of the
In this class of experiments we evaluate the detection potentials of our model, utilizing the cumulative modification temporal graphs (CMTG) for the representation of software samples under consideration in order to represent the temporal evolution of its corresponding GrG graph. The procedure followed for the conduction of the second class of experiments is the same as the one described in the previous subsection. Hence, similarly to the previous class of experiments, there are examined four types of similarity measurements between CMTG graph representations, concerning the measurement of similarity regarding the relational,the quantitative, and the qualitative characteristics, utilizing the Jaccard Similarity, the Δ-Similarity, and the Cosine Similarity metric, respectively. Following the setting of the previous class of experiments, we perform 8 five-fold cross validation experiments, one for any number of
Detection using Relational Characteristics of CMTG
In Fig. 9, similarly to the procedure followed for the experiments utilizing the CMTG graphs, with purple continuous and dashed lines for TP-rates and FP-rates, repetitively, we cite the results exhibited by the application of Jaccard index in order to measure the similarity regarding the exhibited relational characteristics between malicious and benign samples against known malicious samples represented by their cumulative modification temporal graphs (CMTG). The exhibited difference between TP and FP rates is maximized for 2
From (a) to (h) we depict the detection rates (continuous line) and the corresponding false positive rates (dashed line) for various 
Detection using Qualitative Characteristics of CMTG
In Fig. 9, with orange/green continuous and dashed lines for TP-rates and FP-rates, repetitively, we present the results exhibited measure the similarity regarding the exhibited qualitative characteristics between malicious and benign samples against known malicious samples represented by their cumulative modification temporal graphs (DMTG) by the application of Δ-Similarity metric. As in the corresponding series of experiments on CMTG graphs, two cases are distinguished, where in the first case the experiments are conducted utilizing the Δ-Similarity taking into account how the vertices of GrG graphs are evolved during time (i.e., accumulating modifications performed during a
Detection using Quantitative Characteristics of CMTG
In Fig. 9, with blue continuous and dashed lines for TP-rates and FP-rates, repetitively, we cite the results exhibited by the application of Cosine Similarity deployed to measure the similarity regarding the exhibited qualitative characteristics between malicious and benign samples against known malicious samples represented by their discrete modification temporal graphs (DMTG). As we can observe through the exhibited results the increase of the number of
As we can observe from the exhibited results depicted in Fig. 9, the increase of the
In order to perform the classification procedure, we primarily conducted an investigation over our data-set, regarding the correlation between the malware families. In particular, an interesting observation results from the fact that samples that have been pre-classified to malware families with “related” names tend to have indeed an increased similarity between them. In Fig. 10 we have depicted the relation between families with at least one common block at their names definitions. In order to perform a prior evaluation of our model we distinguished three type of classification, namely: Exact, Direct, and Partial Matching. Next we present the results exhibited through two series of experiments utilizing both the discrete modification temporal graphs and the cumulative modification temporal graphs in order to represent the evolution of a GrG graphs over time, and into a further extent leverage the characteristics of these type of temporal graphs in malware classification. The procedure followed for the indexing of software samples that have been previously detected as malicious to known malware families, is the one described in Section 4.3, see Fig. 5. Each of these classification types refers to how we attest the name of the family that our model classified our sample, and are define as follows:

Malware families connected by their names.
In Table 2, we present an example of our classification evaluation procedure through a simple example explaining how these classification accuracy metrics work. Let as assume that we have a sample from family “Bancos, Banker” that has been detected as malware and we classify it into a malware family that is presented in the first column. So, in columns 2, 3 and 4 we can observe what would be the result (1 for correct or 0 for wrong classification) according to if we demand exact, direct, and partial matching.
Classification example utilizing the exact, the direct, and the partial matching

From (a) to (d) we depict the indexing of detected malware samples to known malware families for various
In the first series of experiments we utilize the discrete modification temporal graphs to represent the evolution of a GrG graphs over time, and the deployment of similarity metrics that investigate the relational, the qualitative, and the quantitative, characteristics, in order to measure the classification ability of our proposed model. Throughout this series of experiments we investigate the classification rates exhibited by the our model over different types of correct classifications, (i.e., Exact, Direct, and Partial Matching) as described above in order to prove the potentials of our model in the indexing of software samples.
In Fig. 11 we present the classification results exhibited through a series of experiments utilizing the discrete modification temporal graphs alongside the Jaccard Similarity, the
On the other hand, as we can observe from Fig. 11(a) the utilization of relational characteristics performs adequately for the types of Direct and Partial Matching especially for lower values of
Classifying detected samples utilizing CMTG
In the second series of experiments, similarly we utilize the cumulative modification temporal graphs to represent the evolution of a GrG graphs over time, and the deployment of similarity metrics that investigate the relational, the qualitative, and the quantitative, characteristics, in order to measure the classification ability of our proposed model. Correspondingly, our main goal is to investigate the classification rates exhibited by the our model over different types of correct classifications, (i.e., Exact, Direct, and Partial Matching) as described above in order to prove the potentials of our model in the indexing of software samples.
The classification results exhibited through the second series of experiments, utilizing the cumulative modification temporal graphs alongside the Jaccard Similarity, the

From (a) to (d) we depict the indexing of detected malware samples to known malware families for various
On the other hand, as we can observe from Fig. 12 (a)–(d), an interesting observation comes to light, when it results that the increase on the number of
In this work we designed and developed a graph-based framework for malware detection and classification. The core component of our work are the temporal graphs that depict the structural evolution of a graph through time. The object we operate on, are the Group Relation Graphs (GrG graphs), that are constructed after grouping specific disjoint vertices and the corresponding edges by the system-calls invoked during the execution time of a software sample. In our approach we distinguish two types of temporal graphs, namely, the discrete modification temporal graphs and the cumulative modification temporal graphs. In the first type of temporal graphs, i.e., the discrete modification temporal graphs, the structural modifications of a GrG graphs through time (i.e.,
Discussion
Through the evaluation of our model we performed two distinct series of experiments regarding the utilization of the two types of temporal graphs, namely, the discrete modification temporal graphs and the cumulative modification temporal graphs, regarding their potentials in detecting and classifying malicious samples. Throughout the experimental study for the evaluation of our proposed framework, we investigated three types of graph characteristics that could be utilized by the corresponding similarity metrics in order to measure the similarity between a test and any known malicious sample. Namely, commonalities exhibited by the relational, the qualitative and the quantitative characteristics between temporal graphs are measure by the Jaccard Similarity, the Δ-Similarity, and the Cosine Similarity, respectively, in order to deduce whether a software sample under consideration is malicious or benign, and to a further extent to classify it to a family of known malware samples.
The experimental results exhibited over the evaluation of detection and classification ability of our model utilizing the discrete modification of temporal graphs showed that the performance of our framework on detection and classifying malicious samples performs adequately well. Observing the corresponding results it is figured out that the framework utilizes efficiently the commonalities exhibited over the various types of graph characteristics where as we can see a further increase on the number of
On the other hand, the experimental results exhibited over the evaluation of detection and classification ability of our model utilizing the cumulative modification of temporal graphs showed that the performance of our framework on detection and classifying malicious samples performs in very high rates exhibiting high detection rates followed by very low false positive rates, alongside with high classification rates. Throughout the corresponding results it is clear to see that the commonalities over all the types of graph-characteristics perform equally well regarding the detection and classification potentials exhibited by our proposed framework. The experimental results show in contrast that the number of
As we depict in Fig. 13, there is a straight distinction among the detection results exhibited over the utilization of discrete modification and the cumulative modification temporal graphs alongside the underlying graph-characteristics the commonalities of them computed by the corresponding similarity metrics. In Fig. 13, we present the detection results exhibited over the deducted series of experiments and evaluate the potentials of each case by measuring the maximum difference presented among the TP and the FP rates across a number of

Comparison of detection results utilizing different similarity metrics over DMTG and CMTG graphs.
Observing these results it is easy to see that there is a set combinations of similarity metrics and underlying temporal graphs that over the point of 32
On the other hand, the similarity metrics deployed on top of the cumulative modification temporal graphs, even for greater numbers of
In order to attest the potentials of the detection ability of our proposed detection model we measure its effectiveness on detection malicious samples consigning four measurements, namely the precision, the recall, its accuracy and the F-measure, which their computation is listed below:
Throughout our evaluation experiments regarding the detection potentials of our proposed model, for the case of the utilization of Discrete modification Temporal Graphs (DMTG) to represent the structural evolution of a Group Relation Graphs (GrG) graph over time, the corresponding four factors that prove the detection potentials of our model are maximized for the case of the deployment of Jaccard similarity metric partitioning the graph into 4
In Table 3 we cite the results from relative approaches concerning the reported measurements on the detection rates (%) on the exhibited Precision, Recall, Accuracy and F-Measure, where they are reported. The proposed approach, when compared to the baseline where the GrG graphs are deployed without the representation of their structural evolution during time [45] it is observable that only in the Precision measurement the proposed approach exhibits results less than the ones achieved by the baseline, where in the other three factors, i.e., Recall, Accuracy, and F-Measure, the proposed approach utilizing either the DMTG or the CMTG outperforms the approach where the Temporal Graphs are not deployed. Moreover, for the case of the utilization of CMTG for
Detection rates (%) exhibited on the factors of precision, recall, accuracy, and F-measure, by relative works
Regarding the comparison of the results exhibited by our proposed method it is observable that the potentials depicted by the achieved detection rates show that our technique is comparable to the ones proposed during the latest years, i.e., the utilization of Fuzzy and Fast Fuzzy Pattern Tree [11], API Call Sequence Alignment [26], API Call Sequences [56], and System-call Dependency Sequences [30], where our proposed approach exhibits better results against some of them, with the System-call Dependency Sequence method proposed in [30] exhibiting the maximum values regarding the Accuracy, F-Measure, and Precision–Recall measurements.
Classification accuracy (%) exhibited by relative works
Similarly, throughout our evaluation experiments regarding the classification potentials of our proposed model, for the case of the utilization of Discrete modification Temporal Graphs (DMTG) to represent the structural evolution of a Group Relation Graphs (GrG) graph over time we verified the classification ability of our model comparing it with the ones achieved by relative approaches. For the case of the deployment of Cossine similarity metric partitioning the evolution of the GrG graph into 4
Several modeling alternates have been arise during the theoretical construction of our graph-based proposed model regarding the temporal evolution of behavioral graphs that represent software samples, regarding their structural modification during time. Our approaches that we discuss briefly next, mostly concern the representation of the structural modifications on the GrG graphs during time, and how they could also be represented with other structures that do not cooperate graphs, and consequently deserve the application of different manipulation methods.
In the first alternate approach, we could denote the structural evolution of a given by plotting by a discrete distribution of the addition of edges over the graph on specific time buckets and create patterns that could be utilized in order to perform pattern-matching over the plot of any given pair of samples. These plots should be construct for the temporal evolution of each corresponding edge pair of two given graphs in order for the patterns to be comparable. On the other hand, in the second approach of our model, we need to simulate the structural modification of a given graph during time. Similarly to our approach, rather than constructing several graph instances equal to the number of the defined
Regarding the limitations of our model, the main issue encountered regarding the implementation design concerns the spatial complexity of our approach. More precisely, defining a fine-grained or a coarse-grained quantization of time would affect to a great extent the space required to store the corresponding Temporal Graph instances. As easily someone can understand, an implementation of our proposed model on a fine-grained time quantization scheme, would be more precise against a more coarse-grained once. Additionally, further tuning issues arise over the trade-off between the precision on temporal structural modifications and the construction of more distinguishing patterns. However, more sophisticated approaches, such an implementation that utilizes the maximum length of a binary tree in order to bound the quantization would lead to a more stable, rational, effective and efficient approach.
Further research
In the context of extending our work we set our future research aims over the investigation of comparison of temporal graphs, and mainly focusing on the
Footnotes
Acknowledgment
This research is co-financed by Greece and the European Union (European Social Fund- ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning 2014- 2020” in the context of the project “Malicious Software Detection and Classification utilizing Temporal–Graphs of Discrete and Cumulative Structural Evolution” (MIS 5047642).
