Abstract
The European Union (EU) allocates a large share of its budget to the Common Agricultural Policy (CAP). Overall, the distribution of the funding adheres to several regulations derived from either EU or Member State laws, while various departments and business roles are involved throughout the process. The analysis of such business processes can take advantage of social network analysis. Data-driven social network analysis was identified as a research direction by various works as soon as companies and organizations started to adopt advanced information systems; however, this potential has not been exploited in recent years, especially from a process mining perspective. In this paper, we apply an integrated process mining and social network analysis approach in order to extract the optimal process models and the social network of the entities involved in the process, aiming to tackle the complexity of the CAP funding process and to provide useful insights from an organizational perspective. To do this, we apply a methodology that consists of four steps: (i) import event log and exploratory data analysis; (ii) process discovery; (iii) conformance checking and evaluation; and (iv) social network analysis. The results show that the proposed approach can reveal insights that are otherwise not visible due to the complexity of such processes.
Keywords
Introduction
The European Union (EU) allocates a large share of its budget to the Common Agricultural Policy (CAP). 1 Most of these expenditures target the funding of farmers without considering prerequisites in terms of their production. The rest of these expenditures are directed to rural and market development. Overall, the distribution of the funding adheres to several regulations derived from either EU or Member State laws, while various departments and business roles are involved throughout the process. The countries are obliged to use an Integrated Administration and Control System (IACS) in order to deal with the complexity of the process.2,3
Extracting the information related to this business process from the stored event logs cannot be performed using typical data mining approaches. Classical data mining methods, such as clustering, classification, and association rule mining, usually focus on the analysis of a specific step of the process and cannot extract the process model; they often fail to capture the sequential dependencies and dynamic nature of administrative processes like CAP funding. These methods primarily focus on pattern discovery from static datasets and lack the capability to model the temporal and causal relationships inherent in process execution. Process mining, in contrast, enables the reconstruction of event sequences, identification of process deviations, and discovery of inefficiencies by leveraging structured event logs. This process-centric approach is particularly suited for CAP funding administration, where compliance with regulations, transparency, and efficiency are paramount. Process mining is a data-driven approach targeted at event log datasets, which puts the process at the core of the analysis in order to discover, monitor, and improve the processes at hand. 4
However, due to the inherent complexity of today's business processes, this approach tends to create so-called “spaghetti-like” process models that are rarely understandable by the domain experts. Such models also limit visibility, since they do not show the connections and relationships among the various departments involved in the business process. 5 Therefore, the analysis of such business processes can take advantage of social network analysis, which aims at providing understanding about the dependencies among social entities in the data, as well as identifying their behaviors and their impact on the social network. 6 Data-driven social network analysis was identified as a research direction by various works as soon as companies and organizations started to adopt advanced information systems.7–9 However, this potential has not been exploited in recent years, especially from a process mining perspective.
In this paper, we apply an integrated process mining and social network analysis approach in order to discover the CAP funding process and identify variations, such as loops, delays, bottlenecks, etc. The proposed approach extracts not only the optimal process models but also the social network of the entities involved in the process. In this way, it is able to tackle the complexity of the CAP funding process and to provide useful insights from an organizational perspective.
The rest of the paper is organized as follows. Section 2 presents the theoretical background on process mining and social network analysis. Section 3 provides the main steps of the adopted research methodology. Section 4 describes the CAP funding process and the respective event log extracted from the IACS. Section 5 demonstrates the analysis and the results. Section 6 concludes the paper and outlines our plans for future work.
Theoretical background
In this Section, we present the theoretical background on process mining (Section 2.1) and on social network analysis (Section 2.2), as well as their potential to tackle complex public administration processes like the CAP funding process.
Process mining
Process mining aims at analyzing data in the form of event logs, which are largely stored in enterprise information systems, searching for hidden patterns and correlations, and extracting insights about business processes. 4 In this way, it can improve the business processes by detecting bottlenecks, delays, inefficiencies, etc. 10
Different process discovery techniques influence model interpretability and applicability. Directly-Follows Graphs (DFG) provide a straightforward representation of activity sequences but often lead to overly complex or “spaghetti-like” models. Petri nets offer a more formal representation with explicit token-based execution semantics, making them suitable for conformance checking but challenging to interpret for non-experts. BPMN provides a more intuitive, business-friendly visualization, aiding communication between technical and non-technical stakeholders. The choice of representation impacts the clarity, precision, and usability of discovered models in CAP funding analysis.
In recent years, with the increasing amount of available data and the emergence of sophisticated information systems, process mining has been attracting increasing research attention.11–13 These research works can be broadly distinguished into four main categories 14 : development of new algorithms, extension of existing algorithms, evaluation of the scalability of process mining algorithms, and development of data preprocessing pipelines for event logs. Overall, the focus has been on the computer science perspective rather than on the business management perspective. 14 In addition, many research works focus on healthcare, logistics, and banking applications, while the public administration domain remains unexplored despite its high potential with regard to process mining.14,15 However, process mining is a driver for the digital transformation of the public sector and for the support of informed decision making by the process owners.16,17
Process mining techniques, such as process discovery, conformance checking, and performance analysis, play a key role in identifying inefficiencies and bottlenecks within CAP funding administration. Process discovery techniques provide a visual representation of process flows, highlighting redundant activities and delays. Conformance checking evaluates process adherence to predefined rules and detects deviations from standard procedures, identifying non-compliant cases and potential fraud. Token-based replay helps assess whether observed process executions align with the expected model by tracking token movements. Alignment-based techniques quantify deviations by mapping real execution traces to the reference model, providing actionable insights for process improvement. Footprints analysis offers a high-level summary of process variations. In CAP funding administration, conformance checking can detect fraudulent applications, highlight compliance issues, and improve governance by ensuring that administrative workflows follow regulatory guidelines. Performance mining techniques, including throughput analysis and waiting time detection, help pinpoint process inefficiencies. However, challenges such as noisy event logs, incomplete data, and variations in process execution can impact the accuracy of insights derived from process mining.
Public administration processes have some distinct characteristics that make their analysis a challenging task; their complexity and lack of flexibility stem from the strict regulations and the fact that many people are involved in various stages of the process.16,18 Despite its potential, process mining remains underexplored in public administration due to several barriers. Administrative processes often involve heterogeneous IT systems, making data extraction and integration complex. Privacy regulations and data security concerns further limit access to event logs needed for process mining. Additionally, public sector organizations may lack the technical expertise required to apply process mining effectively. These challenges are particularly present in EU administration processes, such as the CAP funding. This complexity cannot be tackled by the typical process discovery algorithms, such as Heuristic Miner and Inductive Miner, because they tend to create “spaghetti-like” process models that are rarely understandable by the domain experts.5,19 Moreover, even when the resulting models are understood, they hinder the complete visibility and interpretability of the process, since they do not reveal the connections and relationships among the various departments involved in the business process. 20 Therefore, the analysis of public administration processes can take advantage of social network analysis.
Social network analysis
In general, social network analysis aims at providing understanding about the dependencies among social entities in the data, as well as identifying their behaviors and their impact on the social network. 6 In this sense, the power does not belong anymore to states, institutions, or large corporations, but to the networks that structure the society. 21 Therefore, social network analysis identifies not only the networks, but also the participants, thus providing the potential of measuring, monitoring, and evaluating the information flows in order to enhance the organizational performance. 21 Research on social networks has focused on how the outputs are affected by both the nodes’ attributes (e.g., individuals) and the arcs, representing the relationships, that interrelate the nodes among them. 22 The nodes represent the “actors”, and can be people, teams, organizations or information systems. In this context, the arcs provide the connections among the actors and differ from each other in terms of content, direction, and relational strength. 23 These characteristics are important because they affect the dynamics of the network. 6
The acceleration of digital transformation has led to a wide range of information systems in organizations in order to facilitate and automate the business processes. These information systems store large datasets as event logs that can represent not only the actual business process execution but also the respective social networks. 22 Data-driven social network analysis based upon the event logs stored in enterprise information systems was identified as a research direction by various works as soon as companies and organizations started to adopt advanced information systems.7–9 However, this potential has not been exploited in recent years, especially from a process mining perspective, due to several factors. First, many process mining studies focus on control-flow analysis rather than human-centric interactions. Second, SNA requires well-structured logs capturing user interactions, which are often missing or fragmented in administrative databases. Additionally, concerns regarding data privacy and ethical considerations limit the widespread adoption of SNA in analyzing sensitive government processes.
Research methodology
In this Section, we present the adopted research methodology for the integrated process mining and social network analysis on the CAP funding process. The methodology consists of the following steps: (i) Import Event Log and Exploratory Data Analysis (Section 3.1); (ii) Process Discovery (Section 3.2); (iii) Conformance Checking and Evaluation (Section 3.3); and, (iv) Social Network Analysis (Section 3.4).
Import event log and exploratory data analysis
The event log is extracted from the information system. Event logs contain structured records of activities executed within a business process. Each event typically includes a case ID (identifying the process instance), an activity name, a timestamp, and optional attributes such as resource, lifecycle status, or additional contextual data. Figure 1 represents an example event log structure. High-quality event logs should be complete, consistent, and accurately reflect the underlying process execution. These logs enable the reconstruction and analysis of process flows, facilitating insights into performance, compliance, and optimization opportunities. Typically, event logs are represented in the eXtensible Event Stream (XES) format, according to the IEEE XES standard, which is capable of storing a large amount of information regarding process properties. 24 XES is a standardized, XML-based format for storing and exchanging event logs in process mining. It supports rich metadata through extensions, enabling the representation of event attributes such as timestamps, activity names, case identifiers, and resources. XES ensures interoperability between process mining tools and facilitates advanced analysis by preserving event log structure, context, and semantics. Figure 2 depicts an example event log structure in XES format.

An example event log structure.

An example event log structure in XES format.
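The import step can be illustrated with a minimal sketch that reads an XES document into flat rows (case ID, activity, resource, timestamp) using only the Python standard library. The sample log, the function name `xes_to_rows`, and the resource names are hypothetical, and the sketch assumes an XES file without an XML namespace; it is not the script used in this work, which converts the log into a dataframe.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical, hand-written XES fragment (no XML namespace assumed).
SAMPLE_XES = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="mail income"/>
      <string key="org:resource" value="user-a"/>
      <date key="time:timestamp" value="2015-05-04T09:00:00+02:00"/>
    </event>
    <event>
      <string key="concept:name" value="finish payment"/>
      <string key="org:resource" value="user-b"/>
      <date key="time:timestamp" value="2015-12-10T14:30:00+01:00"/>
    </event>
  </trace>
  <trace>
    <string key="concept:name" value="case-2"/>
    <event>
      <string key="concept:name" value="mail income"/>
      <string key="org:resource" value="user-a"/>
      <date key="time:timestamp" value="2016-05-02T08:15:00+02:00"/>
    </event>
  </trace>
</log>"""


def xes_to_rows(xes_text):
    """Flatten an XES log into one dict per event."""
    root = ET.fromstring(xes_text)
    rows = []
    for trace in root.iter("trace"):
        # The case ID is the trace-level concept:name attribute.
        case_id = None
        for attr in trace:
            if attr.tag == "string" and attr.get("key") == "concept:name":
                case_id = attr.get("value")
        for event in trace.iter("event"):
            attrs = {child.get("key"): child.get("value") for child in event}
            rows.append({
                "case_id": case_id,
                "activity": attrs.get("concept:name"),
                "resource": attrs.get("org:resource"),
                "timestamp": datetime.fromisoformat(attrs["time:timestamp"]),
            })
    return rows


rows = xes_to_rows(SAMPLE_XES)
```

Each row carries the minimum attributes described above, so the same structure can feed the later discovery and social network steps.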
Process discovery
Process discovery extracts the process model from the event log under examination while, at the same time, dealing with noisy, incorrect, or incomplete data.25–27 In order to obtain a more understandable model initially, the data were preprocessed so that the process is presented at a high level, i.e., at document level. Afterwards, different models and their properties are extracted and given a first, high-level evaluation according to their characteristics.
Process discovery may lead to three main process model representations:
Directly-Follows Graph (DFG). It is the simplest representation of the process models. The nodes correspond to the activities and the arcs represent the relationships among the activities. It also includes a source and a sink, corresponding to the start and the end activity respectively. When two activities are connected by an arc, this means that, in the event log under examination, the activity at the tail of the arc is directly followed by the activity at its head. 28
Petri net. Petri nets offer a higher-level representation of the process models and are capable of effectively representing the concurrent behaviour in processes. Therefore, they can handle the complexity within the processes by demonstrating various types of transformations, as well as sequential, parallel, choice, and loop execution among the activities included in the process under examination. 29
Business Process Model and Notation (BPMN). The BPMN 2.0 standard builds compact and sound process models by also incorporating subprocesses, data flows, and resources. 30
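The directly-follows relation behind the DFG can be sketched in a few lines. The example below counts, for a small hand-made set of traces, how often one activity is immediately followed by another. The activity names other than “mail income” and “finish payment” are hypothetical placeholders, and the function name is an assumption made for illustration.

```python
from collections import Counter


def directly_follows_graph(traces):
    """Count how often activity a is directly followed by activity b."""
    edges = Counter()
    for trace in traces:
        # Pair each activity with its immediate successor in the trace.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


# Hypothetical traces; "check" and "decide" are placeholder activities.
traces = [
    ["mail income", "check", "decide", "finish payment"],
    ["mail income", "check", "decide", "decide", "finish payment"],
]
dfg = directly_follows_graph(traces)
```

Every key of `dfg` is one arc of the graph and its value is the arc frequency; the self-loop on "decide" shows how repeated work appears in a DFG.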
Conformance checking and evaluation
Conformance checking compares the discovered process models with the observed behavior recorded in the event log. The discrepancies discovered by the different techniques are analyzed from two points of view: an error in the model, or a wrong execution of the process. 31 Two main questions arise. If the model is not correct and does not correspond to reality, how can it be improved? If the process is wrong and some cases deviate from the model and require corrective actions, how can the process flow be improved to achieve better compliance? We use three conformance checking methods: “token-based replay”, “alignments”, and “footprints”. Token-based replay shows, for a given case, whether the trace fits the model and the number of tokens that are missing, produced, consumed, or remaining in the Petri net. The alignment technique shows the cost of the alignment, whether the trace fits or not, and a series of statistics on the number of steps taken and the number of states visited. Finally, the footprints are presented through corresponding diagnostics, images of the footprint calculation results for the model, and a series of statistics that serve to draw conclusions.
In addition, in this step, the discovered process models are evaluated in terms of four criteria: fitness, generalization, simplicity, and precision. 32
Replay Fitness. It measures to what extent the discovered model is able to reproduce the process instances (cases) existing in the event log.
Simplicity. It measures to what extent the discovered process model has tackled the complexity and is understandable by humans.
Precision. It measures to what extent the discovered process model avoids allowing behaviour that has not been seen in the event log.
Generalization. It measures to what extent the discovered model will be able to reproduce future behavior of the process.
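To make the four token counts concrete, the sketch below turns them into a single fitness value using the formula commonly given in the process mining literature for token-based replay, fitness = 0.5 (1 - m/c) + 0.5 (1 - r/p), where p, c, m, and r are the produced, consumed, missing, and remaining tokens. The text does not state which formula the analysis used, so this is an assumption, and the function name is hypothetical.

```python
def token_replay_fitness(produced, consumed, missing, remaining):
    """Token-based replay fitness from the four token counts.

    Assumes the standard literature formula; the counts would come from
    replaying a trace (or a whole log) on a Petri net.
    """
    if produced <= 0 or consumed <= 0:
        raise ValueError("produced and consumed must be positive")
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


perfect = token_replay_fitness(produced=10, consumed=10, missing=0, remaining=0)
deviating = token_replay_fitness(produced=10, consumed=10, missing=1, remaining=1)
```

A trace that replays without missing or remaining tokens scores 1.0; each missing or remaining token lowers the score in proportion to the tokens consumed or produced.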
Social network analysis
Social network analysis focuses on the relationships among people, information systems, and processes, that is, on data concerning the social aspect of the analysis. Interpreting process mining results can be challenging due to complex process variations, incomplete event logs, and the presence of parallel workflows. Traditional process models often fail to provide insights into human interactions, collaboration patterns, and decision-making dynamics. Social network analysis complements process mining by revealing how actors interact within the CAP funding process, identifying influential stakeholders, and uncovering coordination bottlenecks. By analyzing communication structures and workflow dependencies, social network analysis enhances the interpretability of process mining results, aiding public administrators in optimizing resource allocation and streamlining procedures.
The initial purpose is the discovery of the resources that participate in the process and the creation of worksheets and activities. A social network is discovered from event data that capture human behavior and contain information about who actually performed the activities of the process. The Resource Activity Matrix is constructed to discover the time spent by a resource on an activity. Then the social network of the process complements the table of activities and presents the cooperative relations of the operators. Related metrics, such as degree centrality, the total degree of the network, closeness centrality, betweenness centrality, and other graph-theoretic measures, are calculated.
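The steps above can be sketched on a toy log. The example builds a resource-activity matrix, a handover-of-work network (resource r1 hands over to r2 when r2 performs the next event of the same case), and a normalized degree centrality. Two simplifications are assumed: the matrix counts events rather than time spent, and the centrality is computed on the undirected version of the network. The resource names, the "check" activity, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def resource_activity_matrix(cases):
    """Count how many times each resource performed each activity."""
    matrix = defaultdict(Counter)
    for trace in cases:
        for activity, resource in trace:
            matrix[resource][activity] += 1
    return matrix


def handover_of_work(cases):
    """Count directed handovers between consecutive events of a case."""
    handovers = Counter()
    for trace in cases:
        for (_, r1), (_, r2) in zip(trace, trace[1:]):
            if r1 != r2:  # ignore a resource continuing its own work
                handovers[(r1, r2)] += 1
    return handovers


def degree_centrality(handovers):
    """Distinct neighbours of each resource divided by (n - 1)."""
    neighbours = defaultdict(set)
    for r1, r2 in handovers:
        neighbours[r1].add(r2)
        neighbours[r2].add(r1)
    n = len(neighbours)
    if n < 2:
        return {r: 0.0 for r in neighbours}
    return {r: len(ns) / (n - 1) for r, ns in neighbours.items()}


# Each trace is a time-ordered list of (activity, resource) pairs.
cases = [
    [("mail income", "u1"), ("check", "u2"), ("finish payment", "u3")],
    [("mail income", "u1"), ("check", "u2"), ("check", "u2"),
     ("finish payment", "u1")],
]
matrix = resource_activity_matrix(cases)
handovers = handover_of_work(cases)
centrality = degree_centrality(handovers)
```

In this toy network `u2` is connected to every other resource, so it receives the highest centrality, which is the kind of signal used to spot coordination hubs.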
The CAP funding process
In this Section, we briefly describe the CAP funding process. The EU provides a high amount of its budget for the CAP. 1 Most of these expenditures target the funding of the farmers without considering prerequisites in terms of their production. The rest of these expenditures are directed to rural and market development. Overall, the distribution of the funding adheres to several regulations derived either from EU or from Member States laws, while various departments and business roles are involved throughout the process. The countries are obliged to use an IACS in order to deal with the complexity of the process.2,3 Manual verifications and external approvals introduce additional complexity to the CAP funding process, often leading to delays. Land registry verifications ensure that agricultural land complies with eligibility criteria, while field engineer assessments validate on-ground conditions. These checks, while essential for preventing fraud and ensuring fair fund distribution, create bottlenecks due to dependency on external entities and resource constraints. Process mining can help quantify the impact of these manual steps, identifying areas where automation or better coordination could improve efficiency without compromising compliance.
The IACS serves as a fundamental mechanism for ensuring the effective and transparent allocation of CAP funds across EU member states. It provides a structured framework for processing applications, verifying eligibility, and conducting compliance checks through multiple subsystems, including the Land Parcel Identification System (LPIS) and on-site inspections. IACS plays a critical role in preventing fraud, reducing administrative burden, and ensuring equitable fund distribution by integrating automated and manual validation steps. In the context of process mining, analyzing IACS event logs can help identify inefficiencies, optimize workflows, and enhance decision-making in fund allocation and monitoring.
The process under examination deals with the processing of applications for EU direct payments to German farmers from the European Agricultural Guarantee Fund. The process is repeated every year with some differences because of changes in EU regulations. 33 The process leads to a decision about the approval or rejection of the application and about whether part or all of the compensation to which the farmer is entitled is finally paid. The dataset is derived from the information system that facilitates these processes within federal ministries of agriculture and local departments. These workflows operate through a document-based structure, where each document exists in a specific state that dictates permissible actions. These actions can either be carried out manually at any time using dedicated document tools or scheduled for automatic execution. In the latter case, automation may be explicitly recorded in the log or inferred when a single user performs a high volume of actions within a short timeframe, indicative of batch processing.
In addition to the difficulties related to the regulations, the direct payment process has a multitude of activities in which, besides the information systems, the human factor also participates, and human use of a system is unpredictable. Several of the activities are manual and involve checks by external resources, such as approvals from the land registry for the correct declaration of parcels and field visits by engineers for the respective studies and assessments of parcels, which makes the process even more complicated. It should also be mentioned that in the member states there is also outsourcing at the level of application and handling of the process. Many farmers outsource the application work to a financial institution or a specialized company that provides such services. Moreover, the process may vary between an applicant who has performed the process in previous years and an applicant who performs it for the first time. A normal result is considered to be an application that is submitted in a certain year and completed by the end of that year. However, the execution of the procedure and its results may not satisfy the applicants, because the process may be delayed or may have to be reopened for some reason, such as the wrong filing of records, which affects the final compensation amount. These factors lead to several variations in the process execution.
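The inference of batch processing described above can be sketched as a sliding-window check: a resource is flagged when it performs at least a given number of actions within a short time window. The window length, the threshold, the resource names, and the function name are all hypothetical choices for illustration; the source does not specify the rule that was applied.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def likely_batch_resources(events, window=timedelta(seconds=60), min_actions=100):
    """Return resources with at least min_actions events inside one window.

    events: iterable of (resource, timestamp) pairs.
    """
    by_resource = defaultdict(list)
    for resource, ts in events:
        by_resource[resource].append(ts)

    flagged = set()
    for resource, stamps in by_resource.items():
        stamps.sort()
        left = 0
        for right in range(len(stamps)):
            # Shrink the window until it spans at most `window`.
            while stamps[right] - stamps[left] > window:
                left += 1
            if right - left + 1 >= min_actions:
                flagged.add(resource)
                break
    return flagged


base = datetime(2016, 5, 2, 3, 0, 0)
# Hypothetical data: 150 actions within 30 seconds vs. 5 actions over 5 hours.
events = [("scheduler", base + timedelta(seconds=i % 30)) for i in range(150)]
events += [("clerk", base + timedelta(hours=i)) for i in range(5)]
flagged = likely_batch_resources(events)
```

The thresholds would need tuning against the real log, since a very active human user could otherwise be mistaken for an automated job.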
Outsourcing CAP funding application processing to financial institutions or specialized agencies introduces both advantages and challenges. While it can enhance efficiency and reduce administrative burden, it also raises concerns regarding fairness, transparency, and accountability. Process mining can help assess the impact of outsourcing by analyzing discrepancies in approval rates, processing times, and rejection patterns across different agencies. Ensuring that outsourced entities adhere to standardized procedures and maintaining oversight mechanisms through process monitoring are crucial for safeguarding equity in fund distribution.
Analysis and results
In this Section, we demonstrate the analysis of the CAP event log and the results from the implementation of the aforementioned research methodology. The structure of this Section follows the four steps of the methodology in order to demonstrate the outputs of each step that feed into the subsequent steps.
For the needs of this work, a Python script was developed. It takes as input an event log in the IEEE XES format and converts it into a dataframe, which is subsequently used by the rest of the functions. Python version 3.11 was used on Windows 10. Table 1 shows the main Python libraries that need to be installed so that the Python interpreter can execute the script correctly.
Basic python libraries used to build the script.
The total log, extracted from the information systems supporting these procedures at the level of federal ministries of agriculture and local departments, contains 43,809 direct payment applications over the three years from 2015 to 2017. The shortest case contains 24 events, the longest 2973, and on average there are 57 events per case, referring to 14 activities. The events total 2,514,266 and represent either automatic or manual procedures, from the receipt and acceptance of the application through to its completion, i.e., the approval of the payment. The workflows are represented as documents, as shown in Table 2, where each document has a status that allows specific actions. Automatic actions are either recorded explicitly in the log or can be recognized when a large number of actions were performed by the same user at approximately the same time. Some requests are repeated by specific resources and for specific reasons, either by the department (“Change” sub-process) or due to a legal objection by the applicant (“Objection” sub-process), while some others are subject to inspections. Finally, during the process the applications go through several steps that determine whether a direct payment is made and the amount that will be paid to the applicant at the end. All or only part of the amount may be paid for various reasons, for example, if the declared size of a parcel does not match the actual size, as determined by remote or on-site inspection, or because of non-compliance with the agricultural policy or with conditions tied to a characteristic of the applicant (for example, a young farmer is subject to different conditions than an older one). Table 3 presents the set of traces and the set of events contained in the event log file.
Description of process documents that define the flow of the process, where each document has a status (process) that allows specific activities.
Total traces and total events in the event log.
Figure 3 shows the distribution of the 43,809 applications received in the years 2015, 2016, and 2017; the number of applications per year is almost constant. The month in which each application is submitted is then extracted (Figure 4). Most cases start in April and May, with May having the most.

Distribution of applications per year.

Distribution of applications by month and year.
At the level of events and activities, across the 43,809 cases, 4 activities start the process and 21 end it. Figure 5 depicts the start activities and Figure 6 depicts the end activities. According to the documentation, the initial activity is the “mail income” of the Payment Application document, with which 38,623 cases start, and the final activity is “finish payment” of the same document, with which 34,830 cases are completed. The remaining start activities are due to operational errors and incomplete recording of the events by the information system.

Start activities.

End activities.
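Start and end activities of the kind shown in Figures 5 and 6 can be obtained by counting the first and last activity of every case. The sketch below does this on hypothetical traces; only “mail income” and “finish payment” are taken from the text, and the remaining activity names and the function name are placeholders.

```python
from collections import Counter


def start_end_activities(traces):
    """Count the first and the last activity over all non-empty traces."""
    starts = Counter(trace[0] for trace in traces if trace)
    ends = Counter(trace[-1] for trace in traces if trace)
    return starts, ends


traces = [
    ["mail income", "check", "finish payment"],
    ["mail income", "check", "finish payment"],
    ["mail income", "check"],            # case not yet completed
    ["check", "finish payment"],         # first event missing from the log
]
starts, ends = start_end_activities(traces)
```

Unexpected entries in either counter, such as a case starting with "check", point to incomplete recording or unfinished cases rather than to genuine process variants.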
From the total log file, it can be extracted when each case started and when it ended, i.e., the start is based on the year and the activities with which an application is filed, and the end is based on the year and date of the final activity, “finish payment” (Table 4).
The total duration is retrieved for each individual case (durations are initially retrieved in seconds and converted to days), and the longest, shortest, and average durations of application processing in the total log are calculated by year. From this point onwards, several conclusions can be drawn regarding the time required to complete the processing of applications in the different years covered by the data, as shown in Figure 7 and Figure 8.
In general, from the description of the process and the first data extracted from the log file, it appears that the payment program in this form came into force in 2015. In the first year that it was implemented, a total of 14,752 applications were received by the system. In the following years (2016, 2017) this number was slightly, but not significantly, lower. After examining the farmers, based on the applicant farmer's identification code, it was found that they vary little over the years and are essentially the same farmers. It is also established that the applications were made at the same time during the year, that is, in the months of April and May, with few exceptions among the 43,809 cases recorded in the file.
Also, based on the process-level characteristics, the work allocation of each department has been calculated, as can be seen in Figure 9. Three of the departments have almost the same workload, and one has fewer applications than the average of the others.
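The duration calculation can be sketched as follows: the duration of a case is the time between its first and last event, taken in seconds and converted to days, and the minimum, maximum, and mean are then grouped by the year in which the case started. The case identifiers, timestamps, and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

SECONDS_PER_DAY = 86400


def case_durations_days(cases):
    """Map each case ID to its duration in days (last minus first event)."""
    return {
        case_id: (max(stamps) - min(stamps)).total_seconds() / SECONDS_PER_DAY
        for case_id, stamps in cases.items()
        if stamps
    }


def duration_stats_by_year(cases):
    """Minimum, maximum, and mean duration in days per start year."""
    durations = case_durations_days(cases)
    per_year = defaultdict(list)
    for case_id, days in durations.items():
        per_year[min(cases[case_id]).year].append(days)
    return {
        year: {"min": min(d), "max": max(d), "mean": sum(d) / len(d)}
        for year, d in per_year.items()
    }


# Hypothetical cases: case ID -> event timestamps.
cases = {
    "c1": [datetime(2015, 5, 1), datetime(2015, 12, 1)],
    "c2": [datetime(2015, 5, 1), datetime(2016, 5, 1)],
    "c3": [datetime(2016, 4, 30), datetime(2016, 12, 30)],
}
durations = case_durations_days(cases)
stats = duration_stats_by_year(cases)
```

Grouping by start year matters here because later years have a shorter observation period, so their maximum durations are bounded by the end of the log.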
Year of initiation and completion of procedures.

Total case duration.

Duration of cases per year.

Distribution of tasks by department and year.
Table 5 presents the distribution of penalties per penalty category. The penalty categories are anonymized in the dataset. The total number of penalties decreases across the board, in conjunction with the average completion times of the process; hence, it is concluded that users were “trained” in how to submit the application over the years. Also, the changes made by the Joint Agricultural Development organization, i.e., the replacement of the procedures that describe the document types Parcel Document (before 2016), Department control parcels (before 2017), and Geo Parcel Document (after 2017), in combination with the better picture of the data, show process improvement during these three years.
Distribution of penalties by the years 2015, 2016, 2017.
Table 6 presents the number of start and end activities per year. Given the average overall completion time of a process, the uneven distribution of end activities per year, and the fact that, in several cases, no specific activity marks the end of an application, it is impossible to determine the completion time of all cases. Moreover, since the event log contains data only until January 19, 2018, a different observation period is available for each year. Thus, for the applications submitted in April and May 2017, there is a history of approximately 9 months. In contrast, for applications submitted in 2015, the event history spans more than 2 years, and it is no coincidence that some cases pending since 2015 are still open in January 2018.
Number of start and end activities per year.
The exploratory analysis of the dataset has revealed that the event log is noisy and that several cases are not completed, as indicated by the number of distinct execution paths of the process (variants) and the number of end activities. Specifically, there are 28,457 variants, of which 26,602 (93.48%) have been executed only once. For this reason, and to make the process more understandable, Process Discovery and Conformance Checking are performed on the 10 most common variants, which constitute 18.9% (5377 cases) of the total event log file and contain 237,492 events.
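The variant analysis above can be sketched in a few lines of Python. The dictionary layout of the traces and the toy activity names are assumptions for illustration only; the last line merely re-checks the reported share of variants executed once.

```python
from collections import Counter


def variant_counts(traces):
    """traces: {case_id: [activity, ...]} -> Counter mapping variant -> number of cases."""
    return Counter(tuple(trace) for trace in traces.values())


def singleton_share(variants):
    """Share of variants that were executed by exactly one case."""
    singles = sum(1 for n in variants.values() if n == 1)
    return singles / len(variants)


def keep_top_k_variants(traces, k):
    """Keep only the cases whose variant is among the k most frequent ones."""
    top = {variant for variant, _ in variant_counts(traces).most_common(k)}
    return {case: trace for case, trace in traces.items() if tuple(trace) in top}


# Toy log (hypothetical activities a, b, c).
toy_traces = {
    1: ["a", "b", "c"],
    2: ["a", "b", "c"],
    3: ["a", "c"],
    4: ["a", "b", "b", "c"],
    5: ["a", "b", "c"],
}
toy_variants = variant_counts(toy_traces)
toy_filtered = keep_top_k_variants(toy_traces, k=1)

# Re-check of the share reported in the text: 26,602 of 28,457 variants.
reported_share = round(26_602 / 28_457 * 100, 2)
```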
Before the process is discovered at the level of activities, and because the result must be understandable to the process owner, a data pre-processing step is performed. Initially, the activities were merged at the document-type level, according to the start and end of the activities that make up each document type, and process models were mined using the Inductive Miner algorithm, resulting in models that describe the process at a high level. The results are a BPMN model (Figure 10) and a Directly-Follows Graph (DFG) (Figure 11).

BPMN model for the high-level workflow.

DFG for the high-level workflow.
These two models present the process execution flow and show which documents contain activities that either run continuously or are interrupted and continue later because activities of other documents precede them in time. In the process map (Figure 11), the nodes now represent document types rather than individual activities, the edges represent the relationships between them, and the numbers give the transition duration from one node to the next. A closer analysis shows that there are many loops in the process. The most important observation regarding these loops is that the whole process usually starts and ends with the “Payment Application” activity.
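To make the structure of such a process map concrete, the following minimal sketch computes a directly-follows graph annotated with both frequency and mean transition duration. The event layout (activity, timestamp in seconds) and the toy values are assumptions; only the document-type names are taken from the text.

```python
from collections import defaultdict


def directly_follows(traces):
    """traces: {case_id: [(node, timestamp_seconds), ...]}, sorted by time.

    Returns {(a, b): (frequency, mean_seconds_from_a_to_b)}.
    """
    freq = defaultdict(int)
    total = defaultdict(float)
    for events in traces.values():
        # Pair every event with its direct successor in the same case.
        for (a, t_a), (b, t_b) in zip(events, events[1:]):
            freq[(a, b)] += 1
            total[(a, b)] += t_b - t_a
    return {edge: (n, total[edge] / n) for edge, n in freq.items()}


# Toy log at document-type level (timestamps are hypothetical).
toy_log = {
    "c1": [("Payment Application", 0),
           ("Geo Parcel Document", 100),
           ("Payment Application", 300)],
    "c2": [("Payment Application", 0),
           ("Geo Parcel Document", 200)],
}
toy_dfg = directly_follows(toy_log)
```

In this toy log, the edge back to “Payment Application” forms exactly the kind of loop discussed above.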
Another observation is the repetition of activities. This is due to the fact that this particular process model is based only on document types and not on the activities that make them up. The analysis of the activities below shows that there are bi-reports in the number of cases that have performed the activity.
The rest of the discovery and analysis work is performed at the level of activities using process mining algorithms, in order to draw better conclusions about their execution flow. The following models present the process with different notations, exploiting the properties of each algorithm in order to present the process in an optimal way and to exhaustively analyze the process. The process discovery methods Heuristics Miner and Inductive Miner are applied.
Figure 12 depicts the PetriNet extracted with the Heuristics Miner applied to the 10 most frequent variants of the process.

The PetriNet extracted with heuristic miner.
Figure 13 depicts the Heuristic Net derived from the Heuristics Miner with a dependency threshold of 0.80. This Heuristic Net is mined from the entire event log. The value expressing the dependence between activities has been set low, with the result that a large number of cases are excluded. Even so, the discovered model is quite difficult to understand for a user without the appropriate knowledge, which once again confirms the complexity of the process.

Heuristic net with dependency 0.80.
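The role of the dependency value can be illustrated with the standard dependency measure of the Heuristics Miner, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often b directly follows a. The sketch below is a simplified illustration on a toy log and keeps only the edges that reach the threshold; it is not the full algorithm and not the tool configuration used in this work.

```python
from collections import Counter


def directly_follows_counts(traces):
    """traces: list of activity lists -> Counter of (a, b) directly-follows pairs."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Dependency measure between a and b, in the open interval (-1, 1)."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if a == b:  # length-one loop
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(traces, threshold=0.80):
    """Keep only the edges whose dependency reaches the threshold."""
    counts = directly_follows_counts(traces)
    edges = {}
    for a, b in counts:
        value = dependency(counts, a, b)
        if value >= threshold:
            edges[(a, b)] = value
    return edges


# Toy log: nine cases follow a-b-c, one deviating case follows a-c-b.
toy_log = [["a", "b", "c"]] * 9 + [["a", "c", "b"]]
toy_graph = dependency_graph(toy_log, threshold=0.80)
```

With this toy log only the edge from a to b survives the 0.80 threshold: the edge from b to c is weakened by the single deviating case, and the edge from a to c is observed too rarely.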
Figure 14 depicts the Heuristic Net derived from the Heuristics Miner for the top 10 variants, while Figure 15 depicts the Heuristic Net for the top 10 variants with a dependency threshold of 0.80. The Heuristic Net in Figure 14 is produced for the 10 most frequent variants of the event log. It includes all instances of this sample of the log file, resulting in a fairly comprehensible process model. This is because all instances execute the process by starting at a specific start activity and finishing at a specific end activity. The diagram is enriched with the performance of the process, i.e., the time each activity took to complete. It confirms that the PetriNet produced from the same data is indeed a WorkFlowNet. With a dependency threshold of 0.80, the result is a network with fewer transitions.

Heuristic net - top 10 variants.

Heuristic net - top 10 variants - dependency 0.80.
Figure 16 depicts the PetriNet derived from the Inductive Miner using the whole event log consisting of 43,809 cases. Therefore, the produced model captures the total behavior contained in the event log, thus leading to a complex process model.

Inductive miner - PetriNet - total event log.
Figure 17 depicts the PetriNet derived from the Inductive Miner for the top 10 variants with a noise threshold of 40%. Figure 18 depicts the BPMN model for the top 10 variants. The discovered BPMN model is quite readable: the activities that make up the process flow are clear, and it shows the reader exactly what happens during execution. Figure 19 depicts the PetriNet derived from the Inductive Miner for the top 10 variants.

Inductive miner - PetriNet - reduced noise by 40%.

BPMN model - inductive miner - top 10 variants.

Inductive miner - PetriNet - top 10 variants.
Figure 20 depicts the Reachability Graph derived from the Inductive Miner for the top 10 variants. The Reachability Graph is a transition system that results from the analysis of the PetriNet; it has one initial state and an unspecified number of end states. It distinguishes the different states that are reachable from the initial state. The nodes correspond to states, and the edges connecting them represent transitions that can fire to move from one state to another. It allows the behavior of the system to be analyzed, identifies the possible sequences of transitions, and helps in understanding the overall structure of the feasible states.

Reachability graph - inductive miner.
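The construction of a reachability graph can be sketched as a breadth-first search over the markings of a Petri net. The net encoding below (each transition maps to its input and output places with arc weights) and the small example net are assumptions for illustration; they do not reproduce the discovered model.

```python
from collections import deque


def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in pre.items())


def fire(marking, pre, post):
    """Return the marking reached by firing an enabled transition."""
    nxt = dict(marking)
    for place, n in pre.items():
        nxt[place] -= n
    for place, n in post.items():
        nxt[place] = nxt.get(place, 0) + n
    return {place: n for place, n in nxt.items() if n > 0}


def reachability_graph(transitions, initial, limit=10_000):
    """Breadth-first exploration; returns (states, edges).

    States are markings encoded as sorted tuples of (place, tokens);
    edges are (source_state, transition_name, target_state).
    The initial marking must not contain zero-token entries.
    """
    def key(marking):
        return tuple(sorted(marking.items()))

    states = {key(initial)}
    edges = []
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for name, (pre, post) in transitions.items():
            if not enabled(marking, pre):
                continue
            nxt = fire(marking, pre, post)
            edges.append((key(marking), name, key(nxt)))
            if key(nxt) not in states:
                if len(states) >= limit:
                    raise RuntimeError("state limit reached; the net may be unbounded")
                states.add(key(nxt))
                queue.append(nxt)
    return states, edges


# Toy net: a splits into two parallel branches (b, c) that d joins again.
toy_net = {
    "a": ({"start": 1}, {"p1": 1, "p2": 1}),
    "b": ({"p1": 1}, {"p3": 1}),
    "c": ({"p2": 1}, {"p4": 1}),
    "d": ({"p3": 1, "p4": 1}, {"end": 1}),
}
toy_states, toy_edges = reachability_graph(toy_net, {"start": 1})
```

Even this four-transition net yields six states, because the two parallel branches can interleave in either order; this growth is one reason why reachability graphs of larger models quickly become hard to read.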
Figure 21 depicts the DFG for the top 10 variants annotated with frequency, while Figure 22 depicts the DFG for the top 10 variants annotated with performance.

DFG - top 10 variants - frequency annotated.

DFG - top 10 variants - performance annotated.
Conformance checking and evaluation on the PetriNet derived from the heuristics miner
The “Token Replay” method for conformance checking leads to the following results (Table 7):
The model replayed 5377 traces, but none of them fit the model completely (0%). The average trace fitness, i.e., the share of each case accepted by the process model, was 0.92 (92%). The overall fitness of the log is 0.92 (92%). A total of 5377 cases were replayed on a net with 26 places and 15 transitions. No activity was found that was not performed.
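The combination of 0% perfectly fitting traces with a high average fitness follows from how token-based replay scores a trace: fitness = 0.5 (1 - m/c) + 0.5 (1 - r/p), where p, c, m, and r are the produced, consumed, missing, and remaining tokens. The sketch below is a simplified replay that assumes exactly one visible transition per activity and no silent transitions; the toy net is hypothetical and the code is not the implementation used to obtain Table 7.

```python
def replay_trace(trace, transitions, initial, final):
    """Replay one trace and return its token-based fitness.

    transitions: {activity: (pre, post)} with place -> arc weight dicts.
    Assumes one visible transition per activity and no silent transitions.
    """
    marking = dict(initial)
    produced = sum(initial.values())  # tokens put in by the environment
    consumed = missing = 0

    def consume(place, n):
        nonlocal consumed, missing
        have = marking.get(place, 0)
        if have < n:
            missing += n - have  # tokens that had to be added artificially
            have = n
        marking[place] = have - n
        consumed += n

    for activity in trace:
        pre, post = transitions[activity]
        for place, n in pre.items():
            consume(place, n)
        for place, n in post.items():
            marking[place] = marking.get(place, 0) + n
            produced += n
    for place, n in final.items():  # the environment removes the final tokens
        consume(place, n)

    remaining = sum(marking.values())
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


def replay_log(traces, transitions, initial, final):
    """Share of perfectly fitting traces and average trace fitness."""
    scores = [replay_trace(t, transitions, initial, final) for t in traces]
    return {
        "perfectly_fitting": sum(s == 1.0 for s in scores) / len(scores),
        "average_trace_fitness": sum(scores) / len(scores),
    }


# Toy sequential net a -> b -> c (hypothetical).
toy_net = {
    "a": ({"start": 1}, {"p1": 1}),
    "b": ({"p1": 1}, {"p2": 1}),
    "c": ({"p2": 1}, {"end": 1}),
}
toy_log = [["a", "c"], ["a", "b", "b", "c"]]
toy_summary = replay_log(toy_log, toy_net, {"start": 1}, {"end": 1})
```

In the toy log, neither trace fits perfectly, yet each deviation costs only a few tokens, so the average trace fitness stays well above zero.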
Table 8 presents the evaluation results of the PetriNet derived from the Heuristic Miner with the “Token Replay” method.
The “Alignments” method does not lead to a result due to the noise, i.e., the Petri Net is not a sound net. Therefore, this method cannot be applied.
The “Footprints” method cannot fully reproduce any case from the event log, since, among other things, the model includes points that are not reachable by the process as well as dead ends. Table 9 presents the evaluation results of the PetriNet derived from the Heuristics Miner with the “Footprints” method.
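As a simplified illustration of footprint-based comparison, the sketch below derives a footprint matrix from directly-follows relations and reports the share of cells on which two footprints agree. Representing the model by a list of traces it allows is an assumption made to keep the example short; in practice the model footprint is derived from the model itself.

```python
def footprint(traces):
    """Footprint matrix as {(a, b): relation} over all activity pairs.

    Relations: "->" (a is directly followed by b only), "<-" (the reverse),
    "||" (both orders observed), "#" (never adjacent).
    """
    follows = {pair for trace in traces for pair in zip(trace, trace[1:])}
    activities = sorted({a for trace in traces for a in trace})
    matrix = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            if ab and ba:
                matrix[(a, b)] = "||"
            elif ab:
                matrix[(a, b)] = "->"
            elif ba:
                matrix[(a, b)] = "<-"
            else:
                matrix[(a, b)] = "#"
    return matrix


def footprint_conformance(log_fp, model_fp):
    """Return (share of agreeing cells, set of differing cells)."""
    cells = set(log_fp) | set(model_fp)
    differing = {
        cell for cell in cells
        if log_fp.get(cell, "#") != model_fp.get(cell, "#")
    }
    return 1 - len(differing) / len(cells), differing


# Toy example: the log shows b and c in both orders, the model allows only a-b-c.
toy_log_fp = footprint([["a", "b", "c"], ["a", "c", "b"]])
toy_model_fp = footprint([["a", "b", "c"]])
toy_score, toy_diff = footprint_conformance(toy_log_fp, toy_model_fp)
```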
Conformance checking and evaluation on the PetriNet derived from the inductive miner
The “Token Replay” method for conformance checking leads to the following results (Table 10):
The model replayed 5377 traces, all of which were replayed successfully, i.e., 1.0 (100%). The average trace fitness, i.e., the share of each case accepted by the process model, is 1.0 (100%). The overall fitness of the log is also 1.0 (100%). A total of 5377 cases were replayed on a net with 35 places and 15 transitions. No activity was found that was not performed.
Table 11 presents the evaluation results of the PetriNet extracted by the Inductive Miner with the “Token Replay” method.
The “Alignments” method for conformance checking leads to the following results (Table 12):
The model replayed 5377 traces, all of which were replayed successfully, i.e., 1.0 (100%). The average trace fitness, i.e., the share of each case accepted by the process model, is 1.0 (100%). The overall fitness of the log is almost 1.0 (100%). A total of 5377 cases were replayed on a net with 35 places and 15 transitions. No activity was found that was not performed.
Table 13 presents the evaluation results of the PetriNet derived from the Inductive Miner with the “Alignments” method.
Conformance checking results of the PetriNet derived from the heuristic miner with the “token replay” method.
Evaluation results of the PetriNet derived from the heuristic miner with the “token replay” method.
Evaluation results of the PetriNet derived from the heuristic miner with the “footprints” method.
Conformance checking results of the PetriNet derived from the inductive miner with the “token replay” technique.
Evaluation results of the PetriNet derived from the inductive miner with the “token replay” method.
Conformance checking results of the PetriNet derived from the inductive miner with the “alignments” technique.
Evaluation results of the PetriNet derived from the inductive miner with the “alignment” method.
The “Footprints” method also verifies that the process model produced by the Inductive Miner represents the actual process. Table 14 presents the evaluation results of the PetriNet derived from the Inductive Miner with the “Footprints” method.
Evaluation results of the PetriNet derived from the inductive miner with the “footprints” method.
The event log stores specific information regarding the impact of resources on the process. Based on this information, various statistics, tables, and networks are calculated and mined to help understand the process. First, the Resource-Activity Matrix is calculated (Figure 23), which shows how many times each activity was executed and which of the resources participating in the process executed it. It answers the question of what is done by whom during the execution of the process. The resources consist of the workers, who are anonymized and presented with an encrypted ID, and a set of information systems (Document processing automaton - DPA, Inspection service - IS, Notification automaton - NA, Parcel automaton - PA, Processing automaton - PRA, Reference alignment processor - RAP).

Activity resource matrix.
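A resource-activity matrix of this kind can be computed with a simple counting pass over the events. In the sketch below, the `(resource, activity)` tuple layout, the worker identifier, and all activity names except 'save' are hypothetical.

```python
from collections import Counter, defaultdict


def resource_activity_matrix(events):
    """events: iterable of (resource, activity) -> {resource: Counter(activity -> count)}."""
    matrix = defaultdict(Counter)
    for resource, activity in events:
        matrix[resource][activity] += 1
    return matrix


# Toy events: one anonymized worker and two information systems.
toy_events = [
    ("worker_01", "save"),
    ("worker_01", "save"),
    ("worker_01", "finish editing"),
    ("Document processing automaton", "initialize"),
    ("Notification automaton", "begin editing"),
    ("Document processing automaton", "initialize"),
]
toy_matrix = resource_activity_matrix(toy_events)

# Who executed 'save', and how often?
save_by_resource = {r: c["save"] for r, c in toy_matrix.items() if c["save"]}
```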
The resource-activity matrix presents the relationships between activities and resources, but not the relationships among the resources themselves, which are needed to capture their cooperation and to visualize the flow of the process. To create a network that captures this, the “hand-over work matrix” is calculated (Figure 24), which shows how often work is handed over from one department/person/system to another. Then, the corresponding social network is constructed, where the nodes correspond to departments/persons/systems and the edges are the relationships between them. In addition, each edge has a weight that indicates how important the relationship is. Metrics are then applied to the resulting graph to show which resources are most involved in the execution of the process. The “Handover of Work Social Network” (Figure 25) is discovered by measuring how many times a person/department/system is followed by another such entity when performing a business process. By performing graph theory operations on the discovered graph, the 10 most central nodes are calculated (Figure 26). Figure 27 depicts the top 10 nodes with the largest Degree Centrality in the Handover of Work Social Network.

Work handover matrix - top 10 variants.

“Handover of work” social network for top 10 variants.

“Handover of work” social network most central nodes.

Top 10 nodes with the largest degree centrality in handover of work social network.
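The handover-of-work counting and the degree centrality ranking can be sketched as follows. The trace layout, the resource identifiers, the option to ignore self-handovers, and the normalisation of the degree by n - 1 are assumptions of this sketch rather than the exact settings used for Figures 24 to 27.

```python
from collections import Counter


def handover_of_work(traces, include_self=False):
    """traces: {case_id: [resource, ...]} in execution order.

    Returns Counter{(r1, r2): n}: how often r2 directly follows r1 in a case.
    """
    handovers = Counter()
    for resources in traces.values():
        for r1, r2 in zip(resources, resources[1:]):
            if include_self or r1 != r2:
                handovers[(r1, r2)] += 1
    return handovers


def degree_centrality(edges):
    """Number of incident edges (incoming plus outgoing) divided by n - 1."""
    nodes = {node for edge in edges for node in edge}
    if len(nodes) < 2:
        return {node: 0.0 for node in nodes}
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return {node: degree[node] / (len(nodes) - 1) for node in nodes}


def top_k(centrality, k=10):
    """The k most central nodes, ties broken alphabetically."""
    return sorted(centrality.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy log with hypothetical resources (w1, w2 are workers; DPA, NA are systems).
toy_traces = {
    "c1": ["w1", "DPA", "NA"],
    "c2": ["w1", "DPA", "w2"],
    "c3": ["w2", "DPA"],
}
toy_handovers = handover_of_work(toy_traces)
toy_centrality = degree_centrality(toy_handovers.keys())
toy_top = top_k(toy_centrality, k=2)
```

Because incoming and outgoing edges are both counted, the value can exceed 1 in a directed network; the ranking, not the absolute value, is what matters here.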
Next, the “Working Together” Social Network for the top 10 variants is presented (Figure 28), which counts the number of times two entities cooperate in the same instance. The graph measurements show that the information systems used by the users, as well as the people who handle the process, are at the center of the process. These measurements are shown in Figure 29, and the weights of the edges of the graph in Figure 30. Figure 31 presents the organizational perspective of the process, i.e., the grouping of resources into roles. Roles are derived based on the activities performed by the resources.

“Working together” social network for top 10 variants.

Top 10 nodes with the largest degree centrality in “working together” social network.

“Working together” nodes with high centrality.

Roles in activities.
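The two organizational views above can be approximated with the following minimal sketch. It assumes that two resources "work together" when they appear in the same case, and that resources with an identical set of performed activities share a role; both are simplifying assumptions, and all identifiers in the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def working_together(traces):
    """traces: {case_id: [resource, ...]} -> Counter{(r1, r2): number of shared cases}."""
    together = Counter()
    for resources in traces.values():
        # Each unordered pair is counted once per case.
        for pair in combinations(sorted(set(resources)), 2):
            together[pair] += 1
    return together


def roles_by_activity_set(events):
    """events: iterable of (resource, activity).

    Groups resources that performed exactly the same set of activities.
    """
    profile = defaultdict(set)
    for resource, activity in events:
        profile[resource].add(activity)
    roles = defaultdict(list)
    for resource, activities in profile.items():
        roles[frozenset(activities)].append(resource)
    return {activities: sorted(members) for activities, members in roles.items()}


toy_traces = {
    "c1": ["w1", "DPA", "NA"],
    "c2": ["w1", "DPA", "w2"],
    "c3": ["w2", "DPA"],
}
toy_together = working_together(toy_traces)

toy_events = [
    ("w1", "save"), ("w1", "finish editing"),
    ("w2", "finish editing"), ("w2", "save"),
    ("DPA", "initialize"),
]
toy_roles = roles_by_activity_set(toy_events)
```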
The EU provides a high amount of its budget for the CAP. Most of these expenditures target the funding of the farmers without considering prerequisites in terms of their production. Overall, the distribution of the funding adheres to several regulations derived either from EU or from Member States laws, while various departments and business roles are involved throughout the process. Extracting the information related to this business process based upon the stored event logs cannot be performed using typical data mining approaches. Instead, this objective can benefit from process mining in order to discover, monitor, and improve the processes at hand. However, due to the inherent complexity of today's business processes, this approach tends to create the so-called “spaghetti-like” process models. Therefore, the analysis of such business processes can take advantage of social network analysis.
In this paper, we apply an integrated process mining and social network analysis approach in order to discover the CAP funding business process and identify variations, such as loops, delays, bottlenecks, etc. The proposed approach extracts not only the optimal process models but also the social network of the entities involved in the process. In this way, it is able to tackle the complexity of the CAP funding process and to provide useful insights from an organizational perspective.
Overall, the funding application process is complex in nature. This is due to the complex framework within which it operates: it must simultaneously comply with the rules established by the European Union and the laws of each individual state. Furthermore, the process is hybrid, as it involves IT systems, the users of the systems, external partners who perform the checks, and the beneficiaries of the compensation themselves. The human factor in the process is therefore unpredictable, which confirms a known limitation of process modeling: human behavior cannot be fully recorded, because it is influenced by factors such as individual preferences, cognitive parameters, social dynamics, and environmental context. These elements introduce a level of unpredictability and complexity that is difficult to capture through mathematical models alone. It is important to mention that, for the 43,809 cases, the process had 28,457 different variants, of which 26,602 were executed only once, which makes it difficult to analyze the process to detect deviations or delays in its execution.
Also, by analyzing the data, starting with the start times, the completion times, and the description of the process, the paper concludes that the process changed over time. In all three years for which historical data are available, the process was performed by almost the same individuals. Over time, applicants and those involved in the process adapted, learned how to operate the systems, and made subjective judgements; this accumulated experience allowed the process to be run in such a way that the monetary benefit is greater and that it can be completed in less time and with fewer activities. Furthermore, the process also improved because the body that defines the process itself identified problems in the implementation and made specific changes.
The analysis also led to conclusions about the quality of the data in the event log, which in turn affected process discovery. The log initially contained some minor errors that were corrected during pre-processing, such as wrong years in dates (starting from 2013 and 2014). A more important issue, which also affected the discovery of process models, is that the activities from which the discovery algorithms produce the corresponding model were often not distinct. For example, the ‘save’ activity belongs to both the “Geo Parcel Document” and the “Reference alignment” document types. For this reason, the need arose to present the process at the document level. Additional data pre-processing to make the activities distinct was also considered, but it would have changed the dataset too much, and therefore no safe conclusions could have been reached from the analysis.
The results highlight the potential of integrating social network analysis within process mining to enhance transparency, collaboration, and efficiency in public administration. By uncovering interaction patterns and identifying key actors within administrative processes, social network analysis enables data-driven decision-making, workload balancing, and the detection of bottlenecks or inefficiencies. This approach supports evidence-based policy implementation and improves service delivery by fostering accountability and optimizing resource allocation. However, practitioners must consider data privacy, ethical concerns, and organizational resistance when applying social network analysis in public administration contexts.
In our future work, we plan to investigate machine learning (ML), deep learning, and Automated Machine Learning (AutoML) algorithms in the context of predictive business process monitoring, aiming to enhance the aforementioned approach with predictive capabilities. In this way, we will also be able to provide predictions about the next activities and the remaining time of a business process instance at runtime. Consequently, corrective actions can be taken while a business process instance is running, thus reducing the time needed to complete CAP applications.
Footnotes
Acknowledgements
This work was supported in part by the Postgraduate Programme in Artificial Intelligence and Visual Computing (AIVC), organized by the University of West Attica (Greece) in cooperation with University of Limoges (France).
Funding
This work was supported in part by the Postgraduate Programme in Artificial Intelligence and Visual Computing (AIVC), organized by the University of West Attica (Greece) in cooperation with University of Limoges (France).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
