Machine learning application in the ex-combatant demobilization process on the Colombian armed conflict

Abstract

This research explores the potential of supervised machine learning models to support the decision-making process in demobilizing ex-combatants in the peace process in Colombia. Recent works apply machine learning in analyzing crime and national security; however, there are no previous studies in the specific contexts of demobilization in an armed conflict. Therefore, the present paper makes a significant contribution by training and evaluating four machine learning models, using a database composed of 52,139 individuals and 21 variables. From the obtained results, it was possible to conclude that the XGBoost algorithm is the most suitable for predicting the future status of an ex-combatant. The XGBoost presented an AUC score of 0.964 in the cross-validation stage and an AUC of 0.952 in the test stage, evidencing the high reliability of the model.

Keywords

Demobilization, machine learning, Colombia, supervised learning, classification

1. Introduction

Colombia has suffered an armed conflict for more than fifty years, with negative consequences for social and economic development. As an alternative to the use of weapons, a demobilization process has been in place since 1982. The illegal combatant is encouraged to leave the insurgent group in exchange for economic, social, and judicial benefits [1]. Official data show that between 2001 and 2020, 75,731 people (85% men and 15% women) from armed groups outside the law embarked on the path of return to civilian life [2]. Demobilization as a peace strategy has proven to be a positive element in armed conflicts; for example, the research of Ribetti [3] illustrates how disengagement can be an effective security strategy, although only under specific circumstances. The research of de Posada [4] identifies six first-order factors that explain demobilization: survival, physical-psychological safety, civilian safety, justice, self-determination, and belongingness. Nussio and Howe [5] analyze post-demobilization trajectories of violence, exploring why rates of violence can rapidly increase in a post-demobilization context.

To the degree that the demobilization process is guided by an advanced decision-making framework based on objective evidence, it will foster public confidence in the existence of a sound strategy for consolidating the peace process. This goal will be achieved by reducing the drop-out percentage in the process and identifying the critical aspects for succeeding in the demobilization, considering the characteristics of each combatant. The primary objective of this research goes beyond the fitting of machine learning models. Thus, our proposal’s success is not contingent on accurately anticipating when an ex-combatant will exit the demobilization process. Rather, it aims to improve decision-making on the acceptance and allocation of benefits received by a demobilized individual, thus maximizing success in reintegrating into civilian life.

The dataset comprises the statistics published by the Agency for Reincorporation and Normalization (ARN) and includes data on demobilization, location, year of entry into the process, benefits of economic insertion, benefits of academic training and training for work, economic occupation, family census, and social service actions, among others. The dataset has 55,600 participants in demobilization processes between 2001 and 2019.

In this paper, a supervised machine learning approach is proposed to predict ex-combatants’ status in the Colombian demobilization process. To create the model, the Random Forest, GLMNET, KNN and XGBoost algorithms are implemented. In addition, a cross-validation procedure identifies the algorithm with the best performance.

The remainder of the paper is organized as follows. Section 2 describes the demobilization process in Colombia and presents relevant related works. Section 3 describes the dataset and the machine learning approach used to analyze the data. Section 4 describes the machine learning models implemented and the performance metrics used in the evaluation. Section 5 presents the results and Section 6 the discussion. Finally, Section 7 presents the conclusions.

2. Background and context

The ARN defines the Disarmament, Demobilization and Reincorporation (DDR) process as one that contributes to security and stability in a territory after emerging from a situation of organized violence by disarming combatants and removing them from military structures. The DDR process also provides them with the necessary tools to reintegrate socially and economically into civil society.

The three components of the DDR process in Colombia are closely interrelated: if disarmament and demobilization are carried out effectively, security conditions in the local and national context will improve, a scenario that facilitates the reintegration of ex-combatants. Mouly et al. [6] consider that the time for reintegration arrives with disarmament and demobilization, which usually occur after a period in which combatants gather in safe areas waiting to return to civilian life.

According to the Colombian Agency for Reintegration, there are two distinct methods of combatant demobilization: (i) collective demobilization, which implies a prior negotiation between the national government and the group that intends to lay down its arms, and which obeys the order given by the leaders of the respective structures and not necessarily the will of each combatant; and (ii) individual demobilization, in which the combatant makes the voluntary decision to leave the group to which they belong.

The regulation of the DDR process is established in Decrees 128 of 2003 and 395 of 2007, which indicate that persons demobilized individually or under the framework of agreements with armed organizations outside the law may benefit from the socioeconomic reintegration programs established by the National Government. These programs provide demobilized individuals with procedures that enable them to create a life project in a secure and dignified manner. Figure 1 presents the route and steps followed in the reintegration process.

Figure 1.

Demobilization route.

Beyond disarmament and demobilization, the reintegration process represents a challenge for society and is considered the most problematic stage of the peace process, because reintegration is an ongoing process that takes place mostly at the local level and through which demobilized personnel obtain civilian status. Thus, reintegration is part of the overall development of a country and constitutes a national responsibility that can be complemented with international support. Montoya and Herrera [7] state that an ex-combatant can comply with disarmament and demobilization and yet, by not reaching the desired final state of reintegration into civilian life, remain prone to return to criminal activities.

Because reintegration assumes such importance as the final stage of the process, accurate forecasts are essential to conclude the DDR effectively. The complexity of the phenomenon stems from the large number of demobilized people and the absence of reliable historical information with which to implement differentiated actions that achieve a better adaptation of people to civilian life and therefore ensure the sustainability of the peace process in the future. The reintegration of ex-combatants into legality is one of the pillars of the peace process. It means the consolidation of a public policy to guarantee and promote demobilized people’s social and economic development, so that they have no incentive to take up arms again. In this way, the primary purpose of the reintegration process is to provide a series of economic aids to generate a self-sustainable life through the creation of companies, education, and training in legal businesses. Thus, a predictive model provides the tools and techniques necessary to manage and process social data about demobilized people, handling the complexity of a process with no precedent and providing a support scheme for strategic and operational decision-making that positively affects people’s integration into civilian life.

Regarding the international experience in cases of reintegration processes, Sacristán [8] mentions those developed in Africa, Asia, and Central America. This author states that reintegration initiatives failed in South Africa and Burundi due to non-compliance with incentives or the lack of technical support to demobilized persons. In countries such as Indonesia, El Salvador, and Nicaragua, some initiatives such as access to land were adopted; however, the experience showed that land delivery requires access to a comprehensive financial aid package. For its part, El Salvador stands out with success in the reintegration process by involving non-ex-combatant communities, who contributed legitimacy to the process.

Access to the demobilization process relies on three requirements: (i) having belonged to an armed organization outside the law, (ii) having the will to rejoin civilian life, and (iii) not having committed crimes against humanity. Due to the lack of theoretical references for contexts such as the Colombian case, the machine learning model is not intended to bias acceptance into the process; instead, it seeks to alert decision-makers to the probability that an ex-combatant will drop out.

The information provided by the ARN refers to the 76,067 people who left the ranks of armed groups outside the law in Colombia between 2001 and 2021, of which 51,586 people voluntarily entered the reintegration process. That is, 68% of this population decided to return to civilian life, accessing the different benefits granted by the state. However, only 26,091 people have completed the reintegration process into civilian life, and 3,867 are currently completing their reintegration route, which means that the remaining 21,628 people (42%) are out of the process or absent. Therefore, the forecasting methods derived from machine learning will provide decision-makers in reintegration processes with more accurate and transparent behavioural predictions that allow corrections to be made and help to conclude the DDR process effectively.

2.1 Machine learning in social sciences

Machine learning models applied to the social sciences have become an exciting challenge for researchers, given that the particularities of social problems are somewhat different from those of industry [9]. While the industrial and technological sectors aim to boost productivity and competitiveness via the integration of information systems [10], the social sciences are concerned with integrating data to generate decisions that positively influence people’s lives [11]. Examples include forecasting the state of food security [12], predicting and improving the performance of athletes in competitions [13], predicting student attrition using administrative data [14], predicting future cases of malaria [15] and forecasting the final state of the spread of COVID-19 [16].

The use of data as an input to information systems in social contexts is widespread. Social agencies are increasingly implementing artificial intelligence methods in a wide range of applications. For example, they are used to predict issues related to crime and public safety, such as the misconduct of inmates in prison in order to assign them appropriate security levels [17], or killings committed by persons on probation in order to identify persons who pose a severe threat to public safety [18]. Similarly, machine learning has been used to predict threats to public safety such as domestic violence [19], unemployment and marital status [20], and future violence to help judges make sentencing decisions after conviction [21], and to control violence in prisons using ensemble methods [22]. In the Colombian case, machine learning methods have been used to make COVID-19 forecasts [23], predict future dengue cases [24], predict insolvency in advance [25], and predict the outcome of missing persons [26].

2.2 Related works

During the disarmament process of ex-combatants, one of the most relevant issues is demobilization and social reintegration; however, there is not enough information that is useful to support the decisions made in these peacebuilding environments.

Despite the scarce information, the demobilization of ex-combatants from armed conflicts has been studied from both quantitative and qualitative approaches. Within the quantitative approach, specifically for the Colombian case, the study by Samii et al. [27] estimates causal effects retrospectively from microdata with the help of a machine learning ensemble, illustrating an analysis of options to reduce recidivism of ex-combatants in Colombia. In an approach to violence analysis, the work of Bazzi et al. [28] asks how feasible early-warning prediction of violence is, attempting to predict violence one year ahead with a range of machine learning techniques using data from Colombia and Indonesia, and concludes that the developed models poorly predict new outbreaks or escalations of violence even with unusually rich data.

Meanwhile, in a direct application of machine learning to DDR processes, the paper of Kobach et al. [29] uses Random Forest to determine predictors of appetitive aggression and trauma-related mental illness by investigating the frequency of psychopathological symptoms in high- and low-intensity conflict demobilization settings. Similarly, Quintero-Zea et al. [30] designed a study to characterize the emotional processing of Colombian ex-combatants of illegal groups, using supervised learning techniques and developing a tool to support decision-making during interventions with ex-combatants in the process of reintegration into society.

From another approach, the work carried out by Garcia-Barrera et al. [31] particularly analyzed the empathy factor by means of confirmatory factor analysis. Within their results, it was found that the manifest variable “empathic concern” has a strong correlation with the latent factor “empathy”. This generates a guide on the variables of greater importance in the reintegration processes of ex-combatants.

Finally, the work of Therese [32] proposes a new measurement strategy for territorial control in asymmetric civil wars. Territorial control is conceptualized as an unobserved latent variable that can be estimated via observed variation in rebel tactics, with the latent variable modelled via a Hidden Markov Model (HMM).

On the other hand, from a qualitative point of view, the work of Rosenau et al. [33] analyzes 15,000 surveys of men and women who have left the Revolutionary Armed Forces of Colombia (FARC) and other violent extremist organizations. It identifies the main factors that contributed to the decision to leave the armed struggle: ideological disenchantment, the relentless nature of the government’s counterinsurgency campaign, physical abuse by commanders, and the wish to reconnect with their families and rebuild their personal lives.

The work presented by Kaplan and Nussio [34] analyzes the justifications for recidivism related to the experiences of combatants, criminal motives and surveys of groups of ex-combatants in Colombia. This study concludes that antisocial personality factors, absence of family ties, lack of educational achievement and presence in criminal groups are strongly related to recidivism among ex-combatants.

However, the work of de Vries and Wiegink [35] analyzes contexts in which the disarmament, demobilization and reintegration (DDR) process does not always account for the difficulties of understanding community dynamics. The authors argue that communities are not singular entities but are subject to change and consist of people with different ideas and viewpoints about returning fighters.

The research of Baez, Santamaría-García and Ibáñez [36] develops integrative approaches for ex-combatants, including social, cognitive and affective mental processes. Considering the situated nature of post-conflict scenarios and the urgent need for evidence-based interventions, it suggests a two-stage approach for addressing ex-combatants’ reintegration programs. Likewise, the work presented by Schmitt et al. [37] analyzes the adversities faced in a reintegration process. Using a sample of the population of the Democratic Republic of Congo, the authors study the factors of exposure to trauma, perpetration, mental health problems and stigmatization associated with the process of reintegration into society. Their findings highlight the need to intervene jointly on aspects of individual mental health, aggression and collective discriminatory positions.

For the Colombian case, the study of Casas-Casas and Gúzman-Gómez [38] proposes that DDR processes involve learning new ways of solving shared problems (mental models) through nonviolent mechanisms. It recommends that disciplines such as political science attend to the perspectives of the process in order to produce an iterative dynamic that raises alarms and reorients the process when needed, with the aim of building true and lasting peace.

3. Data and methods

For this research, we use the public database of people who have entered the reintegration process in the context of the peace process in Colombia [39]. The dataset is composed of 52,139 observations and 21 predictor variables.

Of these variables, eleven are categorical, eight binary and two numerical, as shown in Table 1. The training and test datasets were chosen at random: the training dataset contains 70% of the data and the test dataset the remaining 30%. A 10-fold cross-validation approach was applied to the training data to reduce bias in the performance estimates. The Colombian reintegration agency collected and published this dataset, which is fully available for research, but it has missing values. These missing values appear to occur at random, so the models use only complete cases. Table 1 shows the variable name, the attributes chosen for the analysis and the proportion of missing values for each variable.
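The complete-case filtering and the 70/30 random partition described above can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used in this research: the function names, the seed and the toy records are assumptions introduced here, and only the field names are taken from Table 1.

```python
import random

def complete_cases(rows):
    """Keep only records with no missing (None) value in any field."""
    return [r for r in rows if all(v is not None for v in r.values())]

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Toy records: field names follow Table 1, values are invented for illustration.
records = [
    {"Gender": "Male" if i % 2 else "Female", "Spouse": i % 2, "Number_children": i % 4}
    for i in range(10)
]
records.append({"Gender": None, "Spouse": 0, "Number_children": 1})  # incomplete record

clean = complete_cases(records)        # the incomplete record is dropped
train, test = train_test_split(clean)  # 70% training, 30% test
print(len(clean), len(train), len(test))
```

Fixing the seed makes the partition reproducible, which matters when several algorithms are to be compared on the same split.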

3.1 Data Pre-processing

A cross-validation scheme such as the one presented in Fig. 2 is used. Ten folds are randomly created from the training dataset to serve as training and validation instruments. The K-fold cross-validation scheme gives insight into how well a model will fit an unseen dataset, avoiding common problems such as overfitting and selection bias. Each of the K folds is used once as the validation set, while the model is trained on the remaining K-1 folds. In cross-validation, a larger value of K is expected to give a better generalization of the results, but at the cost of greater computational time. Schaffer [40] gives a detailed description of the cross-validation process and its different modalities.
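As an illustration of the K-fold scheme, the sketch below generates the index partitions for K = 10. It is a hedged, minimal example: the helper name `k_fold_indices`, the seed and the sample size of 25 are assumptions made for the illustration and do not come from the study.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

splits = list(k_fold_indices(n_samples=25, k=10))
for fold_number, (train_idx, validation_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(validation_idx))
```

Every observation appears in exactly one validation fold, so each model is always evaluated on data it did not see during fitting.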

As can be seen in Fig. 1, the demobilization route is composed of several stages, including in-depth interviews with military and psychological professionals. It is vital to note that the domain expert plays a crucial role in correctly developing the decision-making process in this problem. Thus, the proposed machine learning model is created to support those activities, not to replace the domain expertise of the professionals.

Table 1
Summary of model’s variables

Variable	Description	Categories/summary	Type	% missing
Demobilization_type	Form of demobilization of the combatant	Individual, Collective	Categorical	0%
Armed_group	Group of which the combatant was a member before his demobilization	AUC: United Self-Defense Forces of Colombia; ELN: National Liberation Army; EPL: Popular Liberation Army; ERG: Guevarist Revolutionary Army; ERP: People’s Revolutionary Army; FARC: Revolutionary Armed Forces of Colombia	Categorical	0%
Age_range	Age range to which the combatant belongs	18 to 25, 26 to 40, 41 to 60, over 60 years old	Categorical	0%
Gender	Gender	Female; Male	Categorical	2%
Final_situtation	Final status of the individual, considering legal and social aspects.	Active, Inactive, suspended, completed	Categorical	1%
Department	Department of Residence	The 33 Colombian geographical zones	Categorical	0%
Benefit_TRV	Whether the individual received benefit in transversal activities.	Yes $=$ 1, No $=$ 0	Binary	0%
Benefit_FA	Whether the individual received benefit in academic training.	Yes $=$ 1, No $=$ 0	Binary	0%
Benefit_FPT	Whether the individual received benefit in Job Training.	Yes $=$ 1, No $=$ 0	Binary	0%
Benefit_PDT	Whether the individual received benefit in the business plan.	Yes $=$ 1, No $=$ 0	Binary	2%
Education_Level	Maximum educational level	Literacy, Baccalaureate, Basic Primary, Basic Secondary	Categorical	2%
Economic_occupancy	The economic occupation of the ex-combatants.	“Unemployed”, “Not Applicable”, “Employed in the Formal Sector”, “Employed in the Informal Sector”, “Economically Inactive Population”	Categorical	2%
Bonus_BIE	Whether the individual has disbursement of Economic Insertion Benefit	Yes $=$ 1, No $=$ 0	Binary	2%
Social_service	Linking to social service Actions.	Not linked, with certification, It is linked	Categorical	0%
Spouse	The individual has a Spouse or Permanent Partner	Yes $=$ 1, No $=$ 0	Binary	0%
Number_children	Number of children and/or stepchildren registered	mean $=$ 1.01, max $=$ 11, median $=$ 1	Numeric	0%
Total_family_group	Total number of members of the family group	mean $=$ 2.16, max $=$ 28, median $=$ 1	Numeric	0%
Habitability_census	Whether the individual has registered application of the housing Survey.	Yes $=$ 1, No $=$ 0	Binary	0.5%
Livingplace_type	Type of Housing	Apartment, house, house-lot, room, farm, room, ranch, others	Categorical	0%
public_services	public water, sewerage and electricity services.	Yes $=$ 1, No $=$ 0	Binary	0%
Health_system	The health regime of the individual	Contributory, Subsidized.	Categorical	0%

4. Materials and methods

This section summarizes the significant methodologies and approaches used in the literature to forecast social phenomena. To the authors’ knowledge, no machine learning study has been conducted to predict armed group demobilization processes. However, the demobilization problem is structured similarly to other social challenges.

The main idea of supervised learning models is to classify or forecast a future event based on known predictor variables. An advantage of implementing a data-driven solution is therefore the generation of objective information for complex decision-making. In the case of this research, the decision on continuity, separation, and support for reintegration is an essential aspect of the consolidation of the peace process in Colombia, given that reintegration into civilian life uses the resources provided by the government to generate an autonomous and sustainable life.

Table 2 summarizes the most frequently utilized algorithms in the literature, showing selected references of applications in different social sectors. Each algorithm is then briefly explained. The operation of the ML algorithms is not covered in detail; instead, a summary description is given together with a reference where the algorithm is explained in depth.

Table 2
Most used classifiers on social sciences

Type	Algorithm	Selected references
Generalized linear	GLMNET	[41, 42]
Tree-based	Decision Tree	[43, 44]
Tree-based ensemble	Random Forest	[26, 45]
Non-parametric, distance based	KNN	[46, 47]
Ensemble – Optimized	XGBoost	[48, 49]

Figure 2.

K-fold cross validation process.

This section has presented a variety of algorithms used for modeling social problems. However, due to the internal adjustment structures of the different models, it is challenging to identify the algorithm that best suits a given situation. Thus, we use a cross-validation process to obtain a fair evaluation of each algorithm’s performance.

4.1 Random Forest (RF)

Random Forest models are ensemble methods; they entail repeatedly building many decision trees using an aggregation method based on bootstrapping (bagging) [50]. This generates several decision trees with varied variable compositions such that each tree provides an independent outcome, followed by a democratic approach in which the category with the most votes is chosen as the final output. Combining the responses of the individual decision trees into a general forecast results in robust models that are less susceptible to extreme values than a single decision tree, boosting the model’s prediction and classification capabilities.

The RF model incorporates a variable selection strategy, enabling it to handle datasets with many variables if preceding processes are used to reduce dimensionality. Additionally, the model allows the importance of each variable for correctly classifying observations to be determined using a permutation test.
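The bootstrap-and-vote mechanism described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Random Forest implementation used in this research: one-split trees (decision stumps) stand in for full decision trees, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """One-split tree: pick the feature/threshold with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:  # threshold at the maximum value: nothing to split
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # constant features: fall back to the majority class
        return features[0], float("inf"), majority(y), majority(y)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)

# Toy data in which both features carry signal about the class.
X = [[i, 2 * i + 1] for i in range(10)]
y = [0] * 5 + [1] * 5
forest = fit_forest(X, y)
print(predict(forest, [0, 1]), predict(forest, [9, 19]))
```

Because every tree sees a different resample and a different subset of variables, an atypical observation influences only some of the votes, which is the source of the robustness noted above.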

4.2 K-nearest neighbor

The KNN is one of the most popular neighbourhood-based classifiers in machine learning [51], given its simplicity and high efficiency in detecting and classifying elements into categories. The parameter K in KNN refers to the number of neighbours on which the assignment of a category is based; this parameter is usually determined empirically. Depending on the problem, different values of K are tested and the value with the best predictive performance is chosen. The algorithm operates by calculating the distances between the observation to be classified and the points of the training dataset.
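A minimal sketch of this procedure is given below, assuming Euclidean distance and numeric features; the function name, the toy coordinates and the two class labels are illustrative assumptions rather than elements of the study.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=7):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two well-separated toy clusters with hypothetical labels.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["completed"] * 4 + ["out of process"] * 4

print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))
```

Categorical predictors such as those in Table 1 would first need a numeric encoding before a distance of this kind could be computed.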

4.3 XGBoost

XGBoost is a robust gradient boosting framework for enhancing the results of machine learning models, especially tree-based models. It builds the trees iteratively, with each iteration depending on the trees fitted before it, and it can be used for ranking, classification or regression, according to the problem.
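The iterative idea behind boosting can be illustrated with the sketch below. It is plain gradient boosting on squared error with one-split regression trees, written only to show the mechanism; it is not the XGBoost implementation, which adds regularization and second-order information. The function names and toy data are assumptions, while the defaults `n_rounds=100` and `eta=0.3` simply mirror the values reported in Table 3.

```python
def fit_stump(x, residuals):
    """Best single-threshold split of 1-D inputs under squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # exclude the maximum so the right side is non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    return best[1:]

def boost(x, y, n_rounds=100, eta=0.3):
    """Additive model: each round fits a stump to the current residuals."""
    base = sum(y) / len(y)
    prediction = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        t, l_mean, r_mean = fit_stump(x, residuals)
        stumps.append((t, l_mean, r_mean))
        # Shrink each correction by the learning rate eta before adding it.
        prediction = [p + eta * (l_mean if xi <= t else r_mean)
                      for xi, p in zip(x, prediction)]
    return base, eta, stumps

def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    base, eta, stumps = model
    return base + eta * sum(l_mean if xi <= t else r_mean for t, l_mean, r_mean in stumps)

x = list(range(10))
y = [0.0] * 5 + [1.0] * 5
model = boost(x, y)
print(round(predict(model, 0), 6), round(predict(model, 9), 6))
```

Each round corrects only a fraction `eta` of the remaining error, so many small steps accumulate into the final prediction.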

4.4 GLMNET

GLMNET is a technique that builds a linear model by combining lasso regression [52] and ridge regression [53]. The combination is performed by penalizing the magnitude and number of final coefficients of the regression model, thus avoiding the problem of overfitting. This form of model has proven accurate on datasets with many predictor variables and few individual observations. In the limit case, when the mixing parameter Alpha takes a value of zero, the model operates strictly as a Ridge regression, and when it takes a value of 1, it works as a pure Lasso regression. For intermediate values, for example Alpha $=$ 0.05, the result is a model with a penalty that is 95% Ridge and 5% Lasso.
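The role of the mixing parameter can be made concrete by computing the penalty term directly. The sketch below assumes the parameterization documented for the glmnet package, in which Lambda scales the overall penalty and Alpha mixes the Lasso (L1) and Ridge (squared L2) parts; the function name and the coefficient values are illustrative.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """Penalty added to the loss: lam * (alpha * L1 + (1 - alpha) / 2 * L2^2)."""
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)

beta = [0.5, -2.0, 0.0]  # hypothetical regression coefficients
lasso_only = elastic_net_penalty(beta, lam=0.1, alpha=1.0)  # pure Lasso
ridge_only = elastic_net_penalty(beta, lam=0.1, alpha=0.0)  # pure Ridge
mixed = elastic_net_penalty(beta, lam=0.1, alpha=0.05)      # 5% Lasso, 95% Ridge
print(lasso_only, ridge_only, mixed)
```

Note that with Lambda equal to zero the penalty vanishes for any value of Alpha, so the fit reduces to an unpenalized generalized linear model.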

4.5 Performance metrics

The success of the classification process is determined by the difference between the predicted and actual values. The True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts describe this relationship.

4.5.1 Accuracy

It is the number of correct predictions divided by the total number of predictions, n.

$\displaystyle\textit{Accuracy}=\frac{\textit{TP}+\textit{TN}}{n}$

4.5.2 Precision

It is the number of correctly predicted positives divided by the total number of positive predictions.

$\displaystyle\textit{Precision}=\frac{\textit{TP}}{\textit{TP}+\textit{FP}}$

4.5.3 Recall

It is the number of correctly predicted positives divided by the number of actual positive class values in the test data.

$\displaystyle\textit{Recall}=\frac{\textit{TP}}{\textit{TP}+\textit{FN}}$

4.5.4 AUC

The AUC represents the area under the ROC curve. It takes values between 0 and 1, where 0.5 corresponds to a classifier that predicts at random and 1 to a perfect classifier. The AUC provides a single numerical value with which to evaluate the quality of the classifier: the larger the AUC value, the better the classification.
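For a binary problem, the AUC can be computed without drawing the ROC curve by using its rank interpretation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below illustrates this for two classes only; the function name and the toy scores are assumptions, and the multi-class averaging used for the models in this research is not reproduced here.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 0.75: three of the four positive/negative pairs are ordered correctly
```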

4.5.5 Kappa

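Kappa indicates how much better a classifier performs than one that merely predicts at random based on the frequency of each class. In the expression below, $P_{o}$ is the observed agreement between predicted and actual classes and $P_{e}$ is the agreement expected by chance.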

$\displaystyle\textit{Kappa}=\frac{P_{o}-P_{e}}{1-P_{e}}$
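As a worked illustration of the Kappa formula, the sketch below takes $P_{o}$ as the proportion of observations on the diagonal of the confusion matrix and $P_{e}$ as the agreement expected by chance from the row and column totals. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical counts: P_o = 0.85, P_e = 0.50, so Kappa = 0.35 / 0.50 = 0.70.
confusion = [[40, 10],
             [5, 45]]
kappa = cohen_kappa(confusion)
print(round(kappa, 3))
```

The same function applies unchanged to problems with more than two classes, since it only uses the diagonal and the marginal totals.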

4.5.6 F1

It is the harmonic mean of precision and recall.

$\displaystyle\textit{F1}=\frac{2\cdot\textit{Precision}\cdot\textit{Recall}}{\textit{Precision}+\textit{Recall}}$
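The accuracy, precision, recall and F1 formulas of this section can be computed together from the four counts of a binary confusion matrix, as in the following sketch. The function name and the counts are illustrative assumptions, not results of this research.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for illustration.
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
for name, value in metrics.items():
    print(name, round(value, 3))
```

In a multi-class problem these quantities are computed per class, treating that class as positive and all others as negative.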

5. Results

Table 3 summarizes the hyperparameters selected through cross-validation for each classification algorithm. The cross-validation AUC results for each of the implemented algorithms are shown in Fig. 3. The XGB algorithm presents the best performance among the algorithms evaluated, with a mean AUC $=$ 0.964. The three algorithms with the best performance are ensemble algorithms, which evidences the superior capacity of ensemble algorithms for predicting the future status of a demobilized person in the framework of a peace process.

The Random Forest and the XGB are algorithms that iteratively assemble trees to generate a prediction, and both show results higher than AUC $=$ 0.95. The KNN algorithm has a mean AUC $=$ 0.931; although the KNN is non-parametric and susceptible to changes in the data, its cross-validation results show low variability.

The data used to train the models describe individuals who entered the reintegration process, so these models suffer from a selection bias due to the absence of data on individuals who requested admittance to the reintegration process at some time but were not accepted for a variety of reasons. This bias is important to mention given the recommendation-based approach of this research, which focuses on generating objective support for decision-making on admission to the process of reintegration into civilian life.

Table 3
Main hyperparameters used in the classifiers

Algorithm	Hyperparameters
RF	mtry $=$ 41; splitrule $=$ extratrees; min.node.size $=$ 1.
XGBOOST	nrounds $=$ 100, max_depth $=$ 3, eta $=$ 0.3, gamma $=$ 0, colsample_bytree $=$ 0.8, min_child_weight $=$ 1
GLMNET	Alpha $=$ 1; Lambda $=$ 0
KNN	$K=$ 7

The cross-validation procedure evaluated the performance of each of the algorithms, yielding forty different models (four algorithms and ten folds). Table 4 presents the results of each algorithm for the AUC, F1, and Kappa metrics. It is essential to mention that, although the XGB presents the best performance according to all metrics, the GLMNET yields competitive results for the Kappa and F1 metrics. In the case of GLMNET, the value of the Alpha parameter was equal to 1, which corresponds to a pure Lasso penalty and allows the impact of the variables on the result to be estimated; this is not possible for XGB because of its black-box nature.

Table 4

Performance metrics summary for the Cross-validation

	AUC			F1			Kappa
Models	Min	Mean	Max	Min	Mean	Max	Min	Mean	Max
XGB	0.961	0.964	0.967	0.743	0.755	0.767	0.826	0.832	0.835
GLMNET	0.960	0.962	0.966	0.744	0.750	0.756	0.827	0.81	0.832
RF	0.948	0.952	0.957	0.740	0.751	0.763	0.811	0.814	0.818
KNN	0.925	0.931	0.935	0.691	0.687	0.696	0.747	0.764	0.773

Figure 3.

Cross-validation results.

5.1 Test set evaluation

Finally, the models’ performance on the test set is shown in the confusion matrices in Fig. 4, where each number indicates the individuals classified per category. When evaluated on the test data, the XGB reached an accuracy of 89.6% (95% confidence interval: 89.19%, 90.15%). The no-information rate (NIR) on the unseen data is 49.9%, and the p-value for (Acc $>$ NIR) is 2.2e-16, evidencing the capacity of the model to predict the ex-combatant status better than the trivial prediction. Figure 4 shows that the most frequent misclassification in the four models is predicting an ex-combatant as out of the process when they are actually absent.
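To illustrate how an accuracy interval and a no-information rate of this kind are obtained, the sketch below uses the Wilson score interval for a binomial proportion. The method used to compute the interval reported above is not stated here, so the choice of the Wilson interval is an assumption, as are the function names and the counts, which are hypothetical and are not the test-set figures of this research.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def no_information_rate(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)

# Hypothetical test set: 1,000 cases, 896 classified correctly.
low, high = wilson_interval(successes=896, n=1000)
nir = no_information_rate([499, 501])
print(round(low, 3), round(high, 3), nir)
```

A model is informative when the whole accuracy interval lies above the no-information rate, which is the comparison summarized by the reported p-value.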

Figure 4.

Confusion Matrices for the models evaluating the test set.

6. Discussion

The research uses forecasting methods derived from machine learning to predict a future event based on known predictive variables. The main contribution is therefore to offer decision-makers more accurate behavioural predictions that allow corrections to be made and help conclude the reintegration process effectively. The results are not intended to bias the acceptance or expulsion of an ex-combatant; on the contrary, they are meant to alert decision-makers to a possible abandonment of the process, so that the necessary measures can be taken to promote the successful transition of ex-combatants to civilian life.

We find that two of the three best-performing algorithms, XGB and RF, are ensemble methods, which points to the capacity of ensemble algorithms to predict the future state of a demobilized person in the framework of a peace process. Reliability is crucial for supervised algorithms because it indicates how well results obtained in training can be extrapolated to real-life applications. In the present research, the four algorithms implemented show high reliability, evidencing the models’ capacity to predict the future status of an ex-combatant. The classifiers present deviations of 0.0012, 0.0012, 0.0004, and 0.01 for XGB, GLMNET, RF, and KNN, respectively.

As mentioned in the background section, the problem of classifying ex-combatants in a demobilization process is similar to other social issues, so the machine learning approach used in this research could be replicated for other social phenomena. For example, the study in [22] developed a machine learning model to predict prison violence and obtained its best AUC, 0.789, with the Random Forest algorithm. In the present research, the best model scored an AUC of 0.952, suggesting that the ensemble models used here, if applied to predicting prison violence, could potentially improve on those results.

Social reintegration refers to the ability of ex-combatants to become part of social life again, participating in the collective decisions of the communities where they settle without returning to the violent and illegal actions of the past [54, 55, 56, 6]. The machine learning approach thus becomes a tool that supports the reintegration process by contributing objective information to maximize its social results.

Recently, machine learning models have been applied in the social sciences to help make decisions that positively impact people’s lives [11]. One of these areas is armed conflict, where society offers people outside the law certain benefits and aid in exchange for abandoning criminal life and returning to civilian life. Similar studies have been carried out by [17, 18, 19], whose objective is to predict problems related to crime and threats to public safety. The results obtained by these authors coincide with those of this research in that forecasting methods derived from machine learning perform similarly when applied to real situations, which suggests that the approach can be replicated.

Thus, the main objective of this research is not to anticipate exactly when an ex-combatant will leave the demobilization process but to provide the decision-makers involved with a guiding criterion for acceptance and the subsequent allocation of benefits, since an ex-combatant who fails to reach the desired reintegration into civilian life will be prone to re-offending.

In this way, to the extent that the percentage of abandonment is reduced and the critical factors for successful demobilization are identified, the process will move toward a decision-making scheme based on objective evidence, generating confidence in society about the Colombian peace process.

Finally, the results demonstrate that a predictive model can provide the tools and techniques needed to manage and process social data on ex-combatants. According to the literature review, supervised learning models have not previously been used to inform decisions about the acceptance of an individual ex-combatant and the allocation of benefits in order to maximize success in reintegration into civilian life. This study can therefore be helpful in social issues related to armed conflict and national security, by managing the complexity of a process with little precedent and by providing a support scheme for strategic and operational decision-making.

One of the limitations of this study is associated with the bias of the database, considering that some combatants were not admitted to the demobilization process. Therefore, there is no information on these people who are somehow actors in the conflict and whose social profiles would feed the algorithms in order to improve the forecast of the future status of the ex-combatants.

The research approach has been to predict the future status of an ex-combatant to support decision-making in the DDR process. If the DDR process is not successful, the entire society, the peace process, and the ex-combatants themselves may be jeopardized. Failing to demobilize and reintegrate an ex-combatant can therefore be costly: it can result in numerous disadvantages in civilian life, increasing the possibility of the ex-combatant taking up arms again. The model’s accuracy is thus paramount, and machine learning can offer superior forecasting accuracy, at least compared to traditional forecasting methods and the judgment of domain experts.

The machine learning models developed in this research are fully compatible with the current empirical systems for deciding on the admission of a person to the DDR process. Moreover, predictions of completion or abandonment of the process can help avoid biases in access to the DDR. Providing a more rigorous approach to managing resources in the framework of a peace process would improve how the actors perceive the process by reducing subjectivity.

7. Conclusion

This research demonstrated that a machine learning approach is effective in predicting the future status of a former combatant in the demobilization process in Colombia. The results showed that the XGBoost method is the most suitable for predicting the future status, since it combines high performance with a reasonable computational cost. The GLMNET algorithm is also worth mentioning, since its results are competitive on all performance metrics.

At best, an AUC of 0.964 was obtained in the cross-validation stage and an AUC of 0.952 on the test dataset, which simulates real cases. The deviation between the mean cross-validation AUC and the test AUC was only 0.012, indicating that the model has high reliability. Therefore, this study contributes to the state of the art in machine learning applications in the social sciences, as it establishes a reproducible and replicable methodology for complex social contexts.
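The reliability figure quoted above is simple arithmetic on the two reported AUC values, which the short sketch below reproduces. The variable names are illustrative, and the relative gap is an extra convenience figure that the study itself does not report.

```python
cv_auc = 0.964    # mean cross-validation AUC reported for XGBoost
test_auc = 0.952  # test-set AUC reported for XGBoost

# Absolute gap between cross-validation and test performance.
gap = round(cv_auc - test_auc, 3)

# Gap expressed as a fraction of the cross-validation AUC (roughly 1.2%).
relative_gap = gap / cv_auc
```

A small gap of this kind suggests that the cross-validation estimate did not overstate performance on unseen data.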

Social systems should not remain apart from the advantages of applied machine learning models. Used to support strategic decisions, these models could save money, and when applied to a DDR process, machine learning can provide a more precise picture of the process.

References

1. Theidon, Transitional Subjects: The Disarmament, Demobilization and Reintegration of Former Combatants in Colombia, International Journal of Transitional Justice 1(1) (2007), 66–90, [Online; accessed 2021-10-07]. doi: 10.1093/ijtj/ijm011.

2. Camacho, 75.000 personas dejaron las armas en el país en los últimos 20 años, 2020, section: politica. https://www.eltiempo.com/politica/proceso-de-paz/cifras-de-desmovilizacion-en-el-pais-en-los-ultimos-20-anos-548540.

3. Ribetti, Disengagement and beyond: a case study of demobilization in Colombia, in: Leaving Terrorism Behind, Routledge, 2008, p. 18. ISBN 978-0-203-88475-1.

4. de Posada C.V., Motives for the Enlistment and Demobilization of Illegal Armed Combatants in Colombia, Peace and Conflict: Journal of Peace Psychology 15(3) (2009), 263–280. doi: 10.1080/10781910903032609.

5. Nussio and Howe, When Protection Collapses: Post-Demobilization Trajectories of Violence, Terrorism and Political Violence 28(5) (2016), 848–867. doi: 10.1080/09546553.2014.955916.

6. Mouly, Delgado E.H. and Giménez, Reintegración social de excombatientes en dos comunidades de paz en Colombia, Análisis Político 32(95) (2019), 3–22. doi: 10.15446/anpol.v32n95.80822.

7. Montoya P.C.S. and Herrera L.A.M., Las prácticas sociales y la reincidencia de personas en proceso de reintegración, en el marco de la política nacional de reintegración económica y social, Revista de Antropología y Sociología: Virajes 20(1) (2018), 129–147. doi: 10.17151/rasv.2018.20.1.7.

8. Sacristán A.F., Economic reintegration of illegal group ex-combatants in the Meta department, Revista de Economía Institucional 22(43) (2020), 223–247. doi: 10.18601/01245996.v22n43.10.

9. Grimmer, We are all social scientists now: How big data, machine learning, and causal inference work together, PS: Political Science & Politics 48(1) (2015), 80–83.

10. Ghabri, Hirmer and Mitschang, A Hybrid Approach to Implement Data Driven Optimization into Production Environments, in: Lecture Notes in Business Information Processing, Abramowicz and Paschke, eds, Springer International Publishing, 2018, pp. 3–14. ISBN 978-3-319-93931-5.

11. Hindman, Building Better Models: Prediction, Replication, and Machine Learning in the Social Sciences, The ANNALS of the American Academy of Political and Social Science 659(1) (2015), 48–62, [Online; accessed 2019-12-10]. doi: 10.1177/0002716215570279.

12. Westerveld J.J.L., van den Homberg M.J.C., Nobre G.G., van den Berg D.L.J., Teklesadik A.D. and Stuit S.M., Forecasting transitions in the state of food security with machine learning using transferable features, Science of The Total Environment (2021), 147366, [Online; accessed 2021-10-07]. doi: 10.1016/j.scitotenv.2021.147366.

13. Lee S.I., Adans-Dester C.P., OBrien A.T., Vergara-Diaz G.P., Black-Schaffer, Zafonte, J.G. and Bonato, Predicting and Monitoring Upper-Limb Rehabilitation Outcomes Using Clinical and Wearable Sensor Data in Brain Injury Survivors, IEEE Transactions on Bio-Medical Engineering 68(6) (2021), 1871–1881, PMID: 32997621. doi: 10.1109/TBME.2020.3027853.

14. Berens, Schneider, Gortz, Oster and Burghoff, Early Detection of Students at Risk – Predicting Student Dropouts Using Administrative Student Data from German Universities and Machine Learning Methods, Journal of Educational Data Mining 11(3) (2019), 1–41, publisher: International Educational Data Mining. https://eric.ed.gov/?id=EJ1241620.

15. de Lima M.V.M. and Laporta G.Z., Evaluation of prediction models for the occurrence of malaria in the state of Amapá, Brazil, 1997–2016: an ecological study, Epidemiologia e Serviços de Saúde 30 (2021), publisher: Secretaria de Vigilancia em Saúde – Ministério da Saúde do Brasil. doi: 10.1590/S1679-49742021000100007.

16. Zawbaa H.M., El-Gendy, Saeed, Osama, Ali A.M.A., Gomaa, Abdelrahman, Harb H.S., Madney Y.M. and Abdelrahim M.E.A., A study of the possible factors affecting COVID-19 spread, severity and mortality and the effect of social distancing on these factors: Machine learning forecasting model, International Journal of Clinical Practice 75(6) (2021), e14116. doi: 10.1111/ijcp.14116.

17. Berk, Sherman, Barnes, Kurtz and Ahlman, Forecasting murder within a population of probationers and parolees: a high stakes application of statistical learning, Journal of the Royal Statistical Society: Series A (Statistics in Society) 172(1) (2009), 191–211. doi: 10.1111/j.1467-985X.2008.00556.x.

18. Berk and Hyatt, Machine learning forecasts of risk to inform sentencing decisions, Federal Sentencing Reporter 27(4) (2015), 222–228.

19. Berk R.A., Sorenson S.B. and Barnes, Forecasting Domestic Violence: A Machine Learning Approach to Help Inform Arraignment Decisions, Journal of Empirical Legal Studies 13(1) (2016), 94–115. doi: 10.1111/jels.12098.

20. Berk, Criminal Justice Forecasts of Risk: A Machine Learning Approach, Australian & New Zealand Journal of Statistics 55(2) (2012), 199–201. doi: 10.1111/anzs.12019.

21. Cunningham M.D. and Reidy T.J., Violence Risk Assessment at Federal Capital Sentencing: Individualization, Generalization, Relevance, and Scientific Standards, Criminal Justice and Behavior 29(5) (2002), 512–537, publisher: SAGE Publications Inc. doi: 10.1177/009385402236731.

22. Baćak and Kennedy E.H., Principled Machine Learning Using the Super Learner: An Application to Predicting Prison Violence, Sociological Methods & Research 48(3) (2019), 698–721, [Online; accessed 2019-12-10]. doi: 10.1177/0049124117747301.

23. Arango-Londoño, Ortega-Lenis, Muñoz, Cuartas D.E., Caicedo, Mena, Torres and Méndez, Predicciones de un modelo SEIR para casos de COVID-19 en Cali, Colombia, Revista de Salud Pública 22(2) (2020), 1–6. doi: 10.15446/rsap.v22n2.86432.

24. Zhao, Charland, Carabali, Nsoesie E.O., Maheu-Giroux, Rees, Yuan, Balaguera C.G., Ramirez G.J. and Zinszer, Machine learning and dengue forecasting: Comparing random forests and artificial neural networks for predicting dengue burden at national and sub-national scales in Colombia, PLOS Neglected Tropical Diseases 14(9) (2020), e0008056, publisher: Public Library of Science. doi: 10.1371/journal.pntd.0008056.

25. Correa-Mejía D.A. and Lopera-Castaño, Financial ratios as a powerful instrument to predict insolvency; a study using boosting algorithms in Colombian firms, Estudios Gerenciales 36(155) (2020), 229–238, publisher: Universidad Icesi. doi: 10.18046/j.estger.2020.155.3588.

26. Delahoz-Domínguez and Mendoza-Brand, A predictive model for the missing people problem, Romanian Journal of Legal Medicine 29(1) (2021), 74–80, [Online; accessed 2021-10-07]. doi: 10.4323/rjlm.2021.74.

27. Samii, Paler and Daly S.Z., Retrospective Causal Inference with Machine Learning Ensembles: An Application to Anti-recidivism Policies in Colombia, Political Analysis 24(4) (2017), 434–456, Publisher: Cambridge University Press. doi: 10.1093/pan/mpw019.

28. Bazzi, Blair R.A., Blattman, Dube, Gudgeon and Peck, The Promise and Pitfalls of Conflict Prediction: Evidence from Colombia and Indonesia, The Review of Economics and Statistics (2021), 1–45. doi: 10.1162/rest_a_01016.

29. Köbach, Schaal and Elbert, Combat high or traumatic stress: violent offending is associated with appetitive aggression but not with symptoms of traumatic stress, Frontiers in Psychology 5 (2015). https://www.frontiersin.org/article/10.3389/fpsyg.2014.01518.

30. Schmitt, Robjant and Koebach, Characterization Framework for Ex-combatants Based on EEG and Behavioral Features, VII Latin American Congress on Biomedical Engineering CLAIB 2016 60 (2016), [Online; accessed 2022-03-01]. doi: 10.1007/978-981-10-4086-3_52.

31. K.K.T.-O.N.T.-O.S. Garcia-Barrera and Pineda, Evaluating empathy in Colombian ex-combatants: Examination of the internal structure of the Interpersonal Reactivity Index (IRI) in Spanish, Psychological Assessment 29(1) (2017), 116–112, [Online; accessed 2022-03-01]. doi: 10.1037/pas0000331.

32. Anders, Territorial control in civil wars: Theory and measurement using machine learning, Journal of Peace Research 57(6) (2020), 701–714, Publisher: SAGE Publications Ltd. doi: 10.1177/0022343320959687.

33. Rosenau, Espach, Ortiz R.D. and Herrera, Why They Join, Why They Fight, and Why They Leave: Learning From Colombia’s Database of Demobilized Militants, Terrorism and Political Violence 26(2) (2014), 277–285, Publisher: Routledge. doi: 10.1080/09546553.2012.700658.

34. Kaplan and Nussio, Explaining Recidivism of Ex-combatants in Colombia, Journal of Conflict Resolution 61(1) (2016), 64–93, [Online; accessed 2022-03-01]. doi: 10.1177/0022002716644326.

35. de Vries and Wiegink, Breaking up and Going Home? Contesting Two Assumptions in the Demobilization and Reintegration of Former Combatants, International Peacekeeping 18(1) (2011), 38–51, Publisher: Routledge. doi: 10.1080/13533312.2011.527506.

36. Baez Sandra, S.-G.H. and Ibáñez, Disarming Ex-Combatants’ Minds: Toward Situated Reintegration Process in Post-conflict Colombia, Frontiers in Psychology 10 (2019). doi: 10.3389/fpsyg.2019.00073.

37. Schmitt, Robjant and Koebach, When reintegration fails: Stigmatization drives the ongoing violence of ex-combatants in Eastern Democratic Republic of the Congo, Brain and Behavior 11(6) (2021), [Online; accessed 2022-03-01]. doi: 10.1002/brb3.2156.

38. Casas-Casas and Guzmán-Gómez, The Eternal Yesterday? The Colombian Reintegration Process, Papel Politico 15(1) (2010), 47–85, Publisher: Pontificia Universidad Javeriana. http://www.scielo.org.co/scielo.php?script=sci_abstract&pid=S0122-44092010000100003&lng=en&nrm=iso&tlng=en.

39. Agencia para la Reincorporación y la Normalización, Estadísticas de las personas desmovilizadas que han ingresado al proceso de reintegración, 2020, [Online; accessed 2021-07-07]. https://www.datos.gov.co/Inclusi-n-Social-y-Reconciliaci-n/ESTAD-STICAS-DE-LAS-PERSONAS-DESMOVILIZADAS-QUE-HA/39pj-dba6.

40. Schaffer, Selecting a classification method by cross-validation, Machine Learning 13(1) (1993), 135–143. doi: 10.1007/BF00993106.

41. Fontalvo T.J., De La Hoz E.J. and Olivos, Methodology of data envelopment analysis (DEA) – GLMNEt for assessment and forecasting of financial efficiency in a free trade zone – Colombia, Informacion Tecnologica 30(5) (2019), 263–270. doi: 10.4067/S0718-07642019000500263.

42. Petersen, Johnson, Hall and O’Bryant, Comparison of support vector machine, random forest, extreme gradient boosting and lasso and elastic-net regularized generalized linear model for Alzheimer’s Disease prediction (2021). https://unthsc-ir.tdl.org/handle/20.500.12503/30471.

43. Kolo K.D., A.S. and Alhassan J.K., A Decision Tree Approach for Predicting Students Academic Performance, International Journal of Education and Management Engineering (2015), publisher: Modern Education and Computer Science Press. doi: 10.5815/ijeme.2015.05.02.

44. Shaikhina, Lowe, Daga, Briggs, Higgins and Khovanova, Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation, Biomedical Signal Processing and Control 52 (2019), 456–462. doi: 10.1016/j.bspc.2017.01.012.

45. Deepika and Sathyanarayana, Relief-F and Budget Tree Random Forest Based Feature Selection for Student Academic Performance Prediction, International Journal of Intelligent Engineering and Systems 12(1) (2019), 30–39.

46. Adeniyi D.A., Wei and Yongquan, Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method, Applied Computing and Informatics 12(1) (2016), 90–108, [Online; accessed 2021-06-22]. doi: 10.1016/j.aci.2014.10.001.

47. Serpen and Aghaei, Host-based misuse intrusion detection using PCA feature extraction and kNN classification algorithms, Intelligent Data Analysis 22(5) (2018), 1101–1114, publisher: IOS Press. doi: 10.3233/IDA-173493.

48. Elavarasan and Vincent D.R., Reinforced XGBoost machine learning model for sustainable intelligent agrarian applications, Journal of Intelligent & Fuzzy Systems 39(5) (2020), 7605–7620, publisher: IOS Press. doi: 10.3233/JIFS-200862.

49. Lin, Jiang, Fan and Wang, A stacking model for variation prediction of public bicycle traffic flow, Intelligent Data Analysis 22(4) (2018), 911–933, publisher: IOS Press. doi: 10.3233/IDA-173443.

50. Breiman, Random Forests, Machine Learning 45(1) (2001), 5–32, [Online; accessed 2020-04-27]. insights.ovid.com.

51. Kataria and Singh M.D., A review of data classification using k-nearest neighbour algorithm, International Journal of Emerging Technology and Advanced Engineering 3(6) (2013), 354–360.

52. Hans, Bayesian lasso regression, Biometrika 96(4) (2009), 835–845.

53. Hoerl A.E. and Kennard R.W., Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12(1) (1970), 55–67.

54. Herrera and Gonzáles, Estado del arte del DDR en Colombia frente a los estandares internacionales en DDR (IDDRS), Revista Colombia Internacional (2013), 273–304, Publisher: Universidad de los Andes (Colombia). https://go.gale.com/ps/i.do?p=IFME&sw=w&issn=01215612&v=2.1&it=r&id=GALE%7CA331688062&sid=googleScholar&linkaccess=abs.

55. Kaplan and Nussio, Community counts: The social reintegration of ex-combatants in Colombia, Conflict Management and Peace Science 35(2) (2018), 132–153, Publisher: SAGE Publications Ltd. doi: 10.1177/0738894215614506.

56. Bowd and Özerdem, How to Assess Social Reintegration of Ex-Combatants, Journal of Intervention and Statebuilding 7(4) (2013), 453–475. doi: 10.1080/17502977.2012.727537.
