Abstract
This article introduces the Rebel Foreign Fighter Dataset (RFFD), which can be used to expand research on civil war and foreign fighters (FFs). First, it greatly expands the previously reported data on the number of FFs that have been, and continue to be, involved in conflicts across the globe. Second, the dataset disaggregates FFs into the specific rebel groups they inhabit as opposed to simply categorizing them as residing within a rebel movement. Third, it provides low, high and best estimates of FFs within rebel groups as well as a novel FF ordinal coding mechanism. These additions allow for more accurate conclusions to be drawn on the effects of FFs on specific groups as well as on the conflict in which they reside. Using the RFFD, the link between FF inclusion and civilian sexual violence discussed in Doctor’s study is re-examined. The new findings show that FF numbers below 1,000 do not have a significant impact on moderate levels of civilian victimization perpetrated by a rebel group and FF numbers below 100 do not have a significant impact on high levels of civilian victimization.
Introduction
Research into foreign fighters (FFs) has increased drastically in the last decade following the travel of tens of thousands of individuals to Syria and Iraq. Studies have explored who is choosing to travel, why they are radicalized and the effects that they have when they arrive and when they return home. Researchers in this area have spent much time and effort to gain insight into the effects that FFs have on their surroundings (Moore & Tumelty, 2008; Moore, 2019; Doctor & Willingham, 2020). Nevertheless, the data on FFs lack detail and depth. Many studies have emphasized the importance of understanding the results of FF inclusion (Bakker, Paulussen & Entenmann, 2013; Lindekilde, Bertelsen & Stohl, 2016; Chu & Braithwaite, 2017; Schwartz, 2023), but little has been done to fill the gap in knowledge about the number of FFs in various conflicts. While some studies such as Hegghammer (2010) have provided data on the number of Islamic FFs in civil conflicts, the Foreign Fighter Project (FFP), created by David Malet, has been at the forefront of FF research. Although the FFP is the primary resource for many studies in the field, an expansion of the data on the number of FFs, together with information on the groups in which they reside, would provide a vital addition to the literature.
The subject of this article, the Rebel Foreign Fighter Dataset (RFFD), addresses many of the current limitations in the data concerning FFs. The RFFD identifies conflicts, and the rebel groups involved in them, that included FFs between 1985 and 2022 and provides an increase of more than 500% in dyads that include FFs. This not only provides a substantial increase in the information that can be used to understand the presence and prominence of FFs, but it also allows for the effects of FFs on the rebel group and conflict to be more accurately analysed. The new information will give scholars who seek to understand civil war, civilian victimization and rebel group violence the ability to arrive at more accurate conclusions on the effect of FFs. In the following sections, the RFFD will be described and compared with the FFP, highlighting the expansion of the data available, details provided and variables offered along with the previous limitations. Lastly, the RFFD will be used to re-examine Doctor’s (2021) study on the impact that FFs have on sexual violence. Doctor (2021) found that the mere presence of FFs has a significant effect on sexual victimization. Using the RFFD, the results differ: they indicate a threshold that FF numbers must pass in order for their effects to be significant. The rebel group must exceed 1,000 FFs before there is a significant effect on ‘some’ level of sexual violence and 100 FFs for there to be a significant effect on ‘high’ levels of sexual violence.
Defining group and FF
The RFFD compiles data concerning FFs in civil conflicts during the period of 1985–2022 because an emphasis was placed on civil conflicts that occurred post-Cold War. While some of the conflicts included did start prior to the end of the Cold War, they continued after the fall of the Soviet Union and FFs were reported to have joined these conflicts after 1991. For that reason, they are included in the dataset. This period includes 65 civil conflicts with 282 groups documented as having FFs. While a vast majority of the conflicts that are documented in the dataset are categorized as civil wars, there are instances of uprisings and civil unrest where FFs were active. For example, the Slovak National Uprising in 2000, the Equatorial Guinea Coup in 2018 and unrest in Kazakhstan in 2022 all recorded the presence of FFs. These cases are included within the dataset because these data are an attempt to document all instances of FFs, not simply FFs in official civil wars. This dataset does not include FFs involved in interstate war.
The term ‘Foreign Fighter’ is widely debated, and many scholars disagree on what constitutes an FF; Malet notes that many scholars have an ‘I know it when I see it’ mentality (Malet, 2010). Moore & Tumelty (2008) describe FFs as ‘non-indigenous, non-territorialized combatants who, motivated by religion, kinship, and/or ideology rather than pecuniary reward, enter a conflict zone to participate in hostilities’ (p. 412). Paper & Feldman (2010) describe FFs as individuals who are travelling to partake in a conflict to support ‘kindred communities’ in a country that is not their own. Hegghammer (2010) outlines a restrictive definition of FFs as individuals who take part in an insurgency, lack citizenship in the state in which they are fighting, lack affiliation to an official military organization and are not paid.
Because of Malet’s work on the FFP, the RFFD adopts the definition put forth by Malet (2015), which describes FFs as ‘non-citizens of conflict states who join insurgencies during civil conflicts’ (p. 456). This definition is echoed by Bryan (2010), who describes FFs as ‘not agents of foreign governments, but they leave home typically to fight for a transnational cause or identity’ (p. 116), as well as Bakke (2014), who describes FFs as ‘transnational insurgents in intrastate conflicts [to] refer to armed non-state actors who, for either ideational or material reasons, choose to fight in an intrastate conflict outside their own home country’ (p. 457). This includes individuals who are ethnically, religiously, or ideologically tied to the group for which they are travelling to fight, ranging from Islamist individuals who travelled to Iraq to join Al-Qaeda to individuals travelling to join the Trotskyist Leon Sedov Brigade in Syria to fight for their anarchist, anti-fascist ideology. Under this definition, individuals who are motivated by material gains are likewise categorized as FFs. While this does include organizations such as the Russian Wagner Group, it also includes many ideological groups that motivate foreigners with a stipend. Following Hegghammer (2010), individuals who belong to the organized forces of an intervening third-party country are not considered FFs. For example, France deployed French troops to Côte d’Ivoire to help settle the violence that was occurring. These individuals would not be considered FFs under the definition utilized by the RFFD.
For an individual to be listed as taking part in the conflict, they had to have been active within the conflict zone as opposed to simply providing monetary support from abroad or actively supporting the cause via social media. In addition, the RFFD includes FFs that did not adopt a specific name but were travelling and interacting together within the conflict. This can include groups such as Darfur Rebels, Somali Fighters and Russian Cossacks, among several others. These unofficial groups were included because: (a) they took part in the conflict that was occurring and, more importantly; (b) they adopted FFs into their ranks. In addition, there is no minimum or maximum member number necessary to be included as a group within the dataset if FFs resided within the ranks. This is an important note when it comes to groups such as the Revolutionary Union for International Solidarity in Syria which included, at most, only a couple dozen members and around five FFs.
Existing data on FFs
Thus far, there are two main sources of FF data: Hegghammer (2010); and Malet’s (2016) FFP. Hegghammer (2010)
Best estimate total
The FFP covers FFs in 93 civil conflicts between the years 1821 and 2014 and records various details about them, such as their type and their location relative to the conflicts. While the dataset does include a variable for the number of FFs present in the conflict, only 30 of the conflicts include details on the number of FFs involved. Furthermore, the numbers can be vague, such as ‘several’ or ‘100s’ of FFs. Of the conflicts following the end of the Cold War, only 23% include the number of FFs. When compared to the FFP, the RFFD (1985–2022) increases the number of conflicts included in Malet’s dataset by 44% and increases the number of dyads analysed by 526.7%. In addition, instead of analysing the conflict from the perspective of ‘state vs. opposition,’ the RFFD analyses the ‘state vs. group’ relationship. Lastly, the RFFD provides low, high and best estimates for the number of FFs within a group as well as an updated ordinal coding mechanism.
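The percentage increases quoted above can be reproduced with simple arithmetic. The following is a minimal sketch, not part of the RFFD itself: the conflict counts (45 and 65) and the 282 dyads are reported in this article, whereas the baseline of 45 FFP dyads is an assumption (one ‘state vs. opposition’ dyad per FFP conflict in the comparison period) that happens to reproduce the reported 526.7%. The function name `percent_increase` is purely illustrative.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


# Conflicts: 45 in the FFP comparison set, 65 in the RFFD (both reported in the article).
conflict_increase = percent_increase(45, 65)

# Dyads: 282 in the RFFD (reported); the baseline of 45 is an ASSUMPTION,
# i.e. one 'state vs. opposition' dyad per FFP conflict.
dyad_increase = percent_increase(45, 282)

print(round(conflict_increase, 2))  # 44.44
print(round(dyad_increase, 1))      # 526.7
```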
The RFFD
The RFFD details the prevalence of FFs following the Cold War, a period that has seen a steady increase in FFs involved in conflicts around the world (Figure 1). Throughout the decades, technological advancements have led to an increase in the dissemination of information and terrorist propaganda (Denning, 2010; Hughbank & Ferrandino, 2012; Mahmood, 2013; Mueller & Johnson, 2018; Vacca, 2019), ease of travel to participate (Conway, 2006; Hegghammer, 2013) and material ability to conduct attacks (Silber, Bhatt & Senior Intelligence Analysts, 2007; Torok, 2010; Combs, 2017). Because of this, FFs have become a more vital element in many recent conflicts such as Syria, Afghanistan and Ukraine. The influence of these conflicts is apparent in the notable rise in FFs from 2011 onwards.
Foreign Fighter Dataset comparison
Level of analysis comparison
Second, the RFFD provides a more nuanced view of which groups the FFs inhabit. Previously, the FFP approached the inclusion of FFs from the dyadic point of view of the state versus rebels. This can be sufficient in instances in which the state is fighting a single rebel group, but it tends to be the case that a multitude of rebel groups are interacting within a civil war. By attributing FFs to the broad rebel movement, Malet’s assigned values hide useful information on all these rebel groups. In the ‘Syria vs. Rebels’ dyad, Malet attributes 35,000 FFs to the rebel movement but does not include information about the distribution of foreigners. In some of the conflicts, the FFP does provide a list of rebel groups that the FFs have been recorded to have joined, but it does not provide detail about how many reside in each. In such conflicts, rebel groups take on varying numbers of FFs among their ranks. Under the previous description, scholars are unable to understand how the inclusion of FFs affects the individual rebel groups within the civil war. Following the disaggregation of the rebel movement into individual rebel groups, this dataset includes 282 ‘Government’ vs. ‘Group’ dyads. That represents a 526.7% increase in the dyads available. Instead of ‘Syria vs. Rebels’ having 35,000 FFs, ‘Syria vs. Jabhat al-Nusra’ has 1,500 FFs, ‘Syria vs. Jaysh al-Muhajireen wal-Ansar’ has 1,000 FFs and so on. Table II provides an example of the differing reporting from Hegghammer, Malet and the RFFD for the conflict in Iraq (2003– ). Hegghammer estimates the number of FFs in Iraq (2003– ) to have been between 200 and 400 and Malet estimates the number of FFs, labelled ‘Islamists,’ to be around 5,000. In contrast, the RFFD identifies 10 different groups that include FFs while also providing a more nuanced view of where the FFs reside and to what extent.
Rebel Foreign Fighter Database ordinal code

Number of dyads vs. ordinal code

Best estimate per region
For the period from 1989–2022, the RFFD describes 65 conflicts across 44 countries. There is not a single year since the start that does not include a civil conflict in at least one country. The most FF activity occurred in Africa (38.5% of cases, n = 25 observations), followed by Asia (26.2%, n = 17), Europe (16.9%, n = 11), the Middle East (15.4%, n = 10) and South/Central America (3.1%, n = 2). This distribution can be seen in Figure 3, which includes the total sum of the ‘best estimate’ of FFs in each region. Within the RFFD, 4.1% of the groups included 1–10 FFs, 17.95% included 11–100, 28.72% included 101–500, 14.36% included 501–1,000, 20.51% included 1,001–3,000, 5.12% included 3,001–5,000, and 4.62% each included 5,001–10,000 and 10,000+.
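The ranges listed above correspond to the eight categories of the FF ordinal code. As a minimal sketch of how a ‘best estimate’ could be mapped to that code, the snippet below uses the bin edges reported in this article; the function name `ff_ordinal_code` and the list `UPPER_BOUNDS` are hypothetical and are not taken from the dataset’s own files.

```python
from bisect import bisect_left

# Inclusive upper bounds of ordinal codes 1-7, taken from the ranges in the text:
# 1-10, 11-100, 101-500, 501-1,000, 1,001-3,000, 3,001-5,000, 5,001-10,000.
# Anything above 10,000 falls into code 8.
UPPER_BOUNDS = [10, 100, 500, 1_000, 3_000, 5_000, 10_000]


def ff_ordinal_code(best_estimate: int) -> int:
    """Map a best estimate of FFs in a group to an ordinal code from 1 to 8."""
    if best_estimate < 1:
        raise ValueError("a group needs at least one FF to receive a code")
    # bisect_left gives the index of the first upper bound >= best_estimate.
    return bisect_left(UPPER_BOUNDS, best_estimate) + 1


# FF counts quoted elsewhere in the article, used here only as illustrations.
for group, estimate in [
    ("Ansar al-Islam", 150),
    ("Jabhat al-Nusra", 1_500),
    ("Croatian Irregulars", 4_000),
]:
    print(group, ff_ordinal_code(estimate))
```

Under these bins, two groups that a dichotomous variable would both code as 1 (for example, a group with 150 FFs and one with 4,000) fall into different ordinal categories (3 and 6).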
Data collection
In identifying the presence of FFs, the RFFD relies on academic and governmental sources along with secondary sources from the NEXIS database. Key search words such as ‘foreign fighter,’ ‘foreign insurgent,’ and ‘foreign jihadists’ were used along with location and group name to narrow down sources. Each source was analysed for relevant information and, if beneficial to the database, recorded alongside the FF number documented. Once the sources were collected, low, high and best estimates of FFs were calculated for each rebel group. It is worth noting that there were rebel groups for which only one source could be found. In those situations, the FF number was used as both the ‘high estimate’ and the ‘best estimate.’
Drawing solely from secondary news sources was not sufficient to document the ‘best’ estimate of FFs because of potential bias and misinformation. Newspapers can, at times, selectively edit and censor the information disseminated, which can leave the information reported less than trustworthy. The RFFD sought to reduce bias by corroborating secondary sources with academic or governmental documents. These included many academic sources, from scholars writing about specific conflicts, such as ‘Child soldiers in the east of the Democratic Republic of the Congo’ (Rakisits, 2008), to academics exploring the inner workings of a rebel group, such as ‘Arming the Revolutionary United Front’ (Berman, 2001). These articles were used along with United Nations documents on specific rebel groups and information from research institutes such as the Center for Strategic and International Studies, The Combating Terrorism Center at West Point, Human Rights Watch, The Washington Institute and the Mackenzie Institute.
While an emphasis on primary sources allowed for some of the information to be corroborated, the reliability of news sources can continue to be an issue. For that reason, when calculating the ‘Best Estimate’ of FFs, documents from academics, governments, or research institutes were placed at the forefront. Whenever possible, these sources were used to boost the reliability of the number being reported. For example, when documenting the number of FFs in the El Mujahed Unit within the Bosnian Army, a news source reported that there were 800 FFs within the group. Upon further research, two separate academic sources contradicted this claim, reporting around 3,000 FFs within the group (Zosak, 2009; Mustapha, 2013). That said, there are several conflicts, such as Zaire and the Philippines, and rebel groups, such as Abu Sayyaf or the Moro Islamic Liberation Front, that offered little to no information about their members, let alone whether foreigners reside among their ranks. In situations such as these, secondary sources were accepted for creating the ordinal code.
To increase the transparency of the data, a reliability variable was included. When calculating the best estimate of FFs, an estimate will receive a reliability score of 4 if the number is reported by an academic, government, or institutional report and is corroborated by a second document. An estimate will receive a score of 3 if the number is reported by an academic, government, or institutional report but is not corroborated by a second document. If the ‘best estimate’ is found by averaging two academic, government, or institutional reports, it will also receive a 3. An estimate will receive a reliability score of 2 if the number is reported by a secondary news source and is corroborated by another secondary news source. In addition, an estimate will receive a reliability score of 2 if the ‘best estimate’ is obtained by averaging the high and low estimates provided by secondary sources. An estimate will receive a reliability score of 1 if the number is reported by a secondary news source but is not corroborated by a second news source. Because of the low reliability of some of the estimates that were supported by only one secondary source and included an extremely low number of FFs, a threshold for the ordinal code was implemented. For the groups that fall within the ‘1’ range for both the ordinal and reliability coding, the estimate of FFs was documented, but no ordinal code was officially assigned to them.
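The reliability rules and the ordinal-code threshold described above can be written as a small decision function. This is a sketch under stated assumptions rather than the dataset’s actual coding script: the function names (`reliability_score`, `assigned_ordinal`) and the three boolean inputs are hypothetical simplifications of the source types discussed in the text.

```python
from typing import Optional


def reliability_score(institutional: bool, corroborated: bool, averaged: bool = False) -> int:
    """Score the reliability of a best estimate on the 1-4 scale described in the text.

    institutional: the figure comes from an academic, government, or institutional report
                   (False means a secondary news source).
    corroborated:  a second document of the same kind reports the same figure.
    averaged:      the best estimate is an average of two reports (or of the high and low estimates).
    """
    if institutional:
        if averaged:
            return 3  # average of two institutional reports
        return 4 if corroborated else 3
    if averaged or corroborated:
        return 2  # news figure corroborated, or averaged from secondary sources
    return 1  # single uncorroborated news source


def assigned_ordinal(ordinal: int, reliability: int) -> Optional[int]:
    """Apply the threshold: groups coded 1 on both scales keep their FF estimate
    but receive no official ordinal code (represented here as None)."""
    if ordinal == 1 and reliability == 1:
        return None
    return ordinal


print(reliability_score(institutional=True, corroborated=True))    # 4
print(reliability_score(institutional=False, corroborated=False))  # 1
print(assigned_ordinal(ordinal=1, reliability=1))                  # None
```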
Application of the dataset
To demonstrate the usefulness of the RFFD, this section will re-analyse a previous study with the new data provided. Several studies have utilized Malet’s (2010) data, with Doctor (2021) being one of the most recent to use the FFP Dataset. Doctor investigates whether FFs have an influence on the amount of sexual victimization that is perpetrated against the civilian population. In his work, Doctor uses a sample of 143 rebel groups that were active from 1989–2011, which is then followed by a case study of the Islamic State in Iraq and the Levant.
The literature surrounding the effects of FFs argues that not only do they typically lack preparedness for combat, language skills, or real conflict experience (Mendelsohn, 2011; Moore, 2019), but FFs can also become problematic to the rebel group that they inhabit by creating division among the group (Watts, 2016), increasing dissatisfaction among the native population (Bakke, 2014) and leading to increases in counterterrorism measures (Bacon & Muibu, 2019). For that reason, Doctor argues that the implementation of sexual violence ‘offers a means by which rebel commanders can limit the divisive challenges that FFs pose to the internal cohesion of their organization’ (Doctor, 2021). Doctor also explains that it may be that rebel leaders are more willing to accept the violence against civilians because of the material support that the FFs provide to the group. Doctor utilized the FFP in his quantitative analysis of FFs and, because of the limitations of the previous data, was forced to implement a dichotomous FF variable. As discussed previously, the inherent issue is that a dichotomous coding of FFs fails to capture the wide variance in the number that is present among those groups that include FFs. For example, Ansar al-Islam and Croatian Irregulars are both coded as a 1 in Doctor’s analysis, but Ansar al-Islam included only 150 FFs while the Croatian Irregulars were home to around 4,000.
To recalculate Doctor’s models, his method is duplicated while inserting a new ‘FFOrdinal’ independent variable. The findings differ greatly from the results of Doctor’s analysis using Malet’s data. Doctor finds that the inclusion of FFs within the ranks of the rebel group has a significant impact on the rebel group conducting ‘some’ (95% confidence interval (CI)) and ‘high’ (99% CI) levels of sexual victimization. ‘Some’ sexual violence is defined as ranging from some sexual violence reported to widespread sexual violence reports. ‘High’ sexual violence is defined as ‘massive/systematic’ sexual violence. These categorizations were gathered from the Sexual Violence in Armed Conflict Dataset (Cohen & Nordås, 2014).
Utilizing the RFFD ordinal code, I find that the inclusion of 11–100 FFs in a rebel group (ordinal code 2) has no significant relation to the amount of sexual violence that a group perpetrates. I find that when there are 101–500 FFs (ordinal code 3) among the ranks of a rebel group, the rebel group has a 0.07 increased probability of committing ‘high’ levels of sexual violence. But FFs at this level have no significant impact on the rebel group committing ‘some’ levels of sexual violence. Similarly, I find that when there are 501–1,000 FFs (ordinal code 4) among the ranks of a rebel group, the group has a 0.05 increased probability of committing ‘high’ levels of sexual violence. Again, FFs at this level have no significant impact on the rebel group committing ‘some’ levels of sexual violence. Lastly, I find that when there are 1,001–3,000 FFs (ordinal code 5) among the ranks of a rebel group, the rebel group has a 0.27 increased probability of committing ‘some’ sexual violence and a 0.32 increased probability of committing ‘high’ levels of sexual violence. To keep the analysis comparable with the original, the rebel groups analysed in Doctor (2021) were used; these did not include any groups coded as a 7 or 8. In addition, because information was unavailable for some of the variables that Doctor included, the groups coded 6 could not be used in the statistical analysis. These results can be seen in Table IV. To test whether my results are sensitive to changing ordinal coding sets, I re-ran the analysis with 13 ordinal categories as opposed to eight. The results showed that, even with more coding sets, the effects of FFs on civilian sexual victimization did not begin to become significant until the FF number reached 500–1,000, which mirrors the results of the original ordinal coding set.
Doctor (2021) and Rebel Foreign Fighter Database (RFFD) comparison
Conclusion
This article introduced the RFFD, its purpose and the data provided. In addition, Doctor’s (2021) analysis of FFs and sexual violence was re-analysed to demonstrate the contribution of the new data. The results illustrate the contribution that the dataset makes to the study of FFs and point to several paths forward.
First, these data increase the amount of information available regarding FFs in several ways. They increase both the number of conflicts included and the number of conflicts that include information on FFs: a 44.44% increase in the number of civil conflicts (45 to 65) and a 68% increase in the conflicts that include information on the number of FFs involved. In addition, because of the disaggregation of FFs into rebel groups, the dataset provides a 526.7% increase in dyads that include FFs that can be analysed. Lastly, these data provide updated coding mechanisms for FFs. The RFFD includes a ‘high,’ ‘low’ and ‘best’ estimate for the number of FFs included along with an ordinal coding mechanism for the ‘best’ estimate. These data allow for the updating of several findings regarding FFs and their effects. Although FFs undoubtedly do have some effect on the conflicts they inhabit, the results show that the relationship is not quite so clear. These results lend themselves to the conclusion that there is a threshold that FFs must reach before they will have a significant impact on the amount of sexual violence perpetrated. For that reason, having a more accurate count of where and to what extent FFs reside is vital to predicting where civilian victimization occurs and to alleviating it.
Secondly, these data allow for the research on FFs to be expanded and refined. Other scholars researching FFs and their effects have implemented Malet’s FFP (Peeters, 2014; Raagart, 2021). As with the revision of Doctor (2021), the RFFD has the potential to have an impact on many of the conclusions that were drawn using the previous data. While the RFFD provides new results to the field, it also raises several questions about FFs. Why do some conflicts include high levels of FFs but low levels of sexual violence? Conversely, why do some conflicts include low levels of FFs but high levels of sexual violence? Does the ideology of the FFs matter as opposed to that of the group? How does international law deal with the fact that FFs may have little to no effect on individuals outside of those fighting in some circumstances?
Lastly, while the RFFD increased the amount of information that is available on FFs in civil conflicts, it does have its limitations. First, while the FF data were expanded, there are still major gaps in the information that is available. For conflicts such as Sudan, Ethiopia and Tajikistan, little to no information could be found on the individuals that were members of the rebel groups present. While it is possible that some information regarding these fighters has not been found, it is equally likely that no information is available or was ever recorded. For that reason, the scholarly literature on FFs might have to accept some gaps in the numbers available. Secondly, while reliable sources were always sought, secondary sources were utilized at times to calculate the ordinal code. As a result, some coding does not reach the level of reliability that is hoped for, and the ordinal coding is subject to updates and changes as more information arises. Lastly, because of the lack of substantial data for many of the groups and conflicts included in the RFFD, the data are gathered on a solely cross-sectional basis. Reliable information detailing annual observations for the hundreds of rebel groups could not be obtained. That said, the same method was used in both Malet’s and Hegghammer’s FF datasets. The sparse reporting on FFs forces scholars to rely on the span of years that the FFs were active in the conflict rather than year-by-year data.
