Abstract
OBJECTIVES:
The present study aims to extend current research on how offenders’ modus operandi (MO) can be used in crime linkage, by investigating the possibility of automatically estimating offenders’ risk exposure and level of pre-crime preparation for residential burglaries. Such estimations can assist law enforcement agencies when linking crimes into series and thus provide a more comprehensive understanding of offenders and targets, based on the combined knowledge and evidence collected from different crime scenes.
METHODS:
Two criminal profilers manually rated offenders’ risk exposure and level of pre-crime preparation for 50 burglaries each. In an experiment we then analyzed to what extent 16 machine-learning algorithms could generalize both offenders’ risk exposure and preparation scores from the criminal profilers’ ratings onto 15,598 residential burglaries. All included burglaries contain structured and feature-rich crime descriptions from which learning algorithms can generalize offenders’ risk and preparation scores.
RESULTS:
Two models created by Naïve Bayes-based algorithms showed best performance with an AUC of 0.79 and 0.77 for estimating offenders’ risk and preparation scores respectively. These algorithms were significantly better than most, but not all, algorithms. Both scores showed promising distinctiveness between linked series, as well as consistency for crimes within series compared to randomly sampled crimes.
CONCLUSIONS:
Estimating offenders’ risk exposure and pre-crime preparation can complement traditional MO characteristics in the crime linkage process. The estimations also show indications of being applicable to cross-category crimes that otherwise lack comparable MO. Future work could focus on increasing the number of manually rated offenses as well as fine-tuning the Naïve Bayes algorithm to increase its estimation performance.
Background
For crime categories that involve serial offenders, i.e. where the same offender commits two or more crimes in the same category, law enforcement agencies strive to link crimes into series [1, 2, 3]. The linking of crimes enables investigators to gain a more comprehensive understanding, based on the combined knowledge and evidence collected from the different crime scenes, than when investigating each crime in isolation [4]. In addition, such linking also enables more efficient use of the police force’s scarce resources than investigating each crime individually [1].
The linking of crimes into series can be done based on physical evidence, e.g. DNA or fingerprints. However, such evidence is not available in all crime scenes and is more frequent for certain crime categories than for others [5, 6]. Additionally, the processing of physical evidence is costly and time-consuming. Thus, it is difficult for law enforcement agencies to handle large amounts of physical evidence from high-volume crime categories [7].
For crime categories where physical evidence is not available or simply not practically feasible, “soft evidence” such as the perpetrators’ modus operandi (MO), i.e. habits, techniques and peculiarities of behavior when committing an offense [8], offers a promising avenue for linking crimes [9, 10]. Such behavioral patterns can be traced from the crime scene and later interpreted using behavioral and criminal profiling [11, 12]. In behavioral profiling the profiler considers patterns of individual similarities and differences between offenses (not offenders) by analyzing crime scene behaviors that reflect the offender’s internal workings [13].
Case linkage rests on two key assumptions. The first is the offender consistency hypothesis [16], which suggests that offenders display similar behaviors across time and place. The second is the offender specificity hypothesis [17], which holds that each offender’s approach deviates from, or is distinct from, the approaches of other offenders. These assumptions apply to offender behavior in residential burglary [18], but have also been questioned [19]. However, the accumulated evidence published (in approximately 30 studies) during the last 15–20 years provides a solid base for conducting case linkage based on behavioral consistency and distinctiveness for some offenders [20], some of the time, in various crime categories and types, e.g. commercial and residential burglary [11, 21, 22, 23].
Both of the key assumptions have behavioral underpinnings and focus on how psychological characteristics generate behavioral patterns [24, 25]. In-depth ethnographic studies suggest that burglars use a boundedly rational approach. Although they do not follow rules completely in terms of selecting, breaking, entering and escaping, the offenders’ deliberation of pros and cons for particular objects reveals systematic behavior [26]. Offenders are further described as developing expertise that ensures a sequencing of behavior, which in turn is based on the triggering of certain cues [27]. Thus, offenders typically show some systematic behavior that could be traceable for investigators. Moreover, this line of research is further supported by [21], which suggests individual differences in offender behavior, i.e. behavioral distinctiveness, in terms of exposure to risk, level of pre-offense preparation, and sensitivity to situational factors. Consequently, the offenders’ risk assessment and level of preparation may be detected as a pattern of either consistency or distinctiveness based on the soft evidence found in detailed reports of residential burglaries.
Case linkage requires crime analysts and offender profilers to manage a substantial amount of information, which can put a heavy cognitive load on the analysts [3, 28]. To counter the fallibility of human cognition, law enforcement agencies have created computerized databases enclosing a large number of reported crimes that analysts can search for similarities [29]. The advantages of computational logic can then be used in combination with such databases in order to calculate similarity scores based on pair-wise comparisons of crimes. Further, learning algorithms from the area of computer science and machine learning can be used for grouping crimes with similar characteristics together, e.g. by using cluster algorithms [15]. Such systems can be packaged as decision-support systems that assist crime analysts in the case linkage process.
Aim and scope
The present study aims to extend current research on how soft evidence can be used in crime linkage by evaluating the possibility of automatically estimating offenders’ risk exposure and level of pre-crime preparation for residential burglaries. If possible, such an estimation could assist law enforcement agencies when linking crimes into series and enable more efficient use of scarce resources. As the offender risk and preparation scores are aggregated from detailed crime data, they can be included in crime linkage processes to supplement existing MO characteristics, e.g. search strategy, entrance method and entrance location. Also, the supplementing scores discussed in this work can be calculated and compared between crime categories that more or less lack comparable MO behaviors, e.g. burglaries and diesel thefts.
In the study, two criminal profilers individually rate the offenders’ risk exposure and level of pre-crime preparation for 50 randomly selected Swedish residential burglaries (other crime categories are omitted). Then, a set of learning algorithms is used to generalize from the profilers’ ratings, using feature-rich structured crime scene descriptions. The top performing classifier is then used to estimate the risk and preparation score for a large subset of burglaries, where it is known whether the cases are linked or not. Finding optimal configurations for the studied algorithms is out of scope for the current study, as the aim is to test the possibility and potential of the concept rather than to find an optimal combination of algorithm and configuration. Finally, the consistency of the risk and preparation scores for crimes within linked series is compared to that of randomly sampled crimes.
Related work
Yokota and Watanabe et al. present a computer system for suspect retrieval based on MO similarity in prior offenses [29]. The system includes a database with 107,233 burglaries as well as the MO of 12,468 prior offenders, where each MO is represented as a set of actions collected from crime scenes. Based on MO similarities, calculated using statistical chance probabilities, law enforcement officers can retrieve ranked lists of suspects for crimes under investigation. The evaluation shows a hit rate of 20%, i.e. 20% of the offenders were assigned rank 1 for crimes they did commit. The median rank was 29, which can be regarded as successful since a total of 12,468 offenders were included. Offenders with numerous offenses in the database have lower median rank scores, while offenders with a single or a few prior offenses have significantly higher median ranks, i.e. it was hard for the system to correctly retrieve offenders with few prior offenses. The explanation given is that offenders develop a preferred MO after achieving a number of successes and failures [8], i.e. inexperienced offenders are associated with significantly larger variance in their MO.
Adderley and Musgrove investigate an approach for analyzing the involvement of criminal network members in burglaries [30]. The approach combines three data mining methods by analyzing spatiotemporal and MO characteristics for 4,159 UK burglaries. The methods included were a multi-layer perceptron, a radial basis function and a Kohonen self-organizing map. Although the evaluation lacks comparisons to random samples, the results are promising, with accuracy levels 3–4 times higher than those expected from manual analysis by law enforcement officers.
Tonkin and Woodhams empirically study whether spatial, temporal and MO crime scene behavior over different crime categories can be used to support case linkage [20]. Seven hundred and forty-nine solved commercial burglaries and robberies from the UK were grouped into 2,231 linked crime pairs and 273,422 unlinked crime pairs. The results indicate that linked and unlinked crime pairs can be distinguished with a high level of accuracy. A combination of spatial, temporal and MO behavior showed highest accuracy. These results were valid for both offenses in the same crime category as well as for offenses from cross-categories.
In another study, Tonkin et al. tested whether case linkage findings from the UK could be replicated abroad using residential burglaries committed in Finland [31]. Seven measures of similarity (including spatial and temporal) were calculated for both linked and unlinked crime pairs. Logistic regression and receiver operating characteristics (ROC) analysis were used to test the performance of distinguishing between linked and unlinked crimes using the seven measures. The results show that more measures were able to distinguish between linked and unlinked crimes in the Finnish data, compared to the original UK data. The most successful features were spatial proximity, temporal proximity and a combination of target, entry and internal property behaviors.
Wang et al. propose an algorithm for case linkage called Series Finder that grows a pattern of discovered crimes from a seed of a few initial crimes [32]. The algorithm analyzes both spatiotemporal and MO characteristics from crimes. It takes into account both the common characteristics of all patterns and the unique aspects of specific patterns, which results in several positive aspects, e.g. being able to handle dynamic MO that change over time. The study included 4,855 residential burglaries in the US, of which two-thirds were used for training the model and the rest for testing. Crime analysts were asked to manually analyze the result for nine series. For these nine series the algorithm correctly suggested 52 crimes that were indeed within the series, together with 6 false hits. Also, 9 additional links were suggested that seemed reasonable to the crime analyst, i.e. interesting crimes that could have been overlooked during the previous crime investigation. It is suggested that Series Finder provides analysts with crime patterns that were previously unknown to them and reduces the time spent on manual analysis tasks.
Reich and Porter propose a Bayesian model-based clustering approach based on spatiotemporal and MO characteristics from 11,524 US residential burglary crime scenes [1]. The proposed approach is semi-supervised because the offender is known for a subset of the burglaries. For each crime under investigation the proposed approach calculates the probability of similarity to each cluster. An evaluation using the subset of linked crimes concluded that the worst results were reported when solely relying on spatial data, while improved results were reported for a model including spatial and temporal data. However, a model including MO behaviors together with spatiotemporal data showed best performance and ranked the correct cluster within the top 5 60% of the time, and within the top 25 75% of the time. The approach was investigated on residential burglaries but could also work for other crime types and categories, and possibly also for series containing multiple crime types.
Numerous studies on crime linkage exist but, to the best of our knowledge, none of them use data-driven machine learning approaches to automatically estimate offenders’ behavioral characteristics. Therefore, in this work we investigate the possibility of estimating offenders’ behavioral characteristics based on crime scene descriptions. In the future, such estimated characteristics may improve crime analysis and clustering solutions such as the model presented by Reich and Porter above.
Method
In this section, we describe the data used, present our experimental approach, the metrics and the statistical tests used to evaluate the algorithms’ performance at estimating offender behavioral characteristics. In addition, we discuss the choice of algorithms, the algorithm configurations and the feature selection processes used.
Available data
A dataset consisting of feature-rich and structured descriptions for 15,598 burglaries was provided by Swedish law enforcement. Each burglary in the dataset has a detailed crime scene description consisting of 137 features as shown in Table 1. These descriptions include aspects concerning the offender’s location and means of entry, details about the residence and the plaintiff etc. All features were coded in a standardized format according to the coding-scheme described in [14]. This dataset is referred to as the unlabeled dataset.
Summary of the 137 binary features collected from crime scenes
Two criminal profilers manually estimated both offenders’ risk exposure and level of pre-crime preparation for 50 burglaries each, with an overlap of 25 crimes to allow for calculation of inter-rater agreement between the two profilers. The two profilers were thus given 75 unique burglaries to rate according to both a risk score and a preparation score. However, during the rating process two burglaries were discarded due to suspected fraud and erroneous information. As a result, the second dataset, referred to as the labeled dataset, contains 73 burglaries.
Finally, the Swedish police provided anonymized information about crime links between burglaries carried out by the same offender. Using this information it was possible to link 153 burglaries from the unlabeled dataset into 41 series. As shown in Table 2, the smallest series contains 2 unique burglaries while the largest contains 14. Based on these linked series it was possible to evaluate the consistency of risk exposure and pre-crime preparation scores per series.
Burglaries grouped into linked series by law enforcement
Because of the limited number of labeled instances, 10-fold cross-validation was used to estimate the performance of the studied learning algorithms on the labeled dataset. The average and standard deviation from 10 different 10-fold cross-validation executions were used to compare each algorithm’s performance.
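The repeated cross-validation procedure can be illustrated with a minimal sketch. It is not the Weka implementation used in the study: the fold assignment below is a plain shuffle without stratification, and `evaluate` is a hypothetical callback standing in for training and scoring a model.

```python
import random
from statistics import mean, stdev

def k_fold_indices(n_instances, k, rng):
    """Shuffle the instance indices and deal them into k near-equal folds."""
    indices = list(range(n_instances))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_cross_validation(n_instances, evaluate, k=10, repeats=10, seed=0):
    """Run `repeats` independent k-fold cross-validations.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    model on the training indices and returns a score (e.g. AUC) on the test
    indices. Returns the mean and standard deviation of the per-repeat scores.
    """
    rng = random.Random(seed)
    repeat_scores = []
    for _ in range(repeats):
        folds = k_fold_indices(n_instances, k, rng)
        fold_scores = []
        for i, test_idx in enumerate(folds):
            # Every fold except the held-out one forms the training set.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            fold_scores.append(evaluate(train_idx, test_idx))
        repeat_scores.append(mean(fold_scores))
    return mean(repeat_scores), stdev(repeat_scores)
```

With the 73 labeled burglaries, each fold holds 7 or 8 instances, which is why averaging over several repetitions helps to stabilize the estimate.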
The experiment was split into two sub-experiments, one investigating the candidate algorithms’ performance in estimating offenders’ risk exposure, and the other for estimating level of pre-crime preparation. In each sub-experiment 10 tasks were executed, of which the first four are listed below.
1. Evaluate each candidate algorithm’s performance in estimating risk exposure scores and preparation scores on the labeled dataset using 10 times 10-fold cross-validation.
2. Test if significant differences exist between the candidates using a Kruskal-Wallis test.
3. If significant differences exist, use a Nemenyi post-hoc test to determine which algorithms statistically dominate others.
4. Choose the algorithm that shows the best performance, by means of the metrics in Section 3.5, to estimate offenders’ risk exposure and level of pre-crime preparation scores for all crimes in the unlabeled dataset.
Next, when all burglaries in the unlabeled dataset have been assigned both a risk exposure score and a pre-crime preparation score, the consistency of both scores within the linked series is evaluated. The scores’ consistency is investigated both for the crimes in the linked series and for pseudo series that consist of randomly sampled crimes. The following six tasks describe the process.
5. Calculate the pair-wise difference of scores between all unordered pairs of crimes in each series consisting of linked crimes, see Table 3.
6. Calculate the mean and standard deviation of the score differences over all linked series.
7. For each series randomly sample the same number of crimes as detailed in Table 2 out of the 15,598 unlabeled crimes available. This creates 41 pseudo series that each contain the same number of crimes as the linked series.
8. Calculate the pair-wise difference in scores between all unordered pairs of crimes in each one of the pseudo series.
9. Calculate the mean and standard deviation of the score differences over all randomly assigned pseudo series.
10. Determine the difference in score distribution between linked crimes and randomly assigned pseudo series using Wilcoxon’s signed rank test and Cohen’s
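The pair-wise comparison and pseudo-series sampling can be sketched as follows. This is an illustration under stated assumptions rather than the study's own code: scores are assumed to be coded as integers 0 to 3 for the four ordinal levels, each pseudo series is sampled independently of the others, and the toy data and function names are made up.

```python
import random
from itertools import combinations
from statistics import mean, stdev

def pairwise_differences(scores):
    """Absolute score difference for every unordered pair of crimes in one series."""
    return [abs(a - b) for a, b in combinations(scores, 2)]

def pooled_differences(series_list):
    """Pool the pair-wise differences over all series."""
    return [d for scores in series_list for d in pairwise_differences(scores)]

def sample_pseudo_series(all_scores, series_sizes, rng):
    """Draw one random pseudo series per linked series, matching its size."""
    return [rng.sample(all_scores, size) for size in series_sizes]

# Toy data (made up): estimated scores for three linked series.
linked = [[2, 2, 3], [1, 1], [0, 3, 3, 3]]
# Stand-in for the estimated scores of all unlabeled crimes.
population = [0, 1, 1, 2, 2, 2, 3, 3] * 50
pseudo = sample_pseudo_series(population, [len(s) for s in linked], random.Random(42))

linked_diffs = pooled_differences(linked)
pseudo_diffs = pooled_differences(pseudo)
print("linked:", mean(linked_diffs), stdev(linked_diffs))
print("pseudo:", mean(pseudo_diffs), stdev(pseudo_diffs))
```

A series with n crimes contributes n(n-1)/2 pairs, so long series dominate the pooled differences. Because the pseudo series have the same sizes as the linked series, both sets contain the same number of pairs.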
Example that shows the calculation of score differences in series, where 0
The multiple statistical tests used in this paper are briefly described and motivated below. Cohen’s Kappa is used to determine the agreement between two raters/observers [33]. The test is preferable as the detected agreement is corrected for chance. Further, Cohen’s Kappa works with categorical data [33].
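To make the chance correction concrete, the following is a minimal sketch of unweighted Cohen's Kappa. The unweighted variant is an assumption, since the text does not state whether a weighted version was used for the ordinal ratings.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal category proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)
```

For example, `cohens_kappa([0, 0, 1, 1], [0, 0, 1, 0])` has an observed agreement of 0.75 and a chance agreement of 0.5, which gives a kappa of 0.5.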
To detect whether risk and preparation scores differ between linked and unlinked crimes, Wilcoxon’s signed rank test is used. Wilcoxon’s test is a non-parametric counterpart to the paired t-test [33]. It is used as normality cannot be assumed.
Cohen’s
The Kruskal-Wallis test, also known as the Kruskal-Wallis One-Way Analysis of Variance by Ranks, is a non-parametric test to detect if there exists a significant difference between two or more samples [33]. As the test does not identify between which samples the difference occurs, a post-hoc test is necessary. The Nemenyi test is selected as a suitable test for this purpose [34]. It ranks the samples and calculates if the difference in ranks between pairs of samples is greater than the critical difference (which is calculated using a Tukey distribution). If the difference between ranks is greater, a significant difference is found [33, 34].
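As an illustration of the rank-based idea, the sketch below computes the Kruskal-Wallis H statistic. It is a simplified version that omits the correction for ties and the p-value lookup, and it does not cover the Nemenyi post-hoc step. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic over the pooled ranks of all groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_rank_sums += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)
```

In the setting described above, each group would hold one algorithm's AUC values from the cross-validation runs. A large H indicates that at least one algorithm's ranks differ from the others.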
Included learning algorithms
In order to investigate the potential of using supervised learning algorithms to estimate offenders’ risk exposure and pre-crime preparation we include a diverse population of the 16 supervised learning algorithms shown in Table 4 in the experiment. These algorithms utilize different learning techniques, such as perceptron and kernel functions, instance-based learners, Bayesian learners, rule learners, decision tree inducers and meta-learners. For an introduction to the different learning techniques covered, as well as the specific algorithms included, we refer to [35] and to [36].
The original implementation of each algorithm in the Weka 3.6.6 machine learning workbench was used in the experiment. Additionally, all algorithms were executed with their default configurations. In Table 4 the configuration of each algorithm is shown. No systematic feature tuning was done since the aim of the study is not to find the best performing algorithms but rather to see if it is possible, in general, to estimate offenders’ behavioral characteristics using learning algorithms.
Configuration of each of the investigated 16 learning algorithms
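True positive (TP) is an instance that is correctly classified as belonging to category $c$.
True negative (TN) is an instance that is correctly classified as not belonging to category $c$.
Precision is defined as $TP / (TP + FP)$, where a false positive (FP) is an instance that is incorrectly classified as belonging to category $c$.
Recall is defined as $TP / (TP + FN)$, where a false negative (FN) is an instance that is incorrectly classified as not belonging to category $c$.
Accuracy is defined as $(TP + TN) / (TP + TN + FP + FN)$.
False Positive Rate (FPR) is defined as $FP / (FP + TN)$.
F-score is defined as $2 \cdot \text{Precision} \cdot \text{Recall} / (\text{Precision} + \text{Recall})$.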
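For a multi-class problem such as the four-level scores used here, these metrics are typically computed per class in a one-vs-rest fashion. The sketch below uses the standard textbook definitions, which are assumed to match the ones intended above. The matrix layout (rows are actual classes, columns are predicted classes) and the function name are illustrative choices.

```python
def per_class_metrics(confusion, cls):
    """One-vs-rest metrics for class index `cls`.

    `confusion[i][j]` is the number of instances of actual class i that were
    predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                    # actual cls, predicted other
    fp = sum(row[cls] for row in confusion) - tp     # actual other, predicted cls
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / total
    return {"precision": precision, "recall": recall, "fpr": fpr,
            "f_score": f_score, "accuracy": accuracy}
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch. Other tools may report such cases as undefined.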
The class distribution is skewed which means that area under the ROC curve (AUC) is a more appropriate metric than traditional accuracy [37]. In this study we use the weighted AUC metric that illustrates the performance of a classifier as its discrimination threshold is varied. AUC is calculated as the area under the curve plotted with the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
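The AUC can also be computed without plotting the curve, as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below uses this formulation. The weighted multi-class variant, which averages one-vs-rest AUCs weighted by class frequency, is an assumption about how the weighted AUC is obtained, and the function names are illustrative.

```python
def binary_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def weighted_auc(class_scores, labels):
    """Class-frequency-weighted mean of the one-vs-rest AUCs.

    `class_scores[c]` holds, for every instance, the classifier's score for
    class c. `labels` holds the true class of every instance.
    """
    total = len(labels)
    result = 0.0
    for cls, scores in class_scores.items():
        is_cls = [y == cls for y in labels]
        result += sum(is_cls) / total * binary_auc(scores, is_cls)
    return result
```

For example, `binary_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` evaluates to 0.75, since three of the four positive-negative pairs are ordered correctly.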
Experimental environment
Evaluations of the learning algorithms were executed in the
Weka version 3.6.6. URL:
MySQL version 5.5.42. URL:
R version 3.0.3. URL:
An important aspect in developing a machine learning system is to remove redundant and distracting features, also known as attributes, from the dataset as they otherwise can confuse the learning system [35]. Such a process is commonly known as feature selection. Various available feature selection methods in the Weka workbench were evaluated, including CFSubsetEval, ClassifierSubsetEval and WrapperSubsetEval, using 10-fold cross-validation and evaluating on the AUC metric. Out of the attribute selection methods investigated, the CFS algorithm (CFSubsetEval) showed best performance, in terms of AUC, when estimating offenders’ risk exposure. The algorithm uses a correlation-based approach and seeks a subset of features that are highly correlated with the class labels, yet uncorrelated with each other [38]. Out of the 137 features available for each burglary in the dataset, CFS recommended including the following eight in the experiment when estimating offenders’ risk exposure:
- If the residence was located in an urban setting
- If there were street lights outside the residence
- If a vehicle was parked on the driveway
- If kids lived in the residence
- If the resident was crippled or elder
- If an escape route was prepared during the offense
- The location of entrance, either: door, window, using mail slot or unknown
- Whether electronics was stolen
It can be concluded that all of the highlighted features are related to the offender’s risk exposure in one way or the other. A discussion of these eight features as well as the ones that were filtered out can be found in Section 6.
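The trade-off that CFS makes between relevance and redundancy can be illustrated with the merit heuristic commonly associated with correlation-based feature selection. This is a sketch of that heuristic only, not of the Weka search procedure. The correlation measure is left abstract and the function name is illustrative.

```python
from math import sqrt

def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a subset of k features.

    The merit grows with the average feature-class correlation and shrinks
    with the average correlation among the features themselves.
    """
    if k < 1:
        raise ValueError("the subset must contain at least one feature")
    return (k * mean_feature_class_corr
            / sqrt(k + k * (k - 1) * mean_feature_feature_corr))

# Two equally relevant features: uncorrelated vs. fully redundant.
print(cfs_merit(2, 0.5, 0.0))
print(cfs_merit(2, 0.5, 1.0))
```

In this toy comparison, adding a second feature raises the merit only when it is not redundant. A fully redundant second feature leaves the merit at the single-feature value of 0.5.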
When considering the estimation of offenders’ level of pre-crime preparation, the WrapperSubsetEval algorithm showed best performance in the feature selection process. The algorithm uses a learning algorithm, in this study Naïve Bayes Multinomial, together with 10-fold cross-validation to estimate the accuracy of the learning algorithm for the subset of features being evaluated [39]. The following eight features were found to best estimate offenders’ level of pre-crime preparation:
- If the police had a known suspect directly after the offense
- If the residence had more than one floor
- If the plaintiff was home during the offense
- If the plaintiff received an unknown phone call just prior to the offense
- If the plaintiff had a car parked at an airport during the offense
- If the plaintiff’s tools were used by the offender at the crime scene
- The method of entrance, either: drilling, breaks, illegal key, smash windows in, unlocked, ventilation position or unknown
- If alcohol or tobacco was stolen
All highlighted features are related to the offender’s level of pre-crime preparation. A discussion of the features, as well as the ones that were filtered out, is available in Section 6.
In this section, the process used by offender profilers when estimating offenders’ risk exposure and level of pre-crime preparation is described, and the inter-rater agreement is analyzed.
External raters
Two criminal profilers from the National offender profiling group within the Swedish police were involved in this study. The group consists of six profilers, and the two involved in this study have worked as profilers for 10 and 4 years respectively with education from NLECTC in the US and from Jill Dando Institute of Crime Science in the UK. Their contribution in this study was to estimate both offenders’ risk exposure as well as level of pre-crime preparation for residential burglaries. Both the risk exposure and level of pre-crime preparation were estimated on a four level ordinal scale as follows:
The profilers were told to estimate both the risk exposure and level of pre-crime preparation for 50 residential burglaries each, out of which 25 overlapped to allow for inter-rating analysis. A total of 73 unique burglaries were rated in the study since two were excluded by the profilers, in one case due to fraud rather than burglary, and in the other case due to erroneous information. The two omitted burglaries did not affect the overlap of 25 burglaries rated by both profilers. All estimations by the profilers were carried out individually without being able to affect each other’s decisions. The residential burglaries that were included in the study were randomly sampled from crimes committed in Sweden during 2014 and 2015.
Both profilers had access to the official crime reports and they were allowed to use any additional source of information or software system that could aid them in their decisions. During the task they had access to crime scene photographs as well as aerial and satellite photos of the premises. They were also given access to feature-rich and structured descriptions of the burglaries, each consisting of the previously described 137 features, see Table 1.
Inter-rater agreement
Based on the overlapping 25 burglaries rated by both profilers it was possible to analyze the inter-rater agreement. Cohen’s Kappa was used to determine the agreement between the two offender profilers based on the overlapping crimes [40]. For the estimated risk exposure, the inter-rater agreement resulted in a Kappa value of 0.401 and a p-value of 0.026. For the level of pre-crime preparation, it resulted in a Kappa value of 0.595 and a p-value of 0.0025. Landis and Koch provide magnitude guidelines to help interpret Kappa values [41]. According to these guidelines the risk exposure Kappa was moderate, just bordering fair agreement. The level of pre-crime preparation Kappa was moderate, bordering substantial agreement. Thus, we conclude that the raters had sufficient inter-rater agreement to motivate further work based on the ratings.
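The interpretation of the two Kappa values can be checked with a small lookup based on the commonly cited Landis and Koch bands. Treating each band's upper bound as inclusive is an assumption made for this sketch, since the published bands leave small gaps between, e.g., 0.40 and 0.41.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

print(landis_koch_label(0.401))  # risk exposure
print(landis_koch_label(0.595))  # pre-crime preparation
```

Under this reading both values fall in the moderate band, with 0.401 sitting just above the fair band and 0.595 just below the substantial band.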
Results
This section is divided into three parts. First, the results concerning the estimation of offenders’ risk exposure are presented, followed by the results concerning the estimation of offenders’ level of pre-crime preparation. Finally, the distribution of the scores and their overall distinctiveness/specificity within series are presented.
Estimating offender’s risk exposure
Results for the models estimating burglars’ risk exposure presented using precision, recall (same as the true positive rate), the false positive rate (FPR), the F-measure and the area under the ROC curve (AUC), with best result per metric in bold font
Table 5 shows the results for the 16 learning algorithms evaluated. As can be seen in the table, the Naïve Bayes Multinomial model reached the highest precision with 0.708, highest recall with 0.719, lowest false positive rate with 0.126 and the highest F-measure score with 0.687. The KStar model achieved marginally higher AUC with 0.788. However, since the Naïve Bayes Multinomial showed best results in four out of the five metrics we conclude that it is the most suitable model for estimating offenders’ risk exposure from crime scene data. Other models that reached an AUC above 0.750 were Naïve Bayes Ordinal, SimpleLogistic, SMO and IBk. The Kruskal-Wallis test showed a significant difference between the different algorithms (
Nemenyi post-hoc test for differences in algorithms’ ability to classify burglars’ risk exposure according to the AUC metric
Confusion matrix of the estimation of offender’s risk exposure by the Naïve Bayes Multinomial model
Table 7 shows a confusion matrix based on the result of the Naïve Bayes Multinomial model. The diagonal in the table shows the instances correctly estimated by the model, e.g. 9 instances were estimated as High and 19 as Increased. In total, 46 out of 73 (63%) instances were correctly estimated, while 27% and 9.6% of the instances were predicted
Histogram of differences in offenders’ risk exposure estimated by the Naïve Bayes Multinomial model using pair-wise comparisons for crimes within either (a) linked series committed by the same offender, or (b) randomly sampled crimes in pseudo series committed by different offenders.
Offenders’ risk exposure was also calculated for crimes included in linked series, based on estimations of the Naïve Bayes Multinomial model. In Fig. 1 all risk exposure differences from pair-wise crime comparisons within series are presented for both linked series and randomly selected crimes. From the histograms it is clear that linked series have considerably more crime-pairs with identical risk exposure (i.e. a difference of 0) within the series compared to the randomly selected crimes, in fact 182 compared to 109. Although linked series have more crime-pairs with a difference of 1, the randomly selected series have considerably more crime-pairs with a difference of both 2 and 3. The total risk difference for linked series was 319 compared to 577 for the randomly selected series. To evaluate if any statistical difference existed between the two types of crime pairs, Wilcoxon’s signed rank test was used. The test showed that linked crime pairs are statistically significantly different from the random pairs, i.e. linked crime pairs have an increased inter-crime similarity with regards to risk exposure (
It can be concluded that the offender’s risk exposure is more consistent for crimes they have committed, i.e. within linked series, compared to randomly sampled crimes committed by various offenders. This suggests that estimated risk exposure for offenders can aid law enforcement in the case linkage process.
Results for the models’ estimates of burglars’ pre-crime preparation presented using precision, recall (same as the true positive rate), the false positive rate (FPR), the F-measure and the area under the ROC curve (AUC), with best result per metric in bold font
As can be seen in Table 8, the Naïve Bayes Ordinal model reached the highest precision with 0.58 and the highest AUC with 0.77 when estimating offenders’ level of pre-crime preparation. RBFNetwork showed the best false positive rate with 0.35, while Naïve Bayes Multinomial showed the highest recall at 0.70 and the best F-measure score with 0.62. Even though the Naïve Bayes Multinomial model and the ordinal version performed very similarly, we chose the Naïve Bayes Multinomial model as the best candidate for simplicity reasons, i.e. the same Naïve Bayes model is used to estimate both offenders’ risk exposure and level of pre-crime preparation. No other models achieved an AUC above 0.75; RandomForest was the closest at 0.69. The Kruskal-Wallis test detected a statistically significant difference between the algorithms for the AUC (
Table 10 shows a confusion matrix based on the result of the Naïve Bayes model. The diagonal in the table shows the instances correctly estimated by the model. It is clear that the Increased class dominates the other classes with 43 out of the total 73 instances. In total the Naïve Bayes model correctly estimated 51 out of 73 (70%) instances, while 26% and 4% of the instances were predicted
Offenders’ level of pre-crime preparation was also calculated for crimes included in linked series, using the same method as for offenders’ risk exposure. In Fig. 2 all pre-crime preparation differences from the pair-wise crime comparisons within series are presented for both linked series and randomly selected crimes in pseudo series. From the histograms it is clear that linked series have considerably more crime-pairs with identical pre-crime preparation values (i.e. a difference of 0) than the randomly selected crimes: 344 compared to 211. Randomly selected crimes in pseudo series have considerably more crime-pairs with a difference of both 1 and 2, while linked series have a slightly higher number of pairs with a difference of 3, although that gap is negligible. The total pre-crime preparation difference for linked series was 128 compared to 357 for the randomly selected series, i.e. a difference by a factor of 2.8.
To evaluate if any statistical difference existed between the two types of crime pairs, Wilcoxon’s signed-rank test was used. The test showed that linked crime pairs have statistically significantly more similar levels of pre-crime preparation than unlinked crime pairs (
Nemenyi post-hoc test for differences in algorithms’ ability to classify burglars’ level of pre-crime preparation according to the AUC metric
Confusion matrix of the estimation of offender’s level of pre-crime preparation by the Naïve Bayes model
Histogram of differences from pair-wise comparisons of offenders’ level of pre-crime preparation for crimes within either (a) linked series committed by the same offender, or (b) randomly sampled crimes in pseudo series committed by different offenders.
Distribution of offenders’ risk and preparation scores from the Naïve Bayes model within (a) the labeled dataset that was rated by the profilers, (b) the linked, i.e. within series, dataset and (c) the unlinked dataset
It can be concluded that the offender’s level of pre-crime preparation is more consistent for crimes they have committed, i.e. within linked series, compared to crime series consisting of randomly sampled crimes committed by various offenders. This indicates that estimated pre-crime preparation scores for offenders can aid law enforcement in the case linkage process.
Predictive models that use crime scene data to estimate offenders’ behavioral characteristics could be useful for law enforcement in the crime linkage process. For a burglary under investigation, such models would allow crime analysts to record the crime scene data, feed the data into a predictive model and get estimates of the offender’s behavioral characteristics. Previous work, as well as the results in the present study, indicates that offenders’ behavioral characteristics show both intra-series consistency and inter-series distinctiveness. For the characteristics analyzed in the present study, this means that offenders will show a similar degree of risk exposure and pre-crime preparation for the crimes they perform and that these characteristics differ, to various extents, when compared with other offenders.
To benefit from these behavioral characteristics in the crime linkage process, crime analysts should compare crimes with regard to the combination of characteristics. We have studied two behavioral characteristics that render pair-wise combinations of characteristics, e.g. low risk exposure and high degree of pre-crime preparation. By considering the pair-wise combination of characteristics for the crime(s) being investigated it is possible to find other crimes that show similar pairs of characteristics. It is also possible to exclude crimes with different offender characteristics, as they are not likely to have the same offender as the burglary being investigated. In Table 11 the distributions for the different pair-wise combinations of both behavioral characteristic scores estimated by the models are shown for the crimes in the labeled, linked and unlinked datasets.
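The inclusion and exclusion step described above can be sketched as a simple filter over score pairs. This is an illustration only: the `Crime` record, the integer encoding of the four levels and the `tolerance` parameter are hypothetical and not part of the study.

```python
from typing import NamedTuple


class Crime(NamedTuple):
    crime_id: str
    risk: int         # assumed encoding: 0 = Low, 1 = Decreased, 2 = Increased, 3 = High
    preparation: int  # same assumed encoding


def candidate_links(query, crimes, tolerance=0):
    """Return crimes whose (risk, preparation) pair lies within `tolerance`
    levels of the query crime on both scores; all others are excluded."""
    return [
        c for c in crimes
        if c.crime_id != query.crime_id
        and abs(c.risk - query.risk) <= tolerance
        and abs(c.preparation - query.preparation) <= tolerance
    ]


# Toy register, invented for illustration only.
register = [
    Crime("A", 0, 3),
    Crime("B", 0, 3),
    Crime("C", 1, 3),
    Crime("D", 3, 0),
]
query = register[0]

exact = candidate_links(query, register)
near = candidate_links(query, register, tolerance=1)
print([c.crime_id for c in exact])
print([c.crime_id for c in near])
```

A tolerance of 0 keeps only identical score pairs, while a tolerance of 1 also admits neighbouring levels, which may be useful given that scores fluctuate somewhat within series.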
Risk and preparation scores for crimes within a sample of linked series to show the consistency of scores within series and distinction of scores between series
Distribution of offenders’ risk and preparation scores from the Naïve Bayes Multinomial model within (a) the unlinked dataset, (b) the linked, within series, dataset and (c) the labeled dataset.
The risk exposure scores for the crimes in the labeled dataset are spread over all four levels, but with a peak in the increased level, which holds 40% of the crimes. There is also a slight reduction in the high and low levels with 19% and 13% of the crimes respectively. The pre-crime preparation scores for the crimes in the labeled dataset are dominated by the decreased level, as it holds 60% of the crimes. Since the crimes in the labeled dataset were randomly sampled, this indicates that the decreased level of the preparation score also dominates in the burglary population as a whole, i.e. a large proportion of the offenders do not engage in much pre-crime preparation. As shown in Table 11, 63% of the crimes in the linked dataset as well as 74% of the crimes in the unlinked dataset were estimated with decreased preparation scores. As such, the model seems capable of distinguishing features for the different categories. This shows the potential to distinguish other types of offenders, who do prepare their burglaries to a larger extent, from the large group of more impulsive offenders, allowing law enforcement to reduce the group of potential offenders for the investigated crime(s). An ideal model would have high distinctiveness between categories and high consistency within categories.
In the linked dataset the increased risk exposure level is estimated for 50% of the crimes, while the high level is estimated for only 3% of the crimes. One hypothesis is that law enforcement apprehends more offenders who carry out burglaries with increased risk than offenders who carry out burglaries with low or decreased risk. On the other hand, not many offenders carrying out high risk crimes are apprehended, but this could be because a clear majority of offenders reject such crime opportunities in favor of less risky ones.
For a better overview of the different combinations of score-pairs, Fig. 3 shows the distribution of score-pairs in a jitter plot. The plot shows that the preparation score is dominated by the decreased level, as well as the peak of crimes with an increased risk score. However, Fig. 3 mainly shows how the 16 possible score-combinations are distributed for each of the three datasets. The over-representation of burglaries with decreased preparation scores can clearly be seen. It is also clear that most of the different combinations are populated by burglaries from all three datasets, which is a prerequisite for having distinctiveness between series.
In Table 12 the risk and preparation scores are shown for the crimes in six linked series. Combinations of scores show consistency for crimes within series, although there are some fluctuations of
Using the models investigated in this study, law enforcement agencies could benefit from a decision-support system that automatically estimates offenders’ risk exposure and pre-crime preparation scores for all crimes as they are being reported. This may, in turn, help to further profile offenders, perhaps even reduce the number of suspects for particular crimes, or support operational law enforcement by other means. However, it is necessary for these models to have access to an extensive dataset of features from the crime scene when estimating the scores. In a previous study we describe a method for collecting structured crime scene data using a digital click-based form that is filled in by law enforcement officers at the crime scene [43]. The forms themselves could be translated and implemented in other countries to aid cross-border intelligence. The methodology is currently being implemented by the Swedish police for a number of volume crime categories. In that implementation the content of the forms will be available for search and analysis as soon as they are filled in by officers at the crime scene. Based on the content of such forms, the models investigated in this study could automatically estimate scores for risk exposure and preparation level for the sought offenders.
Since the experiment results indicate that the consistency of both risk exposure scores and preparation scores is significantly higher for crimes in linked series compared to crimes in general, we argue that these scores can benefit crime analysts in their crime linkage tasks. When linking crimes, the analysts could use the scores as one feature, out of many, when identifying possibly related crimes. The general idea is that given a single crime (or a set of several crimes) under investigation, the analysts should focus on other crimes where the offender has comparable risk and preparation scores. Both scores could be combined to produce score-combinations ranging from offenders with high risk exposure and low level of preparation to combinations with low risk exposure and high level of preparation, including a spectrum of outcomes in between, which in this initial study was reduced since grading scales with only four levels were used. In addition to the scores, traditional features used when linking crimes should still be used, e.g. spatio-temporal proximity, MO characteristics, search strategy for goods as well as target selection criteria. The scores should be seen as complementary features in the linkage process rather than the only ones.
Ruleset from the PART model estimating offenders’ risk exposure.
If (urban = No) and
   (vehicle_on_driveway = No) Then
    risk := Low
Elsif (crippled_or_elderly = No) and
      (escape_route_prepared = No) and
      (entrance_place = Door) and
      (kids_at_home = No) and
      (street_lighting = Yes) Then
    risk := High
Elsif (electronics_stolen = Yes) and
      (kids_at_home = No) Then
    risk := High
Elsif (escape_route_prepared = No) and
      (crippled_or_elderly = Yes) Then
    risk := Decreased
Elsif (escape_route_prepared = No) and
      (entrance_place = Door) and
      (kids_at_home = No) Then
    risk := Decreased
Elsif (escape_route_prepared = No) Then
    risk := Increased
Else
    risk := Decreased
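To make the ruleset easier to experiment with, the following is a minimal Python transcription of it. It is a sketch under stated assumptions: every condition is read as an equality test, a crime is represented as a plain dictionary of feature values, and a missing feature simply fails to match. The names `RULES`, `DEFAULT` and `estimate_risk` are hypothetical.

```python
# Ordered rule list transcribed from the PART ruleset; the first rule whose
# conditions all hold decides the score. Feature names follow the listing.
RULES = [
    ({"urban": "No", "vehicle_on_driveway": "No"}, "Low"),
    ({"crippled_or_elderly": "No", "escape_route_prepared": "No",
      "entrance_place": "Door", "kids_at_home": "No",
      "street_lighting": "Yes"}, "High"),
    ({"electronics_stolen": "Yes", "kids_at_home": "No"}, "High"),
    ({"escape_route_prepared": "No", "crippled_or_elderly": "Yes"}, "Decreased"),
    ({"escape_route_prepared": "No", "entrance_place": "Door",
      "kids_at_home": "No"}, "Decreased"),
    ({"escape_route_prepared": "No"}, "Increased"),
]
DEFAULT = "Decreased"  # the final Else branch of the listing


def estimate_risk(crime):
    """Return the risk level of the first matching rule, else the default.

    Assumption: a feature that is absent from `crime` never matches.
    """
    for conditions, score in RULES:
        if all(crime.get(feature) == value for feature, value in conditions.items()):
            return score
    return DEFAULT


# Invented example record, for illustration only.
example = {
    "urban": "Yes",
    "crippled_or_elderly": "No",
    "escape_route_prepared": "No",
    "entrance_place": "Window",
    "kids_at_home": "Yes",
}
print(estimate_risk(example))
```

Because the rules form an ordered decision list, their order matters: the single-condition rule on `escape_route_prepared` only fires for crimes that none of the more specific rules above it has already classified.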
In the feature selection process, described in Section 3.7, eight features were found to best correlate with the risk exposure scores. Some of these were related to increased risk of being observed or exposed, e.g. whether a residence was located in an urban setting, whether there was a car parked on the driveway, whether kids lived in the residence or whether there were street lights outside the residence. Other features were associated with decreased risk, e.g. if the resident was crippled or elderly or if an escape route was prepared. The locations of entry mainly associated with high/increased risk were door and window, but they could also indicate decreased risk together with other features. Interestingly, no features regarding burglary alarms or whether the residence was included in a neighborhood watch were identified as good indicators of risk.
Two of the models (one for predicting risk-exposure scores and another for pre-crime preparation scores) are presented here in order to visualize what such models actually look like. We hope this makes concrete how the proposed approach of predicting offender characteristic scores based on crime scene data works in practice. In Listing 1 the ruleset of the model generated by the PART algorithm is shown. We chose to show a representation of the PART model rather than the Naïve Bayes Multinomial model since the former is more transparent and can more easily be interpreted by humans, although its decision logic differs from that of the Naïve Bayes Multinomial model. Also, the model created by the PART algorithm was selected over, for instance, the model from the tree-based J48 algorithm, as the former reached higher AUC scores.
The feature selection process concerning pre-crime preparation also identified eight features that best correlate with the preparation scores. Some of the features were associated with increased/high levels of preparation, e.g. if there was no suspect to the crime when police officers filed the crime report or whether the location of entry was through a door. Other features were associated with decreased/low levels of pre-crime preparation, e.g. if the plaintiff was home during the offense or if the plaintiff’s own tools were used by the offender during the offense. Methods of entry mainly associated with increased/high preparation levels were drilling the lock, breaking the door and the use of an illegal key, while smashing a window was associated with decreased/low levels.
Ruleset from the Ridor model estimating offenders’ level of pre-crime preparation.
preparation = Low
If (entrance_method = Other) Then
preparation = Decreased
Elsif (suspect_exists = No) Then
preparation = Increased
If (suspect_exists = No) Then
preparation = Increased
Elsif (multilvl = No) Then
preparation = Increased
Elsif (entrance_method = Illegal key) Then
preparation = High
If (suspect_exists = N/A) Then
preparation = High
Elsif (multilvl = Yes) Then
preparation = Increased
The PART algorithm produces a trivial model with a single rule predicting all instances according to either the majority category or the 2
The results from the experiment indicate that Naïve Bayes Multinomial generates the models with the best classification performance on the problems at hand, with AUC measures of 0.79 and 0.77 for estimation of risk and preparation scores respectively. Since the aim of this study was to test the feasibility of estimating risk and preparation scores using predictive models, no structured configuration-tuning was carried out and all evaluated algorithms used default out-of-the-box configurations. Also, no methods for evening out the number of burglaries assigned to each level of the risk and preparation scores have been investigated. For instance, traditional over-sampling or the more advanced Synthetic Minority Over-sampling Technique (SMOTE) could be used for levels with few burglaries, e.g. the low and high preparation scores [44]. Thus, it is probably possible for the models to reach better estimation performance after such opportunities have been investigated further.
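As an illustration of the simpler of the two balancing options mentioned, the sketch below performs plain random over-sampling (not SMOTE) by duplicating randomly chosen rows from the smaller levels until every level is as large as the largest one. It was not part of the experiment; the function name, the `seed` parameter and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the class itself.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data, invented for illustration only.
rows = [[1], [2], [3], [4], [5]]
labels = ["Decreased", "Decreased", "Decreased", "High", "Low"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

When such balancing is combined with cross-validation, it should be applied to the training folds only, so that duplicated burglaries do not leak into the test folds and inflate the measured performance.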
The time it takes to train and use the models should not pose any problem in a practical situation. Training the Naïve Bayes Multinomial model on the labeled dataset took a couple of seconds. Using the model to estimate the risk and preparation scores for the 15,598 unlabeled burglaries took about 2 minutes on an ordinary workstation computer.
There is at most a very weak correlation between the risk exposure scores and pre-crime preparation scores calculated for the unlabeled burglaries, since both Spearman’s and Pearson’s correlation coefficients are 0.11. In an ongoing parallel study we investigate how these scores are affected by both spatio-temporal and socio-economic factors.
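For readers who want to reproduce this kind of check on their own score data, the sketch below computes both coefficients in plain Python. Spearman’s coefficient is obtained as Pearson’s coefficient on average ranks, which handles the many ties that a four-level scale produces. The integer encoding of the levels and the toy scores are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks where tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores on an assumed 0..3 encoding, invented for illustration only.
risk = [0, 1, 2, 3, 2, 1]
preparation = [1, 1, 0, 2, 1, 3]
print(pearson(risk, preparation), spearman(risk, preparation))
```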
In the present study we set out to test the possibility to estimate offenders’ risk and preparation scores using manual ratings from criminal profilers. However, before knowing whether the estimation of the scores was really possible, the profilers could only invest a limited amount of time. In dialog with the profilers it was decided that two profilers should rate the risk and preparation scores for 50 burglaries each. It was also decided that the rating scales should have four levels, i.e.
This study investigates the possibility to estimate burglars’ risk exposure and level of pre-crime preparation using predictive models generated by machine-learning algorithms. Two criminal profilers manually rated 50 burglaries each with regards to the offender’s risk exposure and level of pre-crime preparation scores. In an experiment 16 machine-learning algorithms were evaluated on their performance in estimating the two scores based on feature-rich structured crime scene descriptions. The Naïve Bayes Multinomial algorithm generated the best performing models with an AUC metric of 0.79 for estimating risk exposure and 0.77 for level of pre-crime preparation. Using the Naïve Bayes Multinomial model the risk and preparation scores were estimated for an additional 15,598 burglaries.
By analyzing the scores for both linked burglaries (performed by the same offender) and unlinked burglaries (performed by different offenders), it was shown that both risk and preparation scores were significantly more consistent for crimes within series committed by the same offender. In addition, it was shown that there exists some degree of distinctiveness between crimes in different series. This indicates that the scores add valuable input to the crime linkage process, e.g. when linking large numbers of volume crimes. It is possible that these scores can be used when linking crimes from different crime categories without easily comparable MO, since they reflect behavioral characteristics of the offender rather than specific MO characteristics at offenses.
Future work
In future studies we will investigate how the risk and preparation scores are affected by spatio-temporal and socio-economic factors. We also plan to extend the present study by increasing the number of offenses manually labeled by criminal profilers, to provide a larger training dataset, as well as using a more fine-grained rating scale. It would also be interesting to investigate the consistency and distinctiveness in risk and preparation scores for crime series consisting of cross-category crimes, e.g. burglaries, diesel thefts and crimes against elders. Additionally, as the present study indicates that Naïve Bayes Multinomial generates suitable models, a future study could target only those models and instead focus resources on finding the optimal configurations and feature selection to reach the best performance, possibly also investigating methods to handle the class imbalance between the different levels. Finally, it could be interesting to consider the proposed approach in alternative contexts, e.g. various types of fraud.
Acknowledgments
We would like to extend our gratitude to the National offender profiling group within the Swedish police for their assistance. This work was carried out with financial support from Vinnova, the Swedish Innovation Agency, through research grant 2015-05977.
