Abstract
This article presents a novel approach for detecting fraudulent behavior in automated teller machine (ATM) usage data by analyzing the geo-behavioral habits of customers, and describes a fuzzy rule-based system capable of classifying financial transactions as suspicious or non-suspicious. First, the geographic entropies of ATM cardholders are computed from spatio-temporal ATM transaction data to form customer mobility classes. ATM transactions exhibit spatio-temporal properties because they include location information. Transition data can be generated by pairing each transaction with the next one, i.e. from the current location to the next. Once the transition data are generated, statistical outlier detection techniques can be applied. In addition to classical methods, crisp unsupervised methods can readily be used to detect outliers in the transition data, and the fuzzy C-Means algorithm can also be used to determine outliers. In this study, an ATM usage dataset covering around two years, provided by a mid-size Turkish bank, was analyzed. It was shown that a large share of ATM users do not leave the vicinity of where they live. Some insightful business rules extracted from geo-tagged ATM transaction data by means of a fuzzy rule-based system are also presented.
Introduction
Bank customers across the globe enjoy the flexibility of accessing their monetary assets whenever and wherever they need, as far as technology allows. However, new technologies also bring new fraud issues and make anonymity easier. Ensuring the security of transactions carried out by banks and other financial institutions is one of the major factors affecting the reputation and profitability of such organizations. Customers' sense of trust and security is a fundamental requirement for banks, which manage customers' money and personal information.
As a result of the widespread use of alternative delivery channels in the past few years, losses due to fraudulent transactions have increased dramatically, and financial fraud detection and prevention have therefore been receiving increasing attention [6]. Fraud detection involves monitoring the behavior of transactions, whereas fraud prevention is a proactive approach that analyzes transactions before they are completed and determines whether they are fraudulent [6].
Automated teller machines (ATMs), which improve consumers' quality of life by giving them access to cash and other financial information, occupy an important position among the alternative delivery channels of banking. Since the introduction of the first ATM in 1967, perpetrators have devised various ways to steal the cash inside ATMs. According to a report by the European ATM Security Team (EAST), card skimming, cash trapping and ATM malware incidents have generally increased worldwide [5]. Europol and EAST also report that, as the European Union (EU) banking industry migrates to the Europay, MasterCard and Visa (EMV) environment, losses caused by illegal domestic transactions in the EU have gradually decreased since 2008. At the same time, however, the level of illegal transactions overseas has increased sharply. The United States of America remains the top location for such losses, followed by Indonesia and Thailand [5, 8]. To counter this, Europol and the European Central Bank recommended a short-term solution called GeoBlocking, which limits the possibility of misusing debit cards in regions without Chip and PIN verification. GeoBlocking relies on static location-based rules, and this type of solution has been very positive from a security point of view.
Since the main issue with ATM fraud is the misuse of card information, determining the authenticity of card usage becomes the central problem. The main idea of this paper is to show the value of location information for preventing financial fraud against ATMs, from a business process management point of view, using fuzzy rules to determine suspicious transactions. A few studies on fraud detection incorporate fuzzy systems [1, 7]. In [2], the authors classify suspicious and non-suspicious credit card transactions using an evolutionary-fuzzy system and show that combining evolution with fuzzy logic enables accurate and intelligible classification of difficult data. In [7], Estevez et al. propose a proactive system for preventing subscription fraud in fixed telecommunications, which consists of a classification module and a prediction module. The classification module, implemented with fuzzy rules, classifies subscribers according to their previous behavior into four categories: subscription fraudulent, otherwise fraudulent, insolvent and normal. Another interesting application of fuzzy logic to fraud detection is [1], which addresses the fuzziness in e-banking phishing website assessment via a fuzzy logic model combined with data mining algorithms.
Ref. [11] proposes a solution for detecting card fraud using fuzzy logic followed by a neuro-fuzzy Takagi-Sugeno training method. The authors emphasize four parts of the fuzzy system: the fuzzy rules, the fuzzy-maker unit, the decision-making unit and the fuzzy-remover unit. In [10], the authors examine fraudulent financial reporting using fuzzy clustering. They point out that fuzzy clustering shows the degree (as a percentage) to which an item “belongs” to a cluster, and that the strength of fuzzy analysis lies in its ability to model the partial categorization of data items such as financial transactions.
We hypothesize that rules incorporating location information can enhance the performance of fraud detection systems within a business rules management system (BRMS) framework. Transition data are generated from transaction data, which are sequential in nature. Once the transition data with from-to information are generated, unsupervised methods such as clustering can be used to detect outliers, which are then analyzed further to discover rules for suspicious behavior. Generating transition data also enables an analysis of Markovian-type behavior.
The rest of this paper is organized as follows. Section 2 summarizes our proposed methodology, which uses fuzzy rule-based analysis for location-based fraud detection. Section 3 analyzes ATM usage data to derive location-based business rules using crisp analysis. In Section 4, fuzzy methods for finding spatio-temporal outliers are implemented. Our findings and the resulting business rules are summarized in Section 5. Finally, Section 6 concludes the paper.
Methodology
This section summarizes our methodology, which consists of several steps. Our aim is to transform geo-tagged data into a format to which unsupervised methods can be applied to detect outliers, which can later be used to derive location-based rules.
Computing customer mobility
To assess and summarize the movements of ATM cardholders, one can compute entropy-based mobility values, and customers can then be classified by their entropies. Computing these entropies requires location information from customer movements. The location of an ATM transaction is given by the latitude and longitude of the ATM, which are continuous values. To calculate probabilities, the location information must therefore be discretized by dividing the map area into grid squares. Mobility is then measured by the Shannon entropy of each customer's transaction locations:
$$E = -\sum_{i=1}^{n} p_i \log p_i \qquad (1)$$
where p_i is the proportion of the customer's transactions that took place in grid i, and the sum runs over the n grid squares in which the customer has transactions.
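A minimal sketch of the Shannon entropy computation, E = −Σ_i p_i log p_i with p_i the share of a customer's transactions in grid i, is given below. It assumes each customer's transactions have already been mapped to grid-square identifiers. The text does not state the logarithm base, so the natural logarithm used by default here is an assumption; a different base rescales the values and would shift the 0.75 cut point used later. The function name `mobility_entropy` is hypothetical.

```python
import math
from collections import Counter


def mobility_entropy(grid_ids, base=None):
    """Shannon entropy of one customer's transaction locations.

    grid_ids: one grid-square identifier per transaction of the customer.
    base: logarithm base; None means natural log (an assumption, the text
    does not state the base).
    """
    counts = Counter(grid_ids)
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total  # p_i: share of the customer's transactions in grid i
        entropy -= p * (math.log(p) if base is None else math.log(p, base))
    return entropy


print(mobility_entropy([865, 865, 865]))     # 0.0: a single grid, no mobility
print(mobility_entropy([865, 866], base=2))  # 1.0: two grids used equally
```

A customer whose transactions all fall in one grid gets exactly zero, which is what defines the "never leaves the home grid" group.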
Customer transactions constitute sequential, geo-tagged data in space and time. Specifically, location and time tags allow us to build, for each customer, the sequence of places the customer has visited at different times. We approach the spatio-temporal outlier concept from the perspective of movement: one can assess whether a customer could plausibly be at a location at a specific time, given the previous location and the time at which the customer was there. Therefore, we construct transition data from the transaction data; by definition, these follow the Markov (memoryless) property. The Markovian state space consists of the grid squares generated in the previous step.
The transition from one location to another can be characterized by the distance between the two locations and the time difference between the two consecutive transactions. From these two quantities one can derive a third, speed, which summarizes the movement. Together with distance, speed can readily indicate outlier behavior. Note that for ATM transaction data, these quantities (distance, time and speed) can be computed either between the grids in which the transactions took place or between the exact locations of the transactions. Transition probabilities (frequencies) may also indicate outliers: a movement between two grids that occurs very rarely may be considered an outlier.
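The construction of transition records can be sketched as follows, assuming each transaction of a card is a `(timestamp, lat, lon, grid_id)` tuple. The haversine great-circle distance, the kilometre and hour units, and the handling of zero time gaps are illustrative assumptions rather than the authors' exact choices; for the grid-based variant one would pass grid-center coordinates instead of ATM coordinates. All names are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def to_transitions(transactions):
    """Pair each transaction of one card with the next one in time.

    transactions: list of (timestamp, lat, lon, grid_id) tuples.
    Returns from-to records with distance (km), time (hours) and speed (km/h).
    """
    ordered = sorted(transactions, key=lambda t: t[0])
    records = []
    for prev, curr in zip(ordered, ordered[1:]):
        distance = haversine_km(prev[1], prev[2], curr[1], curr[2])
        hours = (curr[0] - prev[0]).total_seconds() / 3600
        if hours > 0:
            speed = distance / hours
        else:
            # Same timestamp: zero speed if no movement, otherwise infinite.
            speed = 0.0 if distance == 0 else math.inf
        records.append({"from": prev[3], "to": curr[3],
                        "distance": distance, "time": hours, "speed": speed})
    return records


txs = [
    (datetime(2013, 1, 1, 10, 0), 41.0, 29.0, 865),
    (datetime(2013, 1, 1, 12, 0), 40.0, 29.0, 700),
]
print(len(to_transitions(txs)))  # 1: n transactions give n - 1 transitions
```

Because a card with n transactions yields n − 1 transitions, the 21,678,588 transactions of 987,813 cards in this study correspond to 21,678,588 − 987,813 = 20,690,775 transitions, which matches the size of the transition dataset reported in Section 3.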
Determining statistical outliers of transition data
Statistical outliers can be determined by examining the distributions of the quantities under consideration; 4σ limits are commonly used for this purpose. Since all the quantities mentioned in the previous step are non-negative, only an upper limit is needed: any value greater than μ + 4σ, where μ and σ are the mean and standard deviation of the quantity, is treated as an outlier. Because zero values are very common (most transitions have zero speed), we exclude them from these computations to obtain more plausible limits.
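A minimal sketch of the μ + 4σ rule is shown below. Zeros are dropped before the statistics are computed, as described above; the use of the population standard deviation is an assumption, since the text does not say which estimator was used. Function names are hypothetical.

```python
import statistics


def upper_outlier_limit(values, k=4.0):
    """mu + k*sigma of the non-zero values of a non-negative quantity."""
    nonzero = [v for v in values if v != 0]
    mu = statistics.mean(nonzero)
    sigma = statistics.pstdev(nonzero)  # population SD: an assumption
    return mu + k * sigma


def flag_outliers(values, k=4.0):
    """True for each value strictly above the mu + k*sigma limit (zeros included)."""
    limit = upper_outlier_limit(values, k)
    return [v > limit for v in values]
```

Applying the limits to distance and speed at the same time, as done later for Table 1, amounts to combining two flag lists element-wise, e.g. `[d and s for d, s in zip(flag_outliers(distances), flag_outliers(speeds))]`.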
Detecting outliers via fuzzy C-Means clustering
Transition data contain movements from one location to another. Under the Markov property, i.e. the assumption that the next state depends only on the current state and not on earlier ones, these transitions can be treated as independent of each other once the transaction data have been transformed into transition data. Such transition data are therefore well suited to unsupervised analysis based on fuzzy clustering, in which the data are grouped to detect outlier clusters. Unlike the grid-based quantities of the previous steps, distances here can be computed between actual ATM locations instead of grid centers, so speed can be calculated more accurately.
Discovering business rules by implementing a fuzzy rule-based system
In this final step, the outliers found in the previous steps are profiled to discover location-based rules. These rules can readily be deployed in a BRMS to detect and prevent banking fraud.
Crisp analysis
Data preparation
To assess the value of location information, one must show whether location-based patterns exist in the usage of banking services and whether rules can be derived from these patterns. The most readily available historical data for this purpose are ATM transaction data, so this study focuses on transactions generated by ATM customers. In Turkey, besides the traditional uses of withdrawing and depositing cash, ATM cardholders can easily transfer money, pay bills and taxes, and purchase mobile phone minutes at ATMs.
The ATM usage dataset is a real ATM transaction history covering the two-year period between November 2012 and November 2014, provided by a mid-size Turkish bank. It includes 987,813 distinct ATM cards and the 21,678,588 transactions made with them in Turkey. The list of all ATMs in Turkey, including their locations, was provided by the Interbank Card Center (BKM). By joining this external ATM list with the ATM transactions, we were able to locate each transaction.
There are over 42,000 ATMs across Turkey, whose locations are shown in Fig. 1. The list does not include retired ATMs or the former locations of ATMs that have been moved; it covers only active ATMs at their current locations.
Latitude and longitude are continuous values, so to generalize over locations it is useful to discretize them. One way of discretizing such data is to create a 2D spatial grid. For this purpose, a 2D spatial grid of 0.333° × 0.333° resolution was generated, as shown in Fig. 1. At this resolution there are 19 × 57 = 1,083 grid squares, of which 668 are non-empty, i.e. contain at least one ATM. The ATM transaction data cover 660 different grid squares across Turkey.
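The discretization can be sketched as a floor division of each coordinate by the 0.333° resolution. The grid origin below (`MIN_LAT`, `MIN_LON`) is a hypothetical choice that roughly covers Turkey with the stated 19 × 57 layout; the text gives neither the origin nor the numbering scheme behind grid identifiers such as #865, so the row-major cell id here is only illustrative.

```python
import math

CELL_DEG = 0.333           # grid resolution stated in the text
N_ROWS, N_COLS = 19, 57    # grid layout stated in the text
# Hypothetical grid origin (south-west corner); not given in the text.
MIN_LAT, MIN_LON = 35.8, 25.6


def grid_cell(lat, lon):
    """Return (row, col, cell_id) for a coordinate, or None outside the grid.

    cell_id uses a row-major numbering chosen for illustration only.
    """
    row = math.floor((lat - MIN_LAT) / CELL_DEG)
    col = math.floor((lon - MIN_LON) / CELL_DEG)
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        return None
    return row, col, row * N_COLS + col


print(N_ROWS * N_COLS)  # 1083 grid squares
```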
Implementing our methodology
As outlined in Section 2, our methodology consists of several steps that result in applicable location-based business rules. In the first step, the entropy value of each customer was calculated. Note that each customer may have transactions in several grid squares. The probability of a transaction in grid i, p_i, is calculated by dividing the number of the customer's transactions in grid i by the customer's total number of transactions. Once these probabilities are calculated (using the TRANSPOSE procedure of SAS), the entropy value of each customer (i.e. ATM cardholder) can be computed. Recall that there are 987,813 cards in our dataset. The entropy computations showed that approximately 62.2% of cardholders never had any transactions outside their own grid (i.e. where they live). To give an idea of grid size, 12 grid squares cover Istanbul, so the squares are not large enough to distort the analysis. The distribution of nonzero entropies is depicted in Fig. 2.
Entropy is an important indicator of a customer's mobility and can be used to form customer groups. First, there is a group of customers who never leave their grids, i.e. zero-entropy customers. Two further classes were created by analyzing Fig. 2, using an entropy level of 0.75 as the cut point. Thus, three distinct customer classes are formed: customers with zero entropy, customers with entropy greater than 0 and less than or equal to 0.75, and customers with entropy greater than 0.75.
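The three-way class assignment can be sketched as below, assuming entropies computed as Shannon entropies of grid shares. The tolerance `eps` is an addition of this sketch to absorb floating-point noise and is not part of the original description; the class labels are hypothetical strings.

```python
def mobility_class(entropy, cut=0.75, eps=1e-12):
    """Map a customer's entropy to one of the three mobility classes."""
    if entropy < -eps:
        raise ValueError("entropy cannot be negative")
    if entropy <= eps:
        return "e = 0"              # never left the home grid
    if entropy <= cut:
        return f"0 < e <= {cut}"
    return f"e > {cut}"
```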
Note that entropies are computed from the original transaction data by associating each transaction with its grid square. To build the transition data, from-to information must be added using either the LAG function of SAS or an SQL join. This step yielded a dataset of 20,690,775 records. For the transition data, distances can be calculated either between exact ATM locations or between the corresponding grids, and speed values are then computed from the distances and time differences. Each transition can also be associated with its cardholder's entropy value, so a new dataset with ID, Distance, Time, Speed and Entropy columns is constructed at this step. Time is removed from further analysis because it is determined by Distance and Speed. Since most records have a speed of zero, we exclude them to obtain more plausible outlier limits.
Table 1 gives summary statistics of the variables in the transition data. Since Entropy is not directly related to a particular movement, it need not be used as an outlier detection measure.
A simple SQL query with a GROUP BY clause gives the frequencies of from-to grid combinations in the transition data; very few occurrences over long distances may indicate rare events. The most frequent movement occurs within grid #865, located in Istanbul, with 3,379,016 transitions. Of all grid combinations, 21,188 have a frequency of 2 or less. After the distance outlier limit is applied, 5,933 from-to grid combinations have a frequency of 2 or less and a distance greater than 653 km. These would be far too many false positives to treat as suspicious movements, so it is not practical to regard all rare, long-distance movements as outliers.
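The GROUP BY query can be sketched with Python's built-in `sqlite3` module against an in-memory table. The table and column names are hypothetical; the thresholds (a frequency of 2 or less and a distance above 653 km) are the ones from the text. The distance of a from-to grid combination is assumed to be measured between grid centers, so it is constant within a group and `MAX(distance)` simply retrieves it.

```python
import sqlite3


def rare_long_moves(rows, max_freq=2, min_distance_km=653.0):
    """Return (from_grid, to_grid, freq, distance) for rare, long-distance pairs.

    rows: iterable of (from_grid, to_grid, distance_km) transition records.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transitions (from_grid INTEGER, to_grid INTEGER, distance REAL)")
    con.executemany("INSERT INTO transitions VALUES (?, ?, ?)", rows)
    query = """
        SELECT from_grid, to_grid, COUNT(*) AS freq, MAX(distance) AS distance
        FROM transitions
        GROUP BY from_grid, to_grid
        HAVING COUNT(*) <= ? AND MAX(distance) > ?
        ORDER BY from_grid, to_grid
    """
    result = con.execute(query, (max_freq, min_distance_km)).fetchall()
    con.close()
    return result
```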
Using unsupervised methods
In this part of our work, we report the results of clustering with the FASTCLUS procedure of SAS. To assess the resulting clusters better, the transition dataset was divided into two subsets: a training set containing about 40% of the original dataset, used to find the clusters, and a test set formed by the remaining data, used to assess the clustering results. KMeans and KMedian clustering were run on the training set with 10 clusters. Figure 3 depicts the cluster centers on two axes; Entropy was excluded because it was not a significant variable. As seen in Fig. 3, clusters 2, 4, 5 and 10 are outlier clusters. For KMeans clustering, these clusters contain 1, 1, 9 and 3 training points and 3, 0, 9 and 5 test points, respectively. For KMedian clustering, the outlier clusters contain 1, 1, 8 and 5 training points, respectively, and the test set has 11 and 9 points in clusters 5 and 10, respectively. The vast majority of points fall into Cluster 1.
Recall that three customer classes were formed based on entropy. Moreover, applying the statistical outlier limits given in Table 1 to distance and speed simultaneously, the training and test sets contain 60 and 119 outliers, respectively. Frequency tables of the 3 classes versus the 10 clusters are given in Tables 2 and 3, where e denotes entropy. Recall that clusters 2, 4, 5 and 10 were considered outlier clusters.
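For readers without SAS, the idea of spotting outlier clusters by their size can be sketched with plain Lloyd's k-means on (Distance, Speed) pairs. This is not FASTCLUS: the farthest-first seeding, the unscaled features and the size threshold `max_size` are assumptions chosen to keep the example deterministic. Test points are assigned to the nearest training center, mirroring how the test set was used above.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep an empty cluster's previous center.
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def small_clusters(labels, k, max_size):
    """Clusters with at most max_size members, treated here as outlier clusters."""
    sizes = [labels.count(j) for j in range(k)]
    return [j for j in range(k) if sizes[j] <= max_size]
```

With k = 10 on the training transitions, `small_clusters` would play the role of reading tiny clusters off Fig. 3, and `nearest` would count how many test points fall into them.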
Fuzzy clustering and rule-based analysis
In this section, we explore how to detect outliers via fuzzy models. The fuzzy C-Means clustering algorithm [3] can be used in a straightforward manner to identify outlier clusters and the corresponding memberships to those clusters. The following subsection discusses the implementation details of the fuzzy C-Means clustering analysis.
Implementing fuzzy C-Means
As in the crisp clustering analysis, the same training and test sets were used to examine the findings of fuzzy C-Means clustering. We used MATLAB's fuzzy C-Means function (fcm) to run the clustering model and, as in the crisp analysis, aimed to find 10 clusters.
Using the default options of fcm, we obtained a clustering in which the distribution of training points assigned to clusters with a membership value greater than or equal to 0.95 is given in Table 4. Table 5 summarizes the crisp cluster assignments of the test data based on the cluster centers found by fuzzy C-Means. These results indicate that fuzzy C-Means is not sensitive to outliers, whose effect is smoothed out. Consequently, fcm produces clusters with larger numbers of points (cases) than the k-means algorithm; in other words, fuzzy C-Means usually does not form clusters out of a very few outlier points. We conclude that none of the 10 clusters found by fcm is an outlier cluster: under crisp assignment, fuzzy C-Means returned no outlier clusters because of this smoothing behavior.
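The fuzzy C-Means procedure itself can be sketched in a few dozen lines of Python. The sketch follows the standard alternating updates (membership-weighted centers, then memberships) with exponent m = 2, a random initial partition, at most 100 iterations and a stop when the objective improves by less than 1e-5; these values are intended to resemble fcm's defaults, but this is not MATLAB's implementation. `confident_assignments` applies the 0.95 membership threshold used for Table 4. All names are hypothetical.

```python
import math
import random


def memberships(x, centers, m=2.0):
    """Standard FCM membership of point x to each center."""
    d = [math.dist(x, c) for c in centers]
    if 0.0 in d:  # x lies on a center: give it full membership there
        j0 = d.index(0.0)
        return [1.0 if j == j0 else 0.0 for j in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def fcm(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy C-Means by alternating center and membership updates."""
    rng = random.Random(seed)
    U = []
    for _ in points:  # random initial partition, each row summing to 1
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    dim = len(points[0])
    centers, prev_obj = [], None
    for _ in range(max_iter):
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(len(points))]
            centers.append(tuple(sum(wi * x[k] for wi, x in zip(w, points)) / sum(w)
                                 for k in range(dim)))
        U = [memberships(x, centers, m) for x in points]
        # Objective: membership-weighted squared distances to the centers.
        obj = sum(U[i][j] ** m * math.dist(points[i], centers[j]) ** 2
                  for i in range(len(points)) for j in range(c))
        if prev_obj is not None and abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    return centers, U


def confident_assignments(U, threshold=0.95):
    """Crisp label where the top membership reaches threshold, else None."""
    return [row.index(max(row)) if max(row) >= threshold else None for row in U]
```

Because every point keeps some membership in every cluster, a handful of extreme points pulls centers only slightly, which is consistent with the smoothing behavior described above.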
However, a fuzzy rule-based system can also be implemented to make use of the membership values of the fuzzy cluster assignments. MATLAB's fcm function returns these membership values for the training set, but for the test (unseen) points they must be computed from the distances to the fuzzy C-Means cluster centers. The membership value of point i to cluster j, μ_ij, is computed as
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}} \qquad (2)$$
where x_i is the point, c_j is the center of cluster j, C = 10 is the number of clusters, and m > 1 is the fuzzy partition exponent (m = 2 with fcm's default options).
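The standard fuzzy C-Means membership of an unseen point, μ_ij = 1 / Σ_k (‖x_i − c_j‖ / ‖x_i − c_k‖)^{2/(m−1)}, can be sketched as below. How the fuzzy model in Fig. 4 combines the memberships to Clusters 2, 8, 9 and 10 is not described here, so `flag_by_clusters` uses the maximum membership over the chosen clusters as an assumed combination; cluster indices in the code are 0-based, and the function names are hypothetical.

```python
import math


def fcm_membership(x, centers, m=2.0):
    """Membership of an unseen point x to each fuzzy C-Means center."""
    d = [math.dist(x, c) for c in centers]
    if 0.0 in d:
        # The formula divides by zero when x coincides with a center; use crisp membership.
        j0 = d.index(0.0)
        return [1.0 if j == j0 else 0.0 for j in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def flag_by_clusters(points, centers, clusters, threshold, m=2.0):
    """Indices of points whose largest membership among `clusters` reaches threshold.

    Taking the maximum over the selected clusters is an assumed combination rule.
    """
    flagged = []
    for i, x in enumerate(points):
        mu = fcm_membership(x, centers, m)
        if max(mu[j] for j in clusters) >= threshold:
            flagged.append(i)
    return flagged


print(fcm_membership((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))  # [0.8, 0.2]
```

Lowering `threshold` in such a scheme can sharply increase the number of flagged points, which is the sensitivity reported for the 0.9 and 0.86 thresholds.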
Considering the fuzzy C-Means cluster centers reported in Table 6, memberships to Clusters 2, 8, 9 and 10 can be used to build the fuzzy rule-based model given in Fig. 4. After the fuzzy cluster membership values of the test data are computed with Equation 2, the fuzzy rule-based model can be used to detect fraud cases in the test dataset. Analyzing the output of the fuzzy model, we observed no cases with a fuzzy membership above the 0.9 threshold. However, when the threshold was reduced to 0.86, 24,197 cases in the test dataset had a fraud membership value above it.
In Section 3, some business rules were discovered using traditional outlier detection and crisp unsupervised clustering analyses. In this subsection, we explore the pros and cons of implementing a fuzzy rule-based system for detecting spatio-temporal outliers.
A high-level representation of the fuzzy model is given in Fig. 5. The model was implemented with MATLAB's Fuzzy Logic Toolbox, which allows an analyst to model and evaluate fuzzy rules efficiently, even for large datasets such as ours.
The motivation for using a fuzzy model is to examine whether such an approach is better at reducing false positives. Fuzzy rule-based systems have been implemented successfully in numerous industrial systems. Determining the membership functions is the first step in our fuzzy rule-based analysis. We used the same training and test sets as in the previous section, and the training set was used to determine the fuzzy membership functions with the fuzzy C-Means algorithm.
Figures 6 and 7 depict these membership functions. In this setup, the fuzzy C-Means algorithm was used to find 5 clusters in the single dimension “Distance”, and the boundaries of these clusters determined the lower and upper ends of the membership functions; a membership value of 0.35 was used to define these boundaries for the “Distance” fuzzy sets. Depending on the position of each fuzzy set, triangular or trapezoidal membership functions are used.
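Triangular and trapezoidal membership functions can be sketched as a single piecewise-linear function, with the triangle as a trapezoid whose flat top shrinks to a point (MATLAB's Fuzzy Logic Toolbox provides these shapes as `trimf` and `trapmf`). The breakpoints in `DISTANCE_SETS` are hypothetical placeholders; in the text they come from one-dimensional fuzzy C-Means cluster boundaries at membership 0.35.

```python
import math


def trapmf(x, a, b, c, d):
    """Trapezoid: rises on [a, b], equals 1 on [b, c], falls on [c, d].

    a == b or c == d gives a shoulder; c = d = math.inf gives an open right end.
    """
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def trimf(x, a, b, c):
    """Triangle with peak at b, i.e. a trapezoid whose flat top is a single point."""
    return trapmf(x, a, b, b, c)


# Hypothetical "Distance" fuzzy sets in km; the paper's breakpoints are not reproduced.
DISTANCE_SETS = {
    "short": (0.0, 0.0, 50.0, 150.0),
    "medium": (50.0, 150.0, 400.0, 800.0),
    "long": (400.0, 800.0, math.inf, math.inf),
}


def fuzzify(x, sets=DISTANCE_SETS):
    """Degree of membership of x in every fuzzy set."""
    return {name: trapmf(x, *params) for name, params in sets.items()}


print(fuzzify(100.0))  # {'short': 0.5, 'medium': 0.5, 'long': 0.0}
```

Shoulder sets at both ends ensure that every non-negative distance belongs to at least one fuzzy set.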
Once the membership functions for all dimensions are determined for our dataset, fuzzy rules expressing expert opinion can be formulated in the fuzzy model. There are 28 such rules in our fuzzy rule-based system. Figures 8 and 9 give graphical summaries of these rules: Figure 8 shows the interaction of the “Distance” and “Speed” rules, and Figure 9 summarizes all the rules.
In the last stage of the fuzzy rule-based analysis, the fuzzy rule set is applied to the test dataset to detect outliers, i.e. fraudulent behavior. Evaluating the fuzzy rules yields the membership value in the “Fraud” fuzzy set for every point in the test set. Fig. 10 shows the histogram of points with a membership value greater than or equal to 0.5 in the “Fraud” fuzzy set.
At a threshold of 0.6 for membership in the “Fraud” fuzzy set, there are only 127 outliers. We can therefore conclude that the fuzzy rule-based system does not produce too many false positives: 127 outliers are more manageable and more realistic than the results of the crisp analysis.
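Rule evaluation can be sketched with the usual Mamdani operators: minimum for AND within a rule and maximum to aggregate rules that share a consequent. The sketch stops at the aggregated firing strength of the "fraud" consequent and skips defuzzification, so it is a simplification of what a MATLAB fuzzy inference system computes; the fuzzy sets, speed units and three rules are hypothetical stand-ins for the 28 rules in the text.

```python
import math


def trapmf(x, a, b, c, d):
    """Trapezoidal membership (a == b or c == d gives a shoulder)."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical fuzzy sets (distance in km, speed in km/h) and rules.
SETS = {
    "distance": {"near": (0, 0, 50, 150), "far": (400, 800, math.inf, math.inf)},
    "speed": {"slow": (0, 0, 20, 60), "fast": (200, 500, math.inf, math.inf)},
}
RULES = [
    ({"distance": "far", "speed": "fast"}, "fraud"),
    ({"distance": "near"}, "normal"),
    ({"speed": "slow"}, "normal"),
]


def consequent_strengths(record):
    """Fire every rule (AND = min) and aggregate per consequent (max)."""
    strengths = {}
    for antecedent, consequent in RULES:
        fire = min(trapmf(record[var], *SETS[var][label])
                   for var, label in antecedent.items())
        strengths[consequent] = max(strengths.get(consequent, 0.0), fire)
    return strengths


def flag_fraud(records, threshold=0.6):
    """Indices of records whose aggregated "fraud" strength reaches threshold."""
    return [i for i, r in enumerate(records)
            if consequent_strengths(r).get("fraud", 0.0) >= threshold]


records = [
    {"distance": 1000.0, "speed": 600.0},  # far and fast
    {"distance": 600.0, "speed": 900.0},   # only half-way into "far"
    {"distance": 10.0, "speed": 5.0},      # local movement
]
print(flag_fraud(records))  # [0]
```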
Results
Some of the outliers found in Section 3 are physically impossible. These values arose from physical changes in ATM locations, i.e. machines that were moved to different places. Nevertheless, our approach resulted in some sound business rules. First, customers with zero entropy have never left their grids, so if such a customer makes a transaction more than 75 km away, a flag should be raised. In [4], association mining results showed that most pairwise grid associations occur within a distance of 75 km.
We can also devise rules for rare from-to grid combinations, but with the distance threshold extended to 1,000 km. Recall that 5,933 such combinations have distances over 653 km, the statistical outlier limit. A quick query returned 1,973 such grid combinations with distances over 1,000 km. We believe this is also a credible type of location-based rule.
Clustering-based unsupervised outlier detection may find fewer suspicious transactions, but, as reported in Section 3, these are likely to be genuinely suspicious. At a membership threshold of 0.9, the test dataset contained no members of Clusters 2, 8, 9 and 10 found by the fuzzy C-Means algorithm. However, when the threshold was lowered slightly, more than 24,000 test cases were flagged as suspicious (i.e. fraud cases). Fuzzy clustering may therefore yield a significant number of false positives, depending on the membership threshold. On the same test dataset, in contrast, the fuzzy rule-based system reported a manageable number of outliers without too many false positives, which is the most desirable property of a fuzzy rule-based system. Figure 11 depicts the 127 points found by the fuzzy model.
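The two location-based rules above can be sketched as a single check of the kind a BRMS could run for each new transition. Using the transition distance as the distance from a zero-entropy customer's home grid is an assumption (for such customers the previous transaction is normally in the home grid); the record layout, rule names and function name are hypothetical.

```python
def location_rule_alerts(transition, customer_entropy, pair_freq,
                         home_radius_km=75.0, rare_freq=2, rare_distance_km=1000.0):
    """Return the names of the location-based rules a transition violates.

    transition: dict with "from", "to" (grid ids) and "distance" (km).
    pair_freq: historical frequency of each (from, to) grid combination.
    """
    alerts = []
    # Rule 1: a zero-entropy customer transacting more than 75 km away.
    if customer_entropy == 0 and transition["distance"] > home_radius_km:
        alerts.append("zero-entropy customer beyond home radius")
    # Rule 2: a rare from-to grid combination over a very long distance.
    pair = (transition["from"], transition["to"])
    if pair_freq.get(pair, 0) <= rare_freq and transition["distance"] > rare_distance_km:
        alerts.append("rare long-distance grid transition")
    return alerts


freq = {(1, 1): 5000, (1, 2): 1}
print(location_rule_alerts({"from": 1, "to": 2, "distance": 1200.0}, 0.0, freq))
```

A pair never seen before has frequency 0 and therefore counts as rare, which is the conservative choice for new routes.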
Performance evaluation
It should be emphasized that our methodology is intended as a complement to existing fraud detection methods, facilitating a new technical control [9] that uses the location information in common banking transactions. To assess its performance, we compare it with the existing fraud detection approach used by the bank that provided the data for this study.
From a performance point of view, the goal of any fraud detection technique is to increase the true positive rate while keeping the false positive and false negative rates as low as possible. Currently, the bank uses a rule-based system that alerts its fraud department to suspicious transactions as soon as they satisfy fraud-related business rules. For the record, there was no known fraudulent activity targeting the bank's customers, either at the bank's own ATMs or at other ATMs in Turkey. However, 506 transactions were marked as suspicious within the period of this study; since no fraud is known to have occurred, these can be regarded as false positives. Of these transactions, 338 could be matched to transitions in our dataset, i.e. we were able to generate 338 transition data points. A quick analysis revealed only 4 statistical outliers among them. Therefore, the crisp analysis alone reduced the false positive cases significantly.
After the fuzzy rule-based approach was applied, only one transition was found to be suspicious. This data point has a distance value of 1,217, a speed value of 0.0715 and an entropy value of 0.5004, and the fuzzy model assigned it a membership value of 0.5856.
This result shows that the fuzzy rule-based model can reduce the false positive rate significantly, performing even better than the classical statistical outlier methods (1 suspicious transition versus 4 statistical outliers).
Conclusion
We introduced a methodology that uses a fuzzy rule-based system to detect fraudulent ATM transactions based on location information and derived transition data (such as speed). We showed that coupling entropy values with movement-related data can yield valuable information for preventing fraud, and that using transition data in this respect is crucial. Fuzzy rule-based systems can yield better results because they find fewer false positives among outliers.
