Abstract
The main purpose of this article is to analyze the survival capacity of mutual funds based on their characteristics. The methodology used to meet this objective is Self-Organizing Maps (SOM), a type of Artificial Neural Network that allows patterns to be grouped together according to their similarity, enabling us to classify mutual funds into surviving funds and disappeared funds. We consider the following variables: age, size, investment flows, performance, volatility, and Morningstar rating. We use a sample of 1,617 Spanish mutual funds, of which 943 had disappeared between 2004 and 2016. The results obtained indicate that SOM accurately classifies 80% of mutual funds. Therefore, SOMs are effective instruments for classifying the disappearance of Spanish mutual funds and the variables used to define them can explain their survival capacity.
Introduction
Between 2013 and 2016, the size of mutual funds in Spain increased by 53% to become one of the main investment instruments, registering a total of 394,219 million euros at the end of 2016.
The main purpose of this article is to examine Spanish mutual funds using a methodology based on artificial neural networks, the Self-Organizing Maps (SOM) developed by Kohonen, to see whether they can be clustered into surviving funds and disappeared funds once they have been defined by the following variables: age, size, investment flows, performance, volatility, and Morningstar rating. The first five variables are typical in the financial literature. We propose the introduction of a new variable, the Morningstar rating, which is considered one of the most popular rankings among investors, given that the information it provides has a negative or positive impact on fund inflows and outflows [1, 16].
The network provides a spatial distribution of the funds on a two-dimensional map, from which the accuracy percentage (or the error percentage) when classifying funds into surviving and disappeared funds is calculated. It also enables the relationship between the input variables and the mortality and survival of funds to be tested indirectly.
SOMs have been used in numerous studies in the fields of finance, marketing, and administration, among others. The main contributions are related to predicting business failure [9, 42] and bankruptcy [6, 36], rating financial assets [28, 49], identifying market segments [22, 46], and analysing financial markets [15, 45].
Paper [39] uses SOMs to determine whether the CNMV classification of mutual funds according to their investment objective is accurate. Other authors have focused on predicting the net asset value [13] or future fund performance [24, 48] using another type of artificial neural network, the back-propagation network.
As far as we know, there are no previous studies on the survival of mutual funds using this methodology and so the focus proposed in this paper is entirely unprecedented in the financial literature. If the map obtained presents a high accuracy percentage, it will indicate that neural networks may be suitable instruments for classifying a mutual fund as a disappeared or a surviving fund.
This paper is structured as follows. The literature on the factors that influence the disappearance of mutual funds is reviewed in section two. The methodology used is presented in section three. The data and their sources are examined in section four and the application is developed in section five. The conclusions are drawn in the last section.
Antecedents
Regarding the variables related to the disappearance of mutual funds analyzed in the literature, some studies consider the size of the fund to be a key factor, concluding that smaller funds are less likely to survive [2–4, 51].
Other authors focus on analyzing investment flows, finding that there is a positive relationship between the mortality rate and capital outflow before the fund disappears [2, 51]. Paper [17] adds that the oldest funds have capital outflows during the three years prior to their disappearance, while younger funds attract capital right up to the last year, which is when investors begin to withdraw capital more intensely.
Other studies have found that the likelihood of a fund disappearing is inversely related to the age of the fund and so younger funds are more likely to disappear [8, 51].
Another variable considered is the return obtained in the years prior to closure. Some studies empirically demonstrate that obtaining negative results increases the likelihood of the fund disappearing and intensifies investment outflows [3, 51].
The volatility of the fund is another variable that explains mutual fund mortality [12, 38]. These authors conclude that higher-risk funds are more likely to disappear. Paper [4], on the other hand, argues that risk cannot be a key factor in the survival of funds because higher risk does not imply better or worse performance.
Another key factor is fund expenses. Funds with higher expenses are more likely to disappear [7, 41].
Self-organizing maps (SOM)
We use Self-Organizing Maps (SOM), a kind of Artificial Neural Network (ANN) developed by Teuvo Kohonen in 1982 [30], to meet the objective of this paper.
The main ANN classification distinguishes between networks with supervised learning (an external agent gives known outputs to the network to train it) and networks with unsupervised learning (which use the internal properties of the input data to organize the information according to their similarities). SOMs belong to the second category; specifically, they use competitive learning.
SOMs reduce the dimension of the input information, which is initially n-dimensional, to produce a two-dimensional map, while preserving its topology. Therefore, similar patterns will be placed in cells that are closer on the map, while patterns with different characteristics will be placed in more distant cells [31].
Regarding the structure of the network, SOMs are composed of two layers. The input layer has n neurons, one for each variable that defines the patterns to be analyzed and is responsible for receiving information coming from outside and transmitting it to the output layer. The output layer has m neurons organized into a two-dimensional map, which is usually rectangular or hexagonal. The number of output neurons depends on the size of the problem. In our case, we consider the rule
Let
The input neurons i are connected to the output neurons k by means of initially random weights w_ki, W_ki = (w_1i, w_2i, ..., w_mi) being the vector of weights that connect the input neuron i with the output neuron k.
Competitive learning in SOMs implies that, when an input is presented to the network, all the output neurons compete to become the representative of the input pattern. SOMs use the distance between the input vector and the weight vectors (other metrics can be used, but the Euclidean distance is the most common) to determine the winning neuron (BMU, or Best-Matching Unit).
To this effect, the output neuron k* is the BMU if the following is satisfied:
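In its standard Euclidean form, this condition can be written as follows. The notation is assumed here for illustration: x = (x_1, ..., x_n) denotes the input pattern and w_k the weight vector of output neuron k, with components w_ki.

```latex
\[
\lVert \mathbf{x} - \mathbf{w}_{k^*} \rVert
  = \min_{k = 1, \dots, m} \lVert \mathbf{x} - \mathbf{w}_k \rVert,
\qquad
\lVert \mathbf{x} - \mathbf{w}_k \rVert
  = \sqrt{\sum_{i=1}^{n} \left( x_i - w_{ki} \right)^2}
\]
```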
According to the competitive process, once the BMU has been determined, the network modifies the weights of the BMU and its neighbouring neurons. The objective of this modification is to make the same neuron k* (or a neuron next to it) the winning neuron when the same pattern (or a similar one) is presented to the network.
SOMs can work with different neighbourhood functions. We have used the Gaussian function to implement the network, which is to say,
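A common way to write the Gaussian neighbourhood is given below as an assumed standard form. Here r_k is the position of neuron k on the map and σ(t) is a width parameter that typically shrinks as training progresses; both symbols are introduced for illustration.

```latex
\[
h_{k,k^*}(t) = \exp\!\left(
  -\frac{\lVert \mathbf{r}_k - \mathbf{r}_{k^*} \rVert^2}{2\,\sigma(t)^2}
\right)
\]
```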
The weights are then modified using the following expression:
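The competitive step described above (BMU search, Gaussian neighbourhood, weight update) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `som_step`, the learning rate, the neighbourhood width `sigma`, the small 4x3 map, and the random data are all hypothetical.

```python
import numpy as np

def som_step(weights, grid, x, learning_rate, sigma):
    """One competitive-learning update for a single input pattern x.

    weights: (m, n) array, one weight vector per output neuron
    grid:    (m, 2) array, map coordinates of each output neuron
    """
    # Best-Matching Unit: output neuron with the smallest Euclidean distance to x
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Gaussian neighbourhood, measured on the map between each neuron and the BMU
    grid_dist_sq = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-grid_dist_sq / (2.0 * sigma ** 2))
    # Assumed standard update: w_k <- w_k + learning_rate * h_k * (x - w_k)
    new_weights = weights + learning_rate * h[:, None] * (x - weights)
    return bmu, new_weights

# Illustrative data only: a 4x3 map and 7 input variables
rows, cols, n_vars = 4, 3, 7
rng = np.random.default_rng(0)
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
weights = rng.normal(size=(rows * cols, n_vars))
x = rng.normal(size=n_vars)
bmu, updated = som_step(weights, grid, x, learning_rate=0.5, sigma=1.0)
```

With a learning rate of 0.5 the BMU moves halfway towards the pattern, while neurons further away on the map move less. In practice both the learning rate and `sigma` are usually decreased over the training iterations.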
Once the network locates the patterns in the output layer, their position must be interpreted. To do so, SOMs are assisted by the component planes, one for each input variable. Each component plane is the same size as the output map and is accompanied by an indexed coloring for the units, where red represents the highest values of the variable and dark blue is used for the lowest values.
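As a rough sketch of how component planes relate to the trained weights, each plane is the weight of one input variable laid out on the map grid. The function name `component_planes` and the row-major ordering of the neurons are assumptions made for illustration; the 16x12 map and the seven variables match the figures reported in this paper, and the weight values are placeholders.

```python
import numpy as np

def component_planes(weights, rows, cols):
    """Reshape an (rows*cols, n) weight matrix into n planes of shape (rows, cols).

    Assumes neurons are stored in row-major order (hypothetical convention).
    """
    weights = np.asarray(weights, dtype=float)
    n_vars = weights.shape[1]
    return weights.T.reshape(n_vars, rows, cols)

# Placeholder weights for a 16x12 map with 7 input variables
rows, cols, n_vars = 16, 12, 7
weights = np.arange(rows * cols * n_vars, dtype=float).reshape(rows * cols, n_vars)
planes = component_planes(weights, rows, cols)
```

Each `planes[i]` can then be drawn as a heat map with a color scale running from blue (low values) to red (high values).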
SOMs can be used for different applications. However, they are mostly used for interpreting data, identifying objects, grouping data so that the processing system can classify them according to certain variables, and even for reducing dimensionality [15].
We used two databases: the National Securities Market Commission (CNMV) and Morningstar Direct. The first provided the funds that had disappeared during the analyzed period, while the second gave the value of the variables used in this study.
We used a sample of 1,617 mutual funds, which corresponds to all the funds alive at the end of 2003 in the Spanish market, plus all the funds that were registered between then and 2016 for which all the required variables were available. A total of 943 funds had disappeared from the market, representing 58% of the sample.
Table 1 describes the variables considered in this paper.
Initially studied variables
SOMs require input variables with low mutual correlation; otherwise, the network works with redundant information that is overweighted, which could bias the result. Table 2 shows the correlation matrix between the variables.
Correlation matrix
Based on the correlation matrix, the variables ‘Variation in size_1 yr’ (variable 3) and ‘Annual standard deviation’ (variable 7) were eliminated because of their high correlation with the variables ‘Variation in size_2 yrs’ (variable 4) and ‘Annualised standard deviation_3 yrs’ (variable 8), respectively. The rest of the variables were maintained to carry out the study.
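The screening step described above can be sketched as follows. This is an illustration only: the 0.8 threshold, the function name `highly_correlated_pairs`, the variable names, and the synthetic data are assumptions, since the cut-off applied in the study is not stated here.

```python
import numpy as np

def highly_correlated_pairs(data, names, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of columns with |r| > threshold."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic data: two nearly identical variables and one independent variable
rng = np.random.default_rng(0)
size_change_1y = rng.normal(size=200)
size_change_2y = size_change_1y + 0.1 * rng.normal(size=200)
age = rng.normal(size=200)
data = np.column_stack([size_change_1y, size_change_2y, age])
names = ["size_change_1y", "size_change_2y", "age"]
pairs = highly_correlated_pairs(data, names)
```

The function only flags candidate pairs; which variable of each pair to drop remains a modelling decision.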
The final list of variables is shown in Table 3.
Variables used to define the funds in the SOM
The selected variables were standardized. This process is important given that the input variables are measured on very different scales. The sample of mutual funds, defined by the values of these variables, was then introduced into the network.
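One common way to standardize inputs is the z-score, sketched below. Whether the study used this exact transformation is an assumption; the function name `standardize` and the example values (age in years, size in euros) are hypothetical.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract its mean and divide by its standard deviation.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical funds described by age (years) and size (euros)
X = np.array([[2.0, 5.0e6],
              [10.0, 2.5e8],
              [25.0, 1.2e9]])
Z = standardize(X)
```

After this step every variable has zero mean and unit variance, so no single variable dominates the Euclidean distances used to find the BMU.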
Once the network is implemented, it generates the output map (of dimension r x s neurons) where the funds are located. The ratio of disappeared funds in an output neuron j, Q_j, is calculated using the following expression:
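Following the verbal definition above, the ratio can be written as shown below, where D_j and S_j denote the number of disappeared and surviving funds placed in neuron j. These two symbols are introduced here for illustration.

```latex
\[
Q_j = \frac{D_j}{D_j + S_j}
\]
```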
When grouping funds, the network can commit two errors:
Type I error: when the network places a disappeared fund in a neuron with Q_j < 0.5; in other words, in a surviving fund neuron.
Type II error: when the network places a surviving fund in a neuron with Q_j > 0.5; in other words, in a disappeared fund neuron.
When the network is implemented it generates an output map of dimensions 16x12 (16 rows and 12 columns).
The funds that are most similar are placed in the same area of the map and from this we can discern whether the network is capable of grouping surviving and disappeared mutual funds separately.
Table 4 shows the error percentage of each type, the resulting total error, and the accuracy percentage.
Error percentage
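The quantities of this kind (Q_j per neuron, type I and type II errors, and overall accuracy) can be computed along the following lines. This is a sketch under assumptions: the function name `classification_errors`, the expression of each error as a share of all funds, and the toy data are hypothetical and do not reproduce the study's results.

```python
import numpy as np

def classification_errors(neuron_of_fund, disappeared, n_neurons):
    """Return (Q per neuron, type I share, type II share) over all funds."""
    neuron_of_fund = np.asarray(neuron_of_fund)
    disappeared = np.asarray(disappeared, dtype=bool)
    total = np.bincount(neuron_of_fund, minlength=n_neurons)
    dead = np.bincount(neuron_of_fund, weights=disappeared.astype(float),
                       minlength=n_neurons)
    # Q_j is left undefined (NaN) for neurons that contain no funds
    with np.errstate(invalid="ignore", divide="ignore"):
        q = np.where(total > 0, dead / total, np.nan)
    q_fund = q[neuron_of_fund]
    # Strict inequalities, as in the error definitions; Q_j = 0.5 counts as neither
    type_i = np.mean(disappeared & (q_fund < 0.5))
    type_ii = np.mean(~disappeared & (q_fund > 0.5))
    return q, float(type_i), float(type_ii)

# Toy example: six funds placed in a map with four neurons
neurons = [0, 0, 0, 1, 1, 2]
is_disappeared = [1, 1, 0, 0, 0, 1]
q, type_i, type_ii = classification_errors(neurons, is_disappeared, n_neurons=4)
accuracy = 1.0 - type_i - type_ii
```

In this toy example the only misplaced fund is the surviving fund in neuron 0, where two of the three funds have disappeared.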
Figure 1 shows the value Q_j of each output layer neuron, defined in Equation 5. The color red is used to identify the neurons with Q_j ≥ 0.5; in other words, those that contain predominantly disappeared funds. The color blue is used if there are predominantly surviving funds in the neuron; in other words, if Q_j < 0.5. The units of the map where no funds have been placed are left white.

Value Q_j of each neuron in the SOM.
As can be seen, the SOM correctly classifies over 80% of the mutual funds, leading us to confirm that the variables used define the survival of mutual funds in Spain.
The percentage of type I errors is lower than that of type II errors, reinforcing the suitability of SOMs for classifying mutual funds. The cost of a type I error is greater than that of a type II error because the risk of losing money by investing in a disappeared fund identified as a surviving fund is higher. It must be noted, however, that the difference between the two errors is minimal.
Having confirmed the reliability of neural networks when classifying mutual funds, we observe that the groups located at the bottom and to the right mostly contain surviving funds, while those placed at the top and on the left are mainly clusters of disappeared funds.
To interpret the position of a mutual fund on the map, the values of all the variables in the area where the fund was located were evaluated (Fig. 2). These values are represented using a color scale, where the highest values of each variable are shown in red and the lowest values in blue.

Component Maps.
To facilitate the visualization of the variables and determine which variable(s) influence the disappearance of Spanish mutual funds, Fig. 3 shows the map with each Q_j using the same scale of colors as in Fig. 2. We can observe that no single variable defines the survival capacity on its own. Therefore, the results confirm that it is the joint behavior of all the variables (age, size, return, volatility, and Morningstar rating) that affects fund survival.

Level of Q_j.
A sample of 1,617 mutual funds from the Spanish market, of which 943 had disappeared during the period 2004–2016, was used to test the adequacy of a methodology based on artificial neural networks to classify the funds according to whether they had disappeared or survived.
Seven variables for each of the funds were considered to train the network: age, size, variation in size at 2 years, annual return, annualized return at 3 years, the annualized standard deviation at 3 years, and Morningstar rating.
The SOM correctly classifies 80% of the mutual funds; in other words, it places 80% of the funds in the correct type of neuron, either one with predominantly surviving funds or one with predominantly disappeared funds. From these results, two conclusions can be drawn: i) SOMs are useful instruments for classifying the disappearance of mutual funds from the Spanish market and thus for predicting their mortality; and ii) the variables used characterize this disappearance accurately, and no particular variable defines the survival capacity on its own.
