Abstract
The multi-target regression problem comprises the prediction of multiple continuous variables at the same time from a common set of input variables. In the last few years, this problem has gained increasing attention due to the broad range of real-world applications that can be analyzed under this framework. The multi-target regression problem is more complex than the single-target one because target variables often have statistical dependencies, and these dependencies should be correctly exploited to solve the problem effectively. Consequently, additional difficulties appear when the aim is to select instances from this type of data. In this work, an ensemble-based method to perform the instance selection task in multi-target regression problems is proposed. First, a well-known instance selection method is adapted to work directly with multi-target data. Second, the proposed ensemble-based approach uses a set of these adapted methods to select the final subset of instances. The members of the ensemble select partial data subsets, each member operating on a different input space that is expanded with target variables, thereby exploiting the underlying inter-target dependencies. Finally, the ensemble-based method aggregates all the selected partial data subsets into a final subset of relevant instances by solving an optimization problem with a simple greedy heuristic. The experimental study carried out on 18 datasets shows the effectiveness of our proposal for selecting instances in the multi-target regression problem. Results demonstrate that the size of the datasets is considerably reduced, whilst the predictive performance of the multi-target regressors is maintained or even improved. It is also observed that the proposed method is robust to the presence of noise in data.
Introduction
In the past few years, the scientific community has paid increasing attention to problems that comprise the prediction of multiple outputs simultaneously, mainly due to the many real-world applications that can be studied within this framework [1, 2, 3, 4, 5]. Multi-target regression (henceforth MTR) is one of these problems, and it comprises the prediction of multiple continuous variables from a common set of input variables [6]. In other words, MTR algorithms aim to learn a predictive model that, given an unseen input vector
To date, many methods have been proposed to tackle the MTR problem, and they can be organized into problem transformation and algorithm adaptation methods [10]. Problem transformation methods decompose an MTR problem into several single-target regression tasks. Recent research has focused on applying some well-known multi-label learning transformation methods to the MTR problem, mainly motivated by the tight connection between these two learning paradigms. In this regard, Spyromitros-Xioufis et al. [6] demonstrated that several multi-label approaches, such as binary relevance [11], stacked generalization [12] and classifier chains [13], are straightforward to adapt to the MTR problem. On the other hand, the algorithm adaptation category comprises algorithms that do not decompose an MTR problem into several single-target regression tasks; i.e., they handle the multi-target data directly. In this category, many methods have been proposed, such as statistical techniques [14], support vector machines [15], kernel-based approaches [16], MTR trees [17], rule-based methods [18], and locally weighted regression methods [4].
Spyromitros-Xioufis et al. [6], Melki et al. [15], Reyes et al. [4] and many other authors have demonstrated that the MTR problem can be solved more effectively if the inter-target correlations are detected and exploited. However, the major challenges of MTR lie in how to model such inter-target dependencies correctly, and how to estimate the nonlinear relationships that may exist between the input and output spaces of the problem [19]. Moreover, not all available training samples of an MTR problem are useful for constructing an accurate predictive model; it is well known that noisy, redundant and incomplete data can significantly deteriorate the performance of most learning algorithms [20]. Consequently, acquiring a high-quality and compact dataset, from which an algorithm can learn relevant data relationships, is also an important issue when tackling the MTR problem.
The instance selection task (henceforth IS) is an important data preprocessing step that aims to select a representative subset of an original dataset by filtering out noisy and redundant data, in such a manner that the predictive performance of a learner induced from the data subset is the same as (or even better than) that of a learner induced from the original dataset [21]. Nowadays, these algorithms can bring many benefits to the scientific community, mainly due to their applications to the Big Data challenge [22]. The IS task has been widely studied for the classification problem (see, for instance, the works of Olvera-López et al. [21] and García et al. [23]); however, it has been far less studied for the regression problem [24]. The IS task in the regression problem has some difficulties that do not exist in classification. For instance, several IS methods assess the relevance of an instance by measuring its usefulness in predicting the correct classes of its nearest neighbors. However, the concept of data class does not exist in the regression problem, since the domain of the output variables is continuous. In addition, the identification of class boundaries, an important criterion on which many IS methods for classification are formulated, makes no sense in regression [24]. As for the IS task in the MTR problem, selecting instances is more complex than in the single-target regression problem, mainly due to the aforementioned challenges that the MTR problem presents.
In the last two decades, ensemble-based methods have proven to be very effective techniques for improving results in complex problems [25, 26, 27, 28, 29, 30, 31, 32]. Kocev et al. [33] and Spyromitros-Xioufis et al. [6], for example, demonstrated how better predictive performance can be obtained in solving the MTR problem by using ensemble-based approaches. In addition, several authors have demonstrated that the IS task can be significantly improved by applying ensemble-based methods [34]. In this way, the relevance of the training instances is measured by considering not a single criterion but many approximations and, therefore, a more reliable estimation of the relevance of the instances is obtained.
In this work, an ensemble-based method to perform the IS task in the MTR problem is proposed. First, an error accumulation-based approach is introduced, which is an adaptation of the well-known family of Decremental Reduction Optimization Procedures (henceforth DROP) [35] to multi-target data. Second, an ensemble-based method that effectively combines the partial data subsets previously selected by each member of the ensemble is proposed. To obtain the final data subset, an aggregation process is carried out by a simple greedy heuristic that solves an optimization problem. The members of the ensemble select the partial data subsets on different input spaces that are expanded with target variables, thereby exploiting the underlying inter-target dependencies. Moreover, the proposed method does not use any threshold value to decide whether an instance is selected or not, which makes it less dependent on the specific features of each problem. To the best of our knowledge, this is the first attempt to study the selection of instances in MTR, and the main motivation of this work is to analyze the benefits of the IS task for constructing better MTR models.
The effectiveness of the proposal is assessed through an extensive experimental study, in which 18 datasets with varied features from different application domains are used. The results showed that the proposed IS method can significantly boost the predictive performance of the multi-target regressors and, therefore, it can benefit the development of methods for solving complex problems that comprise the prediction of multiple outputs. A good trade-off between predictive performance and reduction rate is attained: the size of the training sets is reduced without significantly deteriorating the predictive performance of the multi-target regressors. In addition, the proposed ensemble-based method proves to be robust on datasets that contain noisy samples.
The remainder of this paper is arranged as follows. Section 2 briefly describes the IS task and exposes the related works that have been proposed to perform the selection of instances in the regression problem. Section 3 presents the proposed ensemble-based method. Section 4 shows a description and discussion of the experimental results. Finally, some concluding remarks are presented in Section 5.
Related work
Roughly speaking, IS methods aim to reduce the size of the original training data while retaining or improving the predictive capacity of the models. The optimal outcome of an IS method is a minimum data subset from which a learning algorithm would accomplish the same task with no performance loss compared with using the original dataset [36]. However, some authors have noted that, in practice, it is not always possible to maintain the performance levels as the dataset is reduced, and a loss of effectiveness may be inevitable [37]. IS methods have the following goals [36]: (I) decrease the computational cost of predicting new patterns; (II) reduce the storage requirements by removing redundant information from datasets; (III) improve the performance of learning algorithms by removing noise and outliers; and (IV) increase efficiency when working on large-scale datasets.
Many IS methods have been proposed in the literature, and a complete description of these methods can be found in [36]. IS methods can be categorized according to the following three criteria: (I) the selection criterion used to select the instances; (II) the type of points that are removed in the IS process; and (III) the search direction used to obtain the final data subset.
The first criterion distinguishes wrapper [35] and filter methods [38]. The main difference between these two types of approaches is that wrapper methods select the relevant instances based on the predictions made by a learning algorithm, whilst filter methods do not rely on a classifier to determine the instances to be discarded from the training set.
Regarding the second criterion, IS algorithms can be classified into condensation [38], edition [39] or hybrid methods [35]. Condensation methods retain the points closer to the decision boundaries (border points), preserving the training error, but at the expense of deteriorating the generalization (test) error. Edition methods remove the border points and maintain the internal points, yielding smoother decision boundaries and reducing the generalization error. Hybrid methods, on the other hand, remove both internal and border points, combining the advantages of condensation and edition methods.
As for the third criterion, IS algorithms can be classified into incremental [40], decremental [35], batch [41] or mixed methods [42]. Incremental methods start with an empty data subset and keep adding instances to it; in this case, the presentation order of the instances is an important issue that might affect the effectiveness of the IS algorithm. Decremental methods begin with the whole dataset and keep removing instances from it; in this case, the presentation order of the instances still matters, but less than for incremental methods. Batch methods analyze all the instances without removing them during the process, and at the end all the instances marked as disposable are removed; the complexity of this type of method is usually higher than that of incremental and decremental methods. Finally, mixed methods begin with a pre-selected data subset, and the instances that satisfy a specific criterion can be added or removed; the pre-selected data subset may be constructed by a random selection, an incremental method, or a decremental one.
Olvera-López et al. [21], García et al. [36], and many other authors have noted that the IS task for the classification problem has been widely studied. However, the selection of instances in the regression problem has not followed the same path, and far less research exists in this regard [24]. Some works have proposed evolutionary algorithms to perform the IS task in the regression problem. For example, Tolvi [43] presented a genetic algorithm that was able to detect outliers in linear regression models, and Antonelli et al. [44] addressed the IS task through a multi-objective evolutionary learning approach. There have also been efforts focused on studying the IS task in time series [45]. In addition, in the last few years several works have focused on adapting IS methods that were originally designed for the classification problem to the regression problem. For example, Kordos and Blachnik [46] adapted the Condensed Nearest Neighbor [40] and Edited Nearest Neighbor [39] methods, and Arnaiz-González et al. [47] proposed an adaptation of the DROP method [35]. Finally, another approach to performing the IS task in the regression problem consists of discretizing the target variable [48], so that any existing IS method for classification can then be used directly.
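As an illustration of the discretization route mentioned above, the following minimal Python sketch maps a continuous target onto class labels so that a classification IS method could then be applied. Equal-frequency binning and the function name `discretize_target` are assumptions made for illustration; the scheme actually used in [48] may differ.

```python
import numpy as np

def discretize_target(y, n_bins=4):
    """Map a continuous target to class labels 0..n_bins-1 with equal-frequency bins.

    Equal-frequency binning is an illustrative assumption, not necessarily the scheme of [48].
    """
    y = np.asarray(y, dtype=float)
    # Interior cut points at the 1/n_bins, 2/n_bins, ... quantiles of the target
    cuts = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # right=True: label i collects the values in (cuts[i-1], cuts[i]]
    return np.digitize(y, cuts, right=True)

labels = discretize_target(np.arange(8.0), n_bins=4)
```

Each bin then plays the role of a class, so class-based criteria (such as the classes of the nearest neighbours) become available again, at the cost of ignoring the ordering of values inside a bin.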
Independently, in order to improve the efficiency and accuracy of methods for a given learning problem, ensembles of methods have gained increasing popularity in the research community in the last few decades [49, 50]. The advantages of ensemble-based methods can be summarized as follows [50, 51]: (I) ensemble methods perform well both when very few data samples are available for learning and when a huge amount of data is available; (II) a combined classifier can have a better predictive performance than the best individual classifier; (III) combining methods trained from different samples can help overcome the local optima problem; and (IV) the true function may not be representable by any single hypothesis, but the combination of several hypotheses may expand the space of representable functions. Given the advantages provided by the ensemble learning paradigm, the use of ensemble-based methods to perform the IS task is not surprising [22, 34]. The main objective of such ensemble-based IS methods is to produce more reliable estimations of the relevance of the instances by aggregating the outputs produced by the members of the ensemble. Very few works, however, have studied the selection of instances in the regression problem following an ensemble-based approach. The most relevant works on this topic are those presented by Blachnik and Kordos [52] and Arnaiz-González et al. [24], who showed that a better IS process in the regression problem can be achieved by using bagging models.
Finally, it is important to note that all the aforementioned works have been designed for selecting instances on regression problems that have only one target variable, and they are not directly applicable to the MTR problem. As far as we know, an IS method for the MTR problem has not been proposed yet. In addition to the difficulties that appear when the IS process is performed on any regression problem (these were previously mentioned in the introduction of this work), the major challenges of MTR arise from modelling the inter-target correlations and complex input-output relationships. In the next section, an ensemble-based method for the selection of instances in the MTR problem is presented.
An ensemble-based method for the selection of instances
This section first introduces an error accumulation-based approach, an adaptation of the well-known DROP method to multi-target data, and then presents the ensemble-based method for performing the IS task in the MTR problem.
A DROP-based extension for the MTR problem
Let us say
DROP is a well-known IS method that, according to the three criteria described in Section 2, can be classified as a wrapper, hybrid and decremental method. In this work, the DROP-based adaptation to the regression problem presented by Arnaiz-González et al. [47] is extended to multi-target data.
Let us say
Different reliability scores have been proposed for single-target regression, such as the estimation based on sensitivity analysis [53], local cross-validation [53], analysis of the density of the distribution of instances [55], the variance of bagged models [56], and the estimation of the instances’ error by considering their local environment in the training set [57]. Recently, Levatić et al. [54] defined various reliability functions for the MTR problem following a semi-supervised approach. However, these functions are not directly applicable to our problem, since they do not consider the true target vector of the instances.
In this work, given a training instance
where
We believe that aRRMSE is a reliable estimator since it considers the actual errors made by the internal regressor. It also imposes almost no additional computational overhead, as opposed to some other estimation methods for regression. We denoted as Ewith
The proposed method is able to detect errors and outliers in the data. For example, if the set of associate instances of the instance
The main advantages of the proposed approach are that it can be wrapped around any existing MTR regressor (problem transformation or algorithm adaptation method), and that it does not depend on any threshold value to decide whether an instance is selected. Therefore, the IS process can benefit significantly from the capacities of the internal regressor. The proposed IS method can implicitly exploit the inter-target dependencies to select more relevant instances if the internal MTR regressor is able to model such correlations. In this sense, the challenge of modelling complex input-output relationships can also be effectively tackled, since our proposal can be applied with any linear or non-linear MTR algorithm.
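The error-accumulation idea can be illustrated with a minimal, hypothetical sketch of a DROP-style decremental rule on multi-target data. For illustration only, it assumes a kNN-mean internal regressor, the squared error of the whole target vector as the per-instance error (the method described here uses an aRRMSE-based reliability score and allows any MTR regressor), and the plain index order as the presentation order. An instance is removed when the accumulated error of its associates without it (`e_without`) does not exceed the error with it (`e_with`, mirroring the E_with notation). All function names are hypothetical.

```python
import numpy as np

def _knn(X, active, j, k, exclude=()):
    """Indices of the k nearest active neighbours of instance j (j and `exclude` skipped)."""
    mask = active.copy()
    mask[j] = False
    for e in exclude:
        mask[e] = False
    idx = np.flatnonzero(mask)
    dist = np.linalg.norm(X[idx] - X[j], axis=1)
    return idx[np.argsort(dist, kind="stable")[:k]]

def _error(X, Y, active, j, k, exclude=()):
    """Squared error of a kNN-mean prediction of the whole target vector of instance j."""
    pred = Y[_knn(X, active, j, k, exclude)].mean(axis=0)
    return float(np.sum((Y[j] - pred) ** 2))

def drop_mtr_sketch(X, Y, k=3):
    """Decremental, wrapper-style selection on multi-target data (hypothetical sketch)."""
    n = len(X)
    active = np.ones(n, dtype=bool)
    for i in range(n):  # presentation order: plain index order
        if active.sum() <= k + 1:  # keep enough instances to find k neighbours
            break
        # Associates of i: active instances that have i among their k nearest neighbours
        associates = [j for j in np.flatnonzero(active)
                      if j != i and i in _knn(X, active, j, k)]
        e_with = sum(_error(X, Y, active, j, k) for j in associates)
        e_without = sum(_error(X, Y, active, j, k, exclude=(i,)) for j in associates)
        # Remove i if its associates are predicted no worse without it
        # (an instance with no associates has 0 <= 0 and is therefore removed)
        if e_without <= e_with:
            active[i] = False
    return np.flatnonzero(active)
```

Swapping `_error` for a call to a real MTR regressor is what makes the procedure a wrapper: the selection then inherits whatever inter-target modelling that regressor performs.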
Regarding the runtime complexity of our proposal, let us say
We propose an ensemble-based method for tackling the IS task in the MTR problem because more reliable estimations of the importance of the instances can be obtained if multiple approximations are considered. Similar to Spyromitros-Xioufis et al. [6], we adopted an approach that composes the ensemble by adding target variables to the input space of the MTR problem.
The rationale of our proposal is as follows. Given a multi-target dataset
Schema of the proposed ensemble-based method.
Figure 1 shows the general schema of the proposed ensemble-based method for performing the IS task in the MTR problem. Note that the diversity of the members of the ensemble is obtained by executing each member on a different dataset. Also, note that the members must be IS methods able to work directly with multi-target data, such as the DROPMTR method. In order to add more diversity to the ensemble, the presentation order of the instances is randomly changed for each new dataset on which the members are executed. This also makes the ensemble method less sensitive to the presentation order of the instances, which is a limitation of any DROP-based method.
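The construction of the members' datasets can be sketched as follows. The exact expansion is defined in the (partly lost) description above; this sketch assumes, hypothetically, that member j receives the inputs expanded with target j and keeps the remaining targets as outputs, which matches the statement that each member relates one target to the rest. The per-member random presentation order follows the text; the function name `expanded_views` is hypothetical.

```python
import numpy as np

def expanded_views(X, Y, seed=None):
    """One dataset per ensemble member (hypothetical construction, see lead-in).

    Member j receives the inputs X expanded with target column j, keeps the other
    targets as outputs, and sees the instances in its own random order.
    """
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    views = []
    for j in range(d):
        order = rng.permutation(n)               # random presentation order for member j
        X_aug = np.hstack([X, Y[:, [j]]])        # input space expanded with target j
        Y_rest = np.delete(Y, j, axis=1)         # remaining targets stay as outputs
        views.append((order, X_aug[order], Y_rest[order]))
    return views
```

Indices that a member selects in its own view map back to the original dataset through `order[selected]`. With only two targets, each member is left with a single output column, so under this assumed construction it effectively works on a single-target problem.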
Another important issue to analyse in our approach is how to aggregate the
Let us say
where
The set of instances that are exactly selected by
where
Solving this optimization problem with classical methods could require considerable runtime, since the objective function requires the evaluation of a multi-target regressor
Algorithm 2 shows the steps of the proposed ensemble-based IS method (hereafter, EDROPMTR). EDROPMTR comprises two phases: (I) the ensemble’s members select the partial data subsets; and (II) the
On the other hand, the second phase of the ensemble-based method comprises the execution of the proposed greedy heuristic, which in turn needs to train and test (at most
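The exact objective function of the aggregation step is not recoverable from this text, so the following is only a hypothetical reading of a greedy heuristic consistent with two statements in this paper: it performs a bounded number of train/test runs, and candidate subsets are scored on a test set formed by all the instances selected by the members. Candidate t keeps the instances chosen by at least t members; `error_fn` is a caller-supplied, hypothetical callback that trains a multi-target regressor on one index set and returns its error (e.g. aRRMSE) on another.

```python
import numpy as np

def greedy_aggregate(selections, n, error_fn):
    """Hypothetical vote-threshold heuristic for merging the members' partial subsets.

    selections: one array of selected instance indices per ensemble member.
    error_fn(train_idx, test_idx): trains a regressor on train_idx and returns its
    error on test_idx; supplied by the caller.
    """
    m = len(selections)
    votes = np.zeros(n, dtype=int)
    for sel in selections:
        votes[np.unique(np.asarray(sel, dtype=int))] += 1
    union = np.flatnonzero(votes >= 1)        # every instance chosen by some member
    best_subset, best_err = union, np.inf
    for t in range(m, 0, -1):                 # at most m train/test evaluations
        candidate = np.flatnonzero(votes >= t)
        if candidate.size == 0:
            continue
        err = error_fn(candidate, union)
        if err < best_err:                    # strict: ties keep the smaller subset
            best_subset, best_err = candidate, err
    return best_subset, best_err
```

Scanning from the strictest vote threshold downwards means that, on ties, the smaller subset wins, which favours reduction without any user-defined threshold.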
In this section, the experimental study is described. First, the datasets and other experimental settings used in the experiments are presented. Second, DROPMTR and EDROPMTR are applied to all the datasets, with the aim of analysing whether the proposed IS methods improve or maintain the predictive performance of the MTR regressors, and whether the best performance is attained by the proposed ensemble-based IS method.
Multi-target datasets
In this experimental study, the largest publicly available collection of MTR datasets was used [6]. The 18 datasets in this collection have a variety of features and belong to several application domains. Some of these datasets represent well-known engineering problems. For example, the Electrical Discharge Machining dataset (Edm) [59] represents a two-target regression problem, where the task is to minimize the machining time by reproducing the behaviour of a human operator who controls two variables; the Energy Building dataset (Enb) [8] concerns the prediction of the heating and cooling load requirements of buildings as a function of eight parameters; and the Concrete Slump dataset (Slump) [60] comprises the prediction of three properties of concrete as a function of the content of seven concrete ingredients.
In addition, the Andromeda (Andro) [61] and Water Quality (Wq) [62] datasets concern the prediction of water quality parameters, whereas the Jura dataset [7] focuses on the prediction of the concentration of metals. The Solar Flare datasets [63] (Sf1 and Sf2) concern the prediction of the number of solar flares observed within one day. The River Flow datasets (Rf1 and Rf2) [6] concern the prediction of river network flows. Finally, the following datasets are associated with the business domain: Online Product Sales (Osales) [64], See Click Predict Fix (Scpf) [65], Airline Ticket Price (Atp1d and Atp7d) [6], Supply Chain Management (Scm1d and Scm20d) [6] and Occupational Employment Survey (Oes10 and Oes97) [6].
Table 1 shows a summary of the characteristics of the datasets. The datasets vary in size: from 49 to 9,803 examples, from 7 to 576 input variables, and from 2 to 16 target variables. All the datasets have numeric input variables, except for Sf1 and Sf2, whose input variables are discrete. The datasets Scpf, Osales, Rf1, Rf2, Atp1d and Atp7d have missing values, which were replaced by the median values of the corresponding input variables. Finally, all the numeric variables were centred and scaled.
Summary of the benchmark datasets
Pugelj and Dzeroski [66] presented a simple adaptation of the classic
The parameter
In addition, Spyromitros-Xioufis et al. [6] showed that the Ensemble of Regressor Chains (ERC) method is one of the most effective state-of-the-art MTR methods. Hence, the effectiveness of the proposed IS methods was assessed by evaluating ERC on the selected data subsets. ERC is a problem transformation method and, therefore, it internally requires a single-target regressor. Three single-target regressors were used, namely REPTree, Linear Regression and the classic kNN.
To estimate the predictive performance of the MTR models, the aRRMSE measure (previously described in Section 3) was computed on the test sets. In all datasets, 10-fold cross-validation was performed, and the aRRMSE values were averaged across all folds. In each fold, the following steps were conducted: (I) the IS method reduces the training set; (II) the multi-target regressor is trained on the selected data subset; and (III) the learned model is assessed on the test set. The effectiveness of the IS methods was also studied by analysing the reduction levels of the size of the training sets.
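For reference, a minimal sketch of the usual aRRMSE definition in the MTR literature: the relative RMSE of each of the d targets (the RMSE of the model divided by that of a predictor that always outputs the target mean), averaged over targets, i.e. aRRMSE = (1/d) Σ_j sqrt( Σ_i (y_ij − ŷ_ij)² / Σ_i (y_ij − ȳ_j)² ). Whether ȳ_j is taken from the training or the test set varies between implementations, so it is an optional parameter here; the function name `arrmse` is an assumption.

```python
import numpy as np

def arrmse(Y_true, Y_pred, Y_ref_mean=None):
    """Average relative RMSE over targets (columns).

    Y_ref_mean: per-target baseline mean; defaults to the mean of Y_true (an assumption,
    some definitions use the training-set mean instead).
    """
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    mean = Y_true.mean(axis=0) if Y_ref_mean is None else np.asarray(Y_ref_mean, dtype=float)
    num = np.sum((Y_true - Y_pred) ** 2, axis=0)   # squared errors of the model, per target
    den = np.sum((Y_true - mean) ** 2, axis=0)     # squared errors of the mean predictor
    return float(np.mean(np.sqrt(num / den)))
```

A value of 1.0 therefore means "no better than predicting the mean", and values below 1.0 indicate an improvement over that baseline, which makes the measure comparable across targets with different scales.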
Average reduction levels attained by DROPMTR and EDROPMTR
Finally, non-parametric statistical tests were conducted to analyse and validate the obtained results, as proposed by Demsar [68]. All computational methods were implemented in Java and integrated into the MULAN library [69]. MULAN is built on top of the popular WEKA framework [70] and is designed for research on multi-label learning and MTR.
This experiment aims to analyse whether the two proposed IS methods (DROPMTR and EDROPMTR) can significantly reduce the size of the datasets. The attained reduction rate on a dataset is calculated as
Results of the aRRMSE measure for ERC-REPTree. Friedman’s statistic is equal to 8.333, and the null hypothesis was rejected with a p-value of 0.015 at the significance level α = 0.05.
It is observed that DROPMTR attained reduction levels from 0.306 to 0.731, whereas EDROPMTR obtained reduction levels from 0.179 to 0.844. The DROPMTR method produced a large reduction (73%) on the Sf2 dataset, whereas the EDROPMTR method achieved a large reduction (84%) on the Sf1 dataset. On average, the experimental results showed that DROPMTR reduces the size of the datasets more than EDROPMTR. This behaviour was expected because EDROPMTR intends to determine the best subset of instances that produces the lowest prediction error on a test set that contains all the instances selected by the
All pairwise comparisons conducted by Bergmann-Hommel’s test. In the diagrams, the groups of methods that are not significantly different are connected by a line.
This experiment focuses on determining whether the application of the proposed IS methods implies a significant improvement or deterioration in the overall predictive performance of the regressors ERC-REPTree, ERC-LR and ERC-kNN. These three multi-target regressors were trained on the original training sets and on the subsets selected by the IS methods.
Tables 3–5 show the results of the aRRMSE measure. In each row, the best error value is highlighted in bold typeface. The column named “Original” represents the predictive performance obtained on the original datasets, whereas the columns named “Subset
Results of the aRRMSE measure for ERC-LR. Friedman’s statistic is equal to 7.861, and the null hypothesis was rejected with a p-value of 0.020 at the significance level α = 0.05.
It was observed that the predictive performance of the three MTR regressors improved in many cases. It is also relevant to note that the predictive performance improved even on those datasets for which the IS methods attained high reduction levels (e.g. Sf1, Sf2 and Wq), showing that the proposed IS methods can select subsets of relevant instances, and also that these particular datasets have a considerable number of irrelevant and/or redundant instances. The average rankings computed by Friedman’s test show that, on average, the best results were obtained when the MTR regressors were executed on the data subsets selected by EDROPMTR, indicating the effectiveness of the proposed ensemble-based approach. Furthermore, Friedman’s test rejected all the null hypotheses, indicating that significant differences exist in the predictive performance of the MTR regressors.
Results of the aRRMSE measure for ERC-kNN.
Bergmann and Hommel’s test [73] was conducted in order to perform all pairwise comparisons and detect particular significant differences. Figure 2 shows the results of this statistical test, highlighting two important results: (I) the predictive performance of the MTR regressors trained on the data subsets selected by DROPMTR is not significantly different from the performance attained when they were trained on the original training sets, indicating that DROPMTR can considerably reduce the size of the datasets without deteriorating the performance of the regressors; and (II) the predictive performance of the regressors trained on the data subsets selected by EDROPMTR is significantly better than the performance attained on the original training sets and on the data subsets selected by DROPMTR, showing the potential of the proposed ensemble-based method.
In general, data gathered in real-world problems include noise, which can significantly deteriorate the predictive performance of learning algorithms [36]. In this regard, IS methods also aim to eliminate noise and outliers from the data.
Having demonstrated the superiority of the proposed ensemble-based method, in this section we analyse its capacity to eliminate noise from data, i.e. whether EDROPMTR maintains or even increases the predictive performance of the regressors on datasets with different noise levels. Similar to the method proposed by Arnaiz-González et al. [47], we added noise to the original datasets by exchanging the target vectors of randomly selected instances; the random selection was made without replacement. In this way, the sample distributions in the input and output spaces of the training sets are not modified. Three different noise levels (10%, 20% and 30%) were introduced in the training data; target vectors are swapped until these percentages of the total number of instances in the training data have been modified. A 10-fold cross-validation process was executed five times with different seeds, and the results were averaged.
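The noise-injection protocol above can be sketched as follows: a fraction of the instances is drawn without replacement, the drawn instances are paired, and each pair exchanges its target vectors, so the inputs and the multiset of target vectors are unchanged. Pairing the drawn instances at random and rounding the count down to an even number are assumptions of this sketch, and the function name is hypothetical.

```python
import numpy as np

def swap_target_noise(Y, noise_level, seed=None):
    """Swap target vectors between randomly chosen pairs of instances.

    noise_level: fraction of instances whose target vector is exchanged. The count is
    rounded down to an even number so that every chosen instance gets a partner.
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    n_swap = int(round(noise_level * n))
    n_swap -= n_swap % 2
    chosen = rng.choice(n, size=n_swap, replace=False)  # sampling without replacement
    a, b = chosen[: n_swap // 2], chosen[n_swap // 2 :]
    Y_noisy = Y.copy()
    Y_noisy[a], Y_noisy[b] = Y[b], Y[a]  # inputs X are left untouched
    return Y_noisy, np.sort(chosen)
```

Because only whole target vectors are exchanged, the marginal distributions of inputs and outputs stay the same while the input-output association is corrupted for the chosen instances, which is exactly the kind of noise an IS method is expected to detect.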
Average reduction rates attained at the different noise levels
The predictive performance of ERC-REPTree at the different noise levels
The predictive performance of ERC-LR at the different noise levels
Table 6 shows the average reduction rates at the different noise levels. Higher reduction rates were obtained as the noise level increased, which suggests that EDROPMTR is able to detect and remove noise from data. Tables 7–9 show the predictive performance of the three regressors considered in the experimental study. In each row, the best aRRMSE value attained at each noise level is highlighted in bold typeface. The columns labelled “Noise” represent the predictive performance of the regressors on the datasets with noise, whereas the columns labelled “Red” represent the predictive performance attained on the data subsets selected by EDROPMTR.
It was observed that, in many cases, the predictive performance of the multi-target regressors improved once the training sets were preprocessed with the IS method. The Wilcoxon’s test was conducted to detect whether there were significant differences in the predictive performance attained at each noise level. The
Finally, it is noteworthy that, on average, a worse predictive performance was obtained as the noise level increased. This is an expected result because EDROPMTR selects smaller training sets as the number of noisy instances in data increased.
Two IS methods were proposed: the first one (DROPMTR) is a DROP-based extension that removes internal and border points that do not contribute to a better prediction of the target vectors of their neighbours, whereas the second one (EDROPMTR) is an ensemble-based IS method that aggregates multiple predictions to select a final data subset of relevant instances. Neither method requires threshold values to determine whether an instance is included in the selected data subset. This is a major advantage because threshold values are usually problem dependent and, therefore, an additional analysis is required to select adequate values. Moreover, the proposed IS methods have acceptable runtime complexities that allow their use on large-scale datasets. In the case of EDROPMTR, the members of the ensemble can easily be executed in parallel, which considerably decreases the runtime needed to select the final data subset.
Another advantage of the proposed ensemble-based method is that it implicitly models the inter-target dependencies, which eases the selection of more relevant instances. The way the members of the ensemble are constructed shows some similarities with the approach proposed by Spyromitros-Xioufis et al. [6], where it was demonstrated that expanding the input space with target variables is an effective way to exploit the inter-target dependencies. In this way, each member of the ensemble models the relationship of one target variable with the rest of the targets.
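One possible reading of how the members' expanded input spaces are built is sketched below. It is an assumption, not the paper's exact construction: member j receives the original inputs plus every target except y_j as features and keeps y_j as its output, so that it models the relationship of one target with the rest. The function name `expanded_views` is hypothetical.

```python
def expanded_views(X, Y):
    """Build one (inputs, output) view per target for the ensemble members.

    Hypothetical construction: member j sees the original inputs plus every
    target except y_j, and keeps y_j as its (single) output.
    """
    d = len(Y[0])
    views = []
    for j in range(d):
        # append all other targets to each input vector
        Xj = [list(x) + [y[t] for t in range(d) if t != j] for x, y in zip(X, Y)]
        # keep target j as a one-element target vector
        yj = [[y[j]] for y in Y]
        views.append((Xj, yj))
    return views
```

Because every view keeps the same row order as the original data, the instance indices selected by any IS routine on one view refer to the same training instances, which is what makes it possible to aggregate the members' partial subsets by index.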
On the other hand, the aggregation process formulated in the last step of EDROPMTR can be regarded as an artefact that tries to exploit the similarities between the structural and stochastic parts of the models. The structural parts of the models correspond to the data subsets selected by each member of the ensemble, whereas the stochastic parts are related to the errors associated with the search for the data subset that attains the best estimation. According to Dembczynski et al. [74], methods that follow an architecture similar to the one used by EDROPMTR can model the existing marginal and conditional dependencies between target variables.
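The paper states only that the partial subsets are aggregated by solving an optimization problem with a simple greedy heuristic. The sketch below shows one plausible heuristic of that kind, under stated assumptions: instances are ranked by how many members selected them, added level by level from most to least voted, and the candidate subset with the lowest error is kept. The `error_of` callable is hypothetical; in a wrapper setting it would train the internal MTR regressor on the candidate subset and return a validation error.

```python
from collections import Counter


def greedy_aggregate(partial_subsets, error_of):
    """Combine the members' partial subsets into one final subset (hypothetical).

    partial_subsets: list of sets of instance indices, one per ensemble member.
    error_of:        callable mapping a candidate subset to a validation error.
    """
    votes = Counter(i for s in partial_subsets for i in s)
    best_subset, best_err = None, float("inf")
    current = set()
    # add instances level by level, from the most-voted to the least-voted
    for level in sorted(set(votes.values()), reverse=True):
        current |= {i for i, v in votes.items() if v == level}
        err = error_of(current)
        if err < best_err:  # strict '<' keeps the smaller subset on ties
            best_subset, best_err = set(current), err
    return best_subset, best_err
```

For instance, with partial subsets `{0, 1, 2}`, `{0, 1}` and `{0, 3}` and a toy error that prefers two instances, the search evaluates `{0}`, `{0, 1}` and `{0, 1, 2, 3}` and returns `({0, 1}, 0)`. Note that no user-defined vote threshold is needed: the vote level is chosen by the error criterion.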
Finally, it is important to highlight that the two proposed IS methods follow a wrapper approach and can therefore implicitly benefit from the power of the internal regressor. A powerful MTR regressor can tackle not only the modelling of inter-target dependencies, but also the estimation of complex non-linear input-output relationships. Consequently, the selected data subsets are highly likely to contain the relevant instances that reflect these types of relationships, which have been shown to be of paramount importance for solving the MTR problem more effectively.
As main drawbacks, EDROPMTR focuses more on minimizing errors than on reducing the size of the final subset; therefore, the final data subset is not necessarily the most consistent subset of instances. Also, EDROPMTR is not suitable for incremental learning scenarios, since it would have to recompute the subset of relevant instances from scratch every time new samples are added.
In this work, an extensive experimental study was carried out. The first experiment showed that the proposed IS methods are able to significantly reduce the size of the datasets. High reduction rates were attained on datasets with a moderate number of input variables (e.g. Atp1d, Atp7d, Oes10, Oes97 and Osales), as well as on datasets with many target variables (e.g. Oes10, Oes97, Osales and Wq).
In addition to the high reduction levels attained on several datasets, the second experiment showed that EDROPMTR can significantly improve the predictive performance of the regressors. This result is very promising, since several authors have noted in the past that it is not always possible to maintain the predictive performance of learning algorithms after applying an IS method. The results also showed that EDROPMTR significantly outperforms DROPMTR, demonstrating the effectiveness of the proposed ensemble-based approach.
The third experiment demonstrated that the proposed ensemble-based IS method is robust to noisy data. The results indicated that EDROPMTR is able to detect the noisy instances, so that the predictive performance of the regression models does not deteriorate as much. Consequently, EDROPMTR is well suited to real-world engineering applications that require noise to be eliminated before performing crucial tasks.
Conclusions
In this work, an ensemble-based method to perform the IS task in the MTR problem has been proposed. First, an error accumulation-based approach has been introduced, which is an adaptation of the well-known DROP method to multi-target data. Second, an ensemble-based method that effectively combines the partial data subsets selected by each member of the ensemble has also been presented. The major features of our approach are: (I) a wrapper approach was adopted in which any MTR regressor can be used to estimate the relevance of the instances, so the IS task can benefit from the capacity of the internal regressor to model the inter-target dependencies and complex input-output relationships; (II) the way the ensemble's members are constructed guarantees not only the diversity between them, but also the modelling of the inter-target dependencies; (III) the proposed ensemble-based method selects the final data subset by a simple greedy heuristic process, avoiding the use of complex optimization algorithms; and (IV) no threshold values are used to decide whether an instance is selected or removed, so the proposed approach is less problem dependent.
The experimental study confirmed the benefits of the IS task for solving the MTR problem, which was the main motivation of the present work. A good trade-off between the reduction in dataset size and the predictive performance of the regressors was attained. Consequently, not only is the runtime needed to construct a regression model on large-scale datasets significantly reduced, but its predictive performance can even be improved.
Future work will study better solutions to the optimisation problem formulated to aggregate the partial data subsets selected by the ensemble's members. Notably, the final data subset determined by the proposed ensemble-based method is not the minimal data subset, so this is a relevant point to be studied. It would also be interesting to consider other ways of exploiting the relationships between target variables; in this regard, designing other approaches to construct the members of the ensemble is a possible direction. Finally, it would be important to study the benefits of combining the IS task with feature selection for constructing better MTR models.
Footnotes
In multi-label learning, the output variables (a.k.a. labels) are restricted to binary values.
The domain of the input variables can be continuous, discrete or mixed type, whereas the domain of the target variables is always continuous.
Acknowledgements
This research was supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund, project TIN2017-83445-P.
