Abstract
With growing awareness of the societal impact of decision-making, fairness has become an important issue. More specifically, in many real-world situations, decision-makers can unintentionally discriminate a certain group of individuals based on either inherited or appropriated attributes, such as gender, age, race, or religion. In this paper, we introduce a post-processing technique, called fair additive weighting (FairAW) for achieving group and individual fairness in multi-criteria decision-making methods. The methodology is based on changing the score of an alternative by imposing fair criteria weights. This is achieved through minimization of differences in scores of individuals subject to fairness constraint. The proposed methodology can be successfully used in multi-criteria decision-making methods where the additive weighting is used to evaluate scores of individuals. Moreover, we tested the method both on synthetic and real-world data, and compared it to Disparate Impact Remover and FA*IR methods that are commonly used in achieving fair scoring of individuals. The obtained results showed that FairAW manages to achieve group fairness in terms of statistical parity, while also retaining individual fairness. Additionally, our approach managed to obtain the best equality in scoring between discriminated and privileged groups.
Keywords
Introduction
Multi-criteria decision making (MCDM) has the goal to determine the scores of individuals (in further text, scores) by using multiple, often conflicting criteria [20]. Due to a large number of criteria, the scoring of individuals is not a trivial task. The decision-maker needs to take into account many dimensions of the problem at hand, compare the values between multiple individuals, and correctly define criteria weights. These scores are later used in decision-making for ranking or selecting the best individual. Therefore, a wide variety of methods have been developed [53].
Algorithmic decision-making is adopted in practical applications due to its ability to provide faster and more accurate decisions [30]. Quickly after adoption in the practice, algorithmic decision-making started being used for applications that influence the welfare of an individual. For example, applications of algorithmic decision-making can be found in hiring [17, 9], university admission [27, 19], child maltreatment [43], etc. In addition, algorithmic decisions are often seen as a major reason why a certain sub-population is not being able to achieve the desired level of outcome [8]. This is mostly due to cultural and historical obstacles certain sub-populations experience. It is also observed that algorithmic decision-making tends to amplify the existing biases [29]. Because of the above-mentioned, attention is focused on understanding why algorithms make unfair decisions regarding the inherited or appropriated characteristics of an individual, and how to mitigate the unfairness in algorithmic decision-making. These efforts are in accordance with affirmative action policies [38], and consequently can act as an effort to alleviate the possibility of legal consequences [49].
There are many notions of fairness in algorithmic decision-making [4]. Most of them define groups of individuals (thus called group fairness). The algorithmic decision-making model generates unfair outcomes if members of the privileged group have systematically higher scores than those of the disadvantaged group [45]. In other words, members of the privileged group systematically obtain better outcome compared to the members of the disadvantaged group. Based on this, a vast number of studies have dealt with the problem of mitigating unfairness in decision-making algorithms. Besides group fairness, another point of view on fairness is called individual fairness. The idea of individual fairness is that similar individuals should receive a similar outcome, or that the outcome should not change a lot with a small perturbation in the parameters of the decision model (e.g. weights of criteria) [3].
In this paper, we study the problem of producing fair scores given one legally protected attribute. However, instead of improving group fairness (as most studies do, e.g. [2, 37]), we take into account individual fairness as well. For this purpose, we propose a method based on linear programming (LP). The idea of the proposed method is to change the scores with the smallest possible intensity so that the fairness constraint is satisfied.
Bearing in mind that a lot of MCDM methods perform additive weighting (AW) to obtain scores, we developed a post-processing methodology for changing criteria weights such that fairness constraint is satisfied. As stated in [7], post-processing approaches for tackling unfairness are perhaps the most flexible ones. One of the main reasons for such remark is that there is no need to know the decision model, e.g., how a decision-maker obtained the decision matrix or weights associated with criteria. What is needed is the access to the utility scores and sensitive attribute information. We achieve fair scores by imposing that the difference between the expected scores of disadvantaged and privileged groups remains at a satisfactory level. More specifically, group fairness is achieved by minimizing the difference between the newly obtained weights and the weights obtained by decision-makers’ preferences. The level of discrimination is controlled by using a hyper-parameter set by a decision-maker. Thus, the decision-maker can choose to have perfect fairness (i.e. both disadvantaged and privileged groups have the same expected score), a small amount of discrimination (i.e. the privileged group will have a slightly higher expected score), or to achieve positive discrimination (i.e. the disadvantaged group will have a slightly higher expected score). In addition, the developed methodology can provide decision-makers an explanation of what criteria have the highest impact on discrimination, as well as what individuals influence discrimination the most. It is worth mentioning that the proposed method is not aimed to automatize the final decision, but to find the most similar solution so that the fairness constraint is satisfied.
FairAW is tested on synthetic data and real-world applications. The results show satisfactory performance compared to the baseline methods used in practice, namely Disparate Impact Remover [12] and FA*IR [54]. More specifically, a higher group fairness is achieved for a small change in criteria weights. As a limitation of the proposed method, we state that the proposed approach aimed at tackling unfairness in scores. For other decision-making problems, the proposed method might be unsuitable. For example, for the problem of choice of a single individual, the notions of group fairness are irrelevant. Another limitation is the feasibility of the solution. It might occur that none of the weight setups might achieve satisfactory group fairness.
In Section 2 we review the literature. Then, we present the proposed post-processing technique for changing criteria weights in Section 3. In Section 3 we explain the data, as well as the experimental setup. We present and discuss the results in Section 4. Finally, we conclude the paper and give a plan for further work in Section 5.
Related work
In many MCDM methods, decision-makers assign scores to individuals, which are later used for decision-making [53]. Most often, the score is obtained by combining the values of individuals for making the criteria. Since decision-makers can be biased towards a specific group of individuals regarding the sensitive attribute [28], the aim is to mitigate this bias, which is nowadays a very popular topic [24].
This section covers notions of fairness. More specifically, fairness operationalizations. Further, we cover the existing literature and methods regarding imposing fairness in decision-making algorithms.
Notions of fairness
Being fair and making fair decisions has been the subject of discussions for many centuries. Many philosophical theories discuss this matter, thus there are many mathematical notions formalizing fairness [44]. When it comes to fairness in algorithmic decision-making, there are two major points of view: Group fairness [52], and Individual fairness [11].
Group fairness is directed against systematic discrimination in decision-making regarding inherited or appropriated characteristics of an individual. Groups are defined by the highest legal acts, and these are race, gender, religion, disability status, and others [30]. This characteristic is denoted as the sensitive attribute
One of the most prominent metrics used for expressing group fairness is disparate impact (
where
The perfect value of
These measures can be used prior to decision-making. More specifically, the decision-maker can calculate both
Individual fairness, on the other hand, states that the individuals with similar characteristics should have similar scores [11]. In other words, individual fairness requires people to be treated consistently. More formally, if we define distance metric
In the context of MCDM, individual fairness is expected to be satisfied at any time. Even in the case of the extreme importance of one or several criteria, the MCDM procedure will yield similar results. However, individual fairness can have a different interpretation that is adopted in this paper. The scores should be as similar as the scores obtained by using initial weights of the decision-maker.
It is worth stating that there is an inherent trade-off between the group and individual fairness. More specifically, an improvement of individual fairness decreases group fairness, and vice versa [35]. In addition, in the literature, it can be found that most studies handle only one of the fairness notions (most often, group fairness) in ranking problems [12, 51, 54].
If decision-makers are aware of unwanted bias in the data, they can try to mitigate the bias and generate a fair decision-making process [24]. Three broad approaches can be used to mitigate unfairness: Pre-processing, In-processing, and Post-processing techniques. We focus only on the methods relevant to our approach.
Data pre-processing tries to adjust data points, or to remove the disparity in the original dataset. The information about individuals has to be saved, but without being able to distinguish between the values of the sensitive attribute based on the values of individuals for criteria. One can save the information about individuals within the groups and adjust the scores between the groups. This is done in the Disparate Impact Remover (DIR) algorithm [12]. DIR performs a geometric repair of the data at hand in such a manner that the data distribution of groups is shifted toward the mean (or median) of the dataset. More formally, for vector of values for criteria
where
Another approach worth mentioning is [22]. This approach learns a set of prototypes, and tries to reconstruct an instance in the dataset using a weighted sum of the prototypes and the probability that the instance belongs to the prototype. Such examples are as similar as the existing ones, thus can be used for decision-making. However, this approach aims at individual fairness, so that the new representation of individuals is as similar to the original one (utility loss) and that the distance in the original space of individuals is as similar as the distance in the newly created space of individuals (fairness loss). We state that this approach is not comparable to the one presented in this paper, since it does not aim at improving group fairness. In addition to the papers mentioned here, we could mention the approaches that learn fair representations of attributes in a different space such as [58, 34]. These approaches use supervised approach to learn fair representation of data that contains as much information about the prediction task at hand. Thus, representation is aimed to be both fair and accurate.
In-processing techniques adjust the decision-making procedure to generate fair results. To the best of our knowledge, this is seldom done in the MCDM. The following approaches are found in the area of machine learning. Asudeh et al. [2] developed a system that helps users choose the criteria weights that lead to greater fairness. The authors considered ranking functions that compute the score for each individual as a weighted sum of attribute values, and then sort individuals based on the scores. Each ranking function can be expressed as a point in a multi-dimensional space and the authors showed the procedure on how to efficiently identify the regions where fairness criteria are satisfied. In addition, a conceptual and computational framework that allows the formulation of fairness constraints on rankings in terms of exposure allocation along with an efficient algorithm for fair ranking is presented in [40]. Lohaus et al. [25] address the problem of classification under fair constraints and claim that the trade-off between model efficiency and fairness has to be made by relaxing these constraints and finding the solution that can reach satisfactory fair results.
In-processing techniques can be of help in fair MCDM. One can find statistical parity as a constraint [52], as in Eq. (4).
where
There are other in-processing approaches, one of which is [55] where learning to rank is used to accommodate for fairness in top-
Finally, post-processing of the scores can be applied as well. These approaches adjust the scores obtained by a decision-making method. Therefore, the decision-maker can derive scores, observe if unfairness exists, and correct them to be fair. One such approach is FA*IR algorithm [54]. As the input, FA*IR algorithm takes the first
Another paper worth mentioning is [56]. It interpolates between the views what you see is what you get and we are all equal, which maps to individual fairness and anti-discrimination law (statistical parity), respectively. This is done with a hyper-parameter using the optimal transport approach. The method scales well with a large datasets where both privileged and disadvantaged groups are well represented. For a more detailed discussion on fairness in machine learning based ranking methods, we refer to [57], and for a more detailed discussion on fairness methods in machine learning we refer to [7].
The contributions of this paper are:
Incorporation of group fairness constraints that satisfies disparate impact (statistical parity), Satisfaction of the individual fairness, as we defined it, and The proposed model results in interpretable solution.
Group fairness is based on the
Since individual and group fairness imply different points of view, i.e. satisfying one can violate the other [56], we find it necessary to make a compromise. This compromise is defined in the optimization model we propose. More specifically, we propose adding an additional constraint regarding the individual fairness.
Finally, due to importance of the model interpretation, we adopt linear decision-making model. Linear models can be optimized efficiently and have the advantage of dual model analysis. To the best of our knowledge, this is seldom done in the area of fair machine learning.
We incorporate both group fairness and individual fairness requirements using the post-processing technique for changing weights of criteria and consequently changing scores. More specifically, we derive a linear mathematical model with a goal function of minimal change in criteria weights subject to a fairness constraint. The goal function generates new weights to be as close as the initial ones, thus forcing individual fairness. On the other hand, the statistical parity constraint does not allow privileged group individuals to have much higher scores. The initial criteria weights are assigned by decision-makers directly (or indirectly).
In contrast to [54] we adjust criteria weights, thus making the decision model interpretable. FA*IR performs as a closed-box algorithm where scores are inserted and rankings are produced. The decision-maker is not intuitively shown how the rankings are obtained.
The flexibility of our mathematical model allows the use of other individual or group fairness metrics, which makes it more favorable to apply. Moreover, while the models available in literature change only final ranks (such as [54], and [55]), the FairAW changes scores (through changing weights) that affect the ranks as well.
This section consists of four parts. First, we motivate the mathematical model. Then, we provide the process of the mathematical model derivation. This part of the methodology links justice theory and the proposed mathematical model by a step-by-step explanation of the goal function and individual and group fairness constraints. Then, we briefly explain the data used in this research. The mathematical model was tested on three datasets with a different number of alternatives (individuals) and criteria. Finally, we describe the experimental setup.
Motivation for the FairAW
In classical decision theory, some of the tasks (that this paper aims at) are finding the score of alternatives (individuals), and consequently ranking. These are performed by aggregating criteria weights and values of individuals. For this purpose, commonly-used MCDM methods, such as Simple Additive Weighting (SAW) can be exploited [47].
However, with the development of the theory of social justice, fairness, and equality, there is a rising need to create fair decision-making methods [31]. In order to achieve fair MCDM models, one first needs to define fairness. Commonly, MCDM methods employ the utilitarianism principle. More specifically, a morally right course of action is the one that produces the maximum benefit for the decision-maker [26]. As a consequence of such an approach, one tries to maximize utility. While some find utilitarianism a very popular framework for decision-making, it fails to take into account the considerations of fairness. Although decisions produce greater benefits for the system, they can be very unfair towards a specific subgroup of people. Today, the examples of cultural and historical biases exist in science, technology, engineering, and mathematics education. Male students are dominant in enrolling for software engineering education [14]. Another example is COMPAS [10]. By observing the decision-making criteria, African Americans are obtaining higher scores (and thus sentenced). However, this is arguably due to the bias that “young and black” are criminals [23]. Further, algorithms are known to amplify the existing biases [29], and although unwanted discrimination cannot be eliminated, it can be reduced to a satisfactory level. Utilitarianism does not take into account historical and existing biases, and if one wants to act with affirmative actions and mitigate the possibility that the decision-making process has a disparate impact, utilitarianism should not be the sole principle.
In order to introduce fairness equality of outcomes or equality of opportunity have to be taken into account. Equality of outcomes assumes that all individuals are equal (we are all equal), and consequently should have the same outcome. This means that personal characteristics of an individual, either inherited or appropriated, should be independent of the outcome, both directly and indirectly. This point of view on equality corresponds to the egalitarian point of view on fairness. Equal opportunity, in the original definition, requires that individuals with the same set of skills (or scores in the decision-making terminology) have the same opportunities regardless of their inherited or appropriated characteristics. In terms of algorithmic decision-making percentage of those getting the desired outcome should be approximately the same. This point of view on equality corresponds to the Rawlsian point of view on fairness [5].
However, it is not reasonable to expect that scores are the same for both genders (or for every race, religion, etc.). Some small discrimination is expected and allowed. Therefore, the common fairness threshold is “80% rule” which states that allowable disparate impact is such that disadvantaged group should achieve at least 80% of the privileged group’s expected score in the same population [12].
FairAW
Our idea is to incorporate both utilitarianism and fairness into the FairAW method. Utilitarianism aims to satisfy that the scores are as the one given by the decision-maker, while fairness ensures that unwanted discrimination does not occur.
Weights present preferences of decision-makers regarding criteria. Therefore, weights show decision-makers’ expert opinion on how influential criteria is in decision-making. If the initial weights given by the decision-maker result in disparate impact, then this decision-making process can be subject to legal consequences [4]. To mitigate this issue, the weights can be changed so that legal constraints are satisfied. If weights are changed, the resulting scores should not differ much from the original. This is aligned with the notion of individual fairness [11]. More specifically, the model should be constrained to obtain similar scores as the original one, or at most
Decision matrix
We chose simple additive weighting (SAW) since it is widely used [47] and the fact that many MCDM methods use additive weighting as an integral part of utility calculation. Thus, the proposed method can be used in different MCDM methods (such as AHP, VIKOR, etc.) with minor (or major) adjustments. Although the application of the proposed method is not limited to SAW only, and other MCDM methods can be used as well, this paper focuses solely on SAW. The basic concept of the SAW method is to find the score for each alternative using the weighted sum. A decision matrix is given in Table 1: where
where
Normalaized decision matrix
Having the data prepared, the score
where
The intuition is as follows. We would like to create a decision model that is as similar to the original one as possible, thus satisfying the initial decision-maker’s judgement. This is achieved by imposing similar weights (
The second constraint introduces group fairness (Eq. (3.2)). We employ statistical parity, which is presented in the Eq. (2). More specifically, the difference of the expected scores between the disadvantaged (
Finally, a constraint was added (the third one) where the absolute difference between the new score and the original score is lower or equal to
The initial model was further modified due to the presence of absolute values in the goal function and constraints. More specifically, the mitigation of absolute values from the mathematical model requires a transformation and is presented in Eq. (3.2).
where
We are aware that other mathematical formulations for the goal function can be used. For example, instead of an absolute difference between new and original criteria weights, quadratic difference or any other difference function can be used. In Appendix A, we propose an additional mathematical model that solves a similar problem (both group and individual fair decision-making model) using quadratic optimization.
Linear formulation of the problem at hand allows us to get a better insight into the sources of unfairness. More specifically, we can answer the questions such as: Which criteria are unfair? Can we increase (or decrease) a weight? What individuals are bounded by fairness constraints (either individual or group fairness constraints)? These questions are answered by solving a dual mathematical model. It is worth stating that the duality gap in linear programming is certainly zero, thus the answers obtained by solving dual mathematical model are certainly right in their interpretation. The dual mathematical model is presented in Appendix B.
To summarize, the proposed method applies linear programming to adjust criteria weights such that:
statistical parity is satisfied (and consequently disparate impact), thus making the model satisfy group fairness, and the difference between initial scores and the new scores is small (constrained with parameter
Due to constrained optimization, there is a guarantee that the obtained solution satisfies both notions of fairness if a feasible solution exists. In addition, the proposed method can be used to inspect what criteria, as well as what individuals are creating unfairness in scores.
We argue that the proposed method can be used in the situations where fairness is a hard constraint, or when the decision-maker wants to alleviate legal consequences. Even in applications where fairness is not a necessity, decision-makers can use the proposed method to create positive discrimination, e.g. affirmative actions. However, newly obtained criteria weights and scores should be returned to the decision-maker before making the final decision. If the decision-maker is satisfied with the proposed solution, the decision can be made. If not, the decision-maker can adjust criteria weights to match his/her preferences.
The proposed methodology utilizes
The datasets used in this research correspond to real-life decision-making scenarios, where individuals are described using a sensitive attribute and qualities on a numerical scale (criteria). We must remark that the choice of the privileged group and the disadvantaged group is determined by legal documents, such as the constitution, laws, or voluntary commitment. To evaluate the proposed FairAW method, we test different scenarios (different choices of a sensitive attribute in different datasets), but in a real application there is no ambiguity about what the disadvantaged group is and what the allowable disparate impact (statistical parity) is.
We used four datasets. Two datasets are synthetically generated, and the remaining two are publicly available real-world examples. We synthetically generated a dataset regarding student college placements, while student performance data [21] presents synthetically generated data regarding scores on the test. The publicly available datasets are COMPAS [1] and LSAT [56].
Both COMPAS and LSAT are commonly used to test for unfairness in supervised algorithmic decision making and ranking. It is worth stating that none of the datasets have any missing values or visible outliers.
Experimental setup
For each of the previously described datasets, we tested the proposed FairAW method and compared the results with baseline methods. The research question we wanted to solve is: Can we reach a satisfactory fair solution? A satisfactory fair solution requires from our model to reach a solution that is both group and individually fair given the data at hand. Having this in mind, our hypothesis is that the proposed approach finds a solution that is both group and individually fair. Since the proposed method solves the optimization problem, the proposed approach would yield a solution that satisfies both notions of fairness if such a solution exists. The results are compared with:
Fairness unaware scores represent a situation where a decision-maker acts ignorant about the potential unfairness or disparities in the decision-making. Therefore, this approach acts as a utilitarian baseline. If there are disparities in the underlying data, those disparities are reflected in the scores that are obtained. Another assumption that is common in the algorithmic decision-making community is that fair data should lead to fair decisions [12]. Effort can be invested in “cleaning” the data from disparities. The approach to remove the disparities in data is DIR. We use DIR as a benchmark for that reason. Finally, we use FA*IR as an approach that belongs to the same family of unfairness mitigation approaches. More specifically, once the scores are obtained, they (or in the case of FA*IR ranks) can be adjusted to obtain fair results. Due to the nature of the DIR algorithm, we assume that DIR approach would generate group fair results. However, it is highly probable that individual fairness would be violated. This is expected since DIR is designed to correct the input data to be group fair, not to generate individually fair results. On the other hand, as FA*IR is a statistical approach that does not alter scores, one can obtain group fair results, but without a proper explanation. This is due to changing the ranking of alternatives while all alternatives are having the same score as prior to the procedure.
where
Further, we report a maximal and minimal change in scores, presented in the Eq. (3.4):
These measures indicate the level of individual fairness, as the definition of the individual fairness adopted in this paper is the deviation from the original scores. Besides changes in scores, we report changes in rankings. We interpret ranking measures with caution because the proposed methodology is not aimed at ranking directly. They are presented as FA*IR which is aimed to improve fairness in ranking, and we use it to complement the previously described individual fairness metrics. Since we would like to see similar ranking scores for both privileged and disadvantaged groups, we report an average rank change for both privileged (
However, average rank change for both the privileged and disadvantaged group changes due to intervention in the optimization. Therefore, we calculate the difference in average ranks (where
Since the goal of the paper is to improve fairness in scores, we calculate disparate impact (
The second dataset Student performance had almost no discrimination in the data. Therefore, we wanted to achieve perfect disparate impact. For FairAW we set
Finally, for the COMPAS and LSAT datasets we wanted to achieve disparate impact of 0.9. To achieve this, we set for FairAW
For each dataset, we used the same strategy for selecting the weight vector
The performance of the proposed FairAW method and other methods used for comparison is divided by datasets. Therefore, we first present the performance on the Student placement dataset, followed by Student Performance, COMPAS and LSAT datasets.
Student Placement Results (
) – utility change
Student Placement Results (
Student Placement Results (
The dataset initially had a high level of group unfairness with
The FA*IR algorithm did not change the scores, but it ranked the individuals differently. Disparate impact and statistical parity did increase to an approximately satisfactory solution. However, the interpretability of the model is very low. The decision-maker should expect envy from the privileged group individuals, i.e. individuals with a lower score are ranked better than the individuals with a higher score.
Our approach satisfied the intuition we set. More specifically, we wanted to satisfy both individual and group fairness. Individual fairness measures show that the average overall difference in scores is just 0.037. Further, the privileged group individuals have their scores reduced by 0.011 on average, while disadvantaged group individuals have their scores increased by 0.063 on average. However, it can be observed that individual fairness is an active constraint since the maximum increase achieved with the new weights is equal to the level of tau. In addition, group fairness is satisfied as well. By inspecting
We can observe that there are differences in ranking as well. Although our approach FairAW has the best
Our approach has an additional advantage. Since weights are changed in such manner as to satisfy both individual and group fairness, we can observe which criteria were unfair and how the change in weights affected individual and group fairness. The latter can be observed by inspecting dual variables as presented in Appendix B. For this dataset, we observe that six criteria have disparities in values (which is exactly the number we implemented during the creation of the dataset). The weights from five of six criteria were transferred to the remaining three (that have disparity toward female individuals). However, none of the criteria had a weight equal to zero. The most discriminative criterion was the one that was intentionally created to have the most gender disparities – the number of points obtained at the admission test – and its weight is 0.011. It is worth mentioning that FairAW cannot achieve perfect fairness (i.e.
Student Performance Results (
Student Performance Results (
Although this dataset is relatively simple (only three criteria), our approach performed the best regarding the difference in scores and other fairness metrics as well. Our approach achieved perfect fairness as it was given as a constraint of the model. As a consequence, the ranking improved as well. The average difference in ranks between the disadvantaged and privileged group is just 4.90 (from an initial 79.73).
There is a reason why such results are obtained. During the creation of the dataset, disparate impact could not be observed directly because both means and median values for each criterion were similar, but the joint data distribution was different. Because of this, DIR did not correct the data significantly. As a result of the data correction, a slightly greater discrepancy between the privileged and disadvantaged groups was created. On the other hand, algorithm FA*IR used a statistical approach in selecting individuals from privileged and disadvantaged groups. More specifically, it selected either the best overall individual or an individual from the disadvantaged group if the expected number of individuals from the disadvantaged group had to be selected. Since the scores were fair, it adjusted the ranking procedure (thus
The proposed FairAW method achieved perfect group fairness with
COMPAS results (
COMPAS results (
On the real-world dataset, our method outperformed both DIR and FA*IR in terms of fairness. Individual and group fairness constraints did not allow a higher difference in scores, both between the original scores and new scores, and between the scores of privileged and disadvantaged groups. If the changes in scores are, it can be noticed that the proposed approach stayed within the given limits (
LSAT results (
LSAT results (
After performing the fairness mitigation strategies, it can be observed that DIR obtained the best fairness in terms of rank differences, as well as fairness metrics
The usage of pre-processing techniques for decision-making is believed to have the greatest potential to mitigate unfairness in algorithmic decision-making [16]. However, by changing the data, the originality and accuracy of the data are also lost. Additionally, fair data can lead to unfair decisions [24]. For example, a specific combination of the criteria is unfair, and the decision-maker, accidentally or not, employs such practice.
Our approach, besides giving satisfactory results in terms of individual and group fairness, is interpretable. Decision-makers can see which criterion was unfair, and reduce the influence of that criterion in the decision-making process. They can even use the results of our method to examine the decision-making process, from data collection to weight estimation, and try to identify where the bias came from [15]. An additional benefit of our approach is in its mathematical formulation. Using linear programming allows us to inspect dual variables and see which alternatives contributed to unfairness. However, our approach does have limitations. First, it works on the entire dataset. Since weights are changed in the optimization procedure, each alternative is re-evaluated, thus getting a new score. Second, our procedure can yield an infeasible solution. If it is not possible to achieve the required level of group fairness with the parameter
We highlight that the role of decision-maker remains the most important in the decision-making process. The decision-maker is responsible for designing a decision-making process by selecting the most important criteria, finding alternatives that can solve the problem at hand, as well as deriving utility functions (values in the decision matrix), and criteria weights. The role of domain knowledge in the proposed approach is of immense importance. Domain knowledge can help in guiding the process by proposing additional constraints in the optimization. For example, adding the upper and lower bounds on the criteria weights (either for weights in general or for a specific weight). A recommended way to use the proposed method is by utilizing the human-in-the-loop approach (as in [48]). The results of the proposed approach return: 1) adjusted weights that the decision-maker can interpret in terms of what criteria are becoming more/less important, or what criteria discriminate in terms of the sensitive attribute. Once the decision-maker becomes aware of the possible discrimination, he/she can adjust the decision-making process by eliminating the criteria that pose a problem, and 2) dual variables that help the decision-maker identify what alternatives are limiting the group or individual fairness. By removing this alternative from the decision matrix (with a proper non-discriminate explanation of why it is removed), the decision-making process could yield a fair solution.
This paper presents a FairAW method used to provide fair additive weighting by considering both group and individual fairness. We propose a linear mathematical model that aims at changing criteria weights such that statistical parity constraint and individual fairness constraints are satisfied. The goal of the proposed model is to change criteria weights by the smallest possible value (thus changing scores by the smallest possible value) so that both group and individual fairness are satisfied. Group fairness is presented with statistical parity, and it is controlled with a hyper-parameter. Similarly, individual fairness, which is defined as a difference between original scores and the new scores, is controlled with a hyper-parameter.
The results showed that the proposed approach performs well regardless of the shape of data, i.e. of how many criteria and individuals there are. The results were compared with the Disparate Impact Remover algorithm that corrects the data at hand to be fair, and FA*IR algorithm. Disparate Impact Remover managed to improve fairness on a group level, but at the cost of individual fairness. Compared to algorithm FA*IR, the proposed method performed better in terms of the required level of group fairness, but with a loss in individual fairness. This is due to the statistical nature of FA*IR and the constraint about the number of individuals for which the expected minimum number of individuals from the disadvantaged group should be in the top
As a part of future work, we identify the problem of multiple sensitive groups (i.e. intersectional fairness). If groups were vaguer and the direction and intensity of discrimination not known, then we would need to make our mathematical model more complex by adding multiple constraints, i.e. the comparison of each group and restricting from both lower and upper sides. However, one of the main challenges is to alter the decision-making algorithm so that the group fairness constraint is satisfied.
Another line of future work is aimed at implementing the proposed FairAW into other MCDM models, namely VIKOR [33] and AHP [39]. For VIKOR, we plan to define an interval guarantee solution that will present a pessimistic estimate of the score or rank of the individual. The similar mathematical model can be used for this task as well. However, enforcing fairness in VIKOR method is harder as compared to SAW. To estimate the interval guarantee solution of an individual, one needs to sort values, calculate acceptable advantage and acceptable strong solution, which is non-linear and even non-deferential [33]. In the case of AHP, enforcing FairAW method is slightly easier. Fairness could be enforced into pairwise comparison matrix. Individual fairness can be defined in the same manner as in FairAW, while group fairness can be presented as statistical parity. However, an additional level of complexity should be considered. It would be beneficial for the decision-maker to solve the possible inconsistency of the pairwise comparison matrix. While individual and group fairness can be solved using an efficient and tractable optimization tools, consistency cannot (both using maximum eigenvalue or Rayleigh coefficient are non-convex). Additionally, changing values in the pairwise comparison matrix present a discrete optimization, since the values in the pairwise matrix are limited to a predefined set of values (
Footnotes
Acknowledgments
This research was funded by Office of Naval Research grant number ONR N62909-19-1-2008, Project title: “Aggregating computational algorithms and human decision-making preferences in multi-agent settings”.
Appendix A. Quadratic model
Instead of the proposed loss function, one can use other loss functions. One such example can be a quadratic loss function. This formulation of the problem requires the usage of convex optimization methods, such as gradient descent [6]. The model is presented in Eq. (Appendix A. Quadratic model).
This formulation corresponds to
The one we presented in the paper corresponds to
Authors are aware that other loss functions can be applied, such as Huber loss or logarithm of the hyperbolic cosine [18].
Appendix B. Dual model
Linear programming model has the ability to easily define and interpret the dual model. Dual variables will enable better interpretability of the model and allow greater discussion of the decision-making model. The dual model is presented in Eq. (Appendix B. Dual model).
The dual model will have to maximize the dual variables corresponding to the weights equality constraint
There are two sets of constraints. The first one regards the transformation of the absolute value in the goal function, and thus dual variables are
