FairAW – Additive weighting without discrimination

Abstract

With growing awareness of the societal impact of decision-making, fairness has become an important issue. More specifically, in many real-world situations, decision-makers can unintentionally discriminate against a certain group of individuals based on either inherited or appropriated attributes, such as gender, age, race, or religion. In this paper, we introduce a post-processing technique, called fair additive weighting (FairAW), for achieving group and individual fairness in multi-criteria decision-making methods. The methodology is based on changing the score of an alternative by imposing fair criteria weights. This is achieved through minimization of differences in scores of individuals subject to a fairness constraint. The proposed methodology can be successfully used in multi-criteria decision-making methods where additive weighting is used to evaluate scores of individuals. Moreover, we tested the method on both synthetic and real-world data, and compared it to the Disparate Impact Remover and FA*IR methods that are commonly used in achieving fair scoring of individuals. The obtained results showed that FairAW manages to achieve group fairness in terms of statistical parity, while also retaining individual fairness. Additionally, our approach managed to obtain the best equality in scoring between discriminated and privileged groups.

Keywords

Additive weighting, multi-criteria decision making, algorithmic decision making, fairness, bias mitigation

1. Introduction

Multi-criteria decision making (MCDM) aims to determine the scores of individuals (hereafter, scores) by using multiple, often conflicting criteria [20]. Due to a large number of criteria, the scoring of individuals is not a trivial task. The decision-maker needs to take into account many dimensions of the problem at hand, compare the values between multiple individuals, and correctly define criteria weights. These scores are later used in decision-making for ranking or selecting the best individual. Therefore, a wide variety of methods have been developed [53].

Algorithmic decision-making is adopted in practical applications due to its ability to provide faster and more accurate decisions [30]. Soon after its adoption in practice, algorithmic decision-making started being used for applications that influence the welfare of an individual. For example, applications of algorithmic decision-making can be found in hiring [17, 9], university admission [27, 19], child maltreatment [43], etc. In addition, algorithmic decisions are often seen as a major reason why a certain sub-population is unable to achieve the desired level of outcome [8]. This is mostly due to cultural and historical obstacles certain sub-populations experience. It is also observed that algorithmic decision-making tends to amplify the existing biases [29]. For these reasons, attention is focused on understanding why algorithms make unfair decisions regarding the inherited or appropriated characteristics of an individual, and how to mitigate the unfairness in algorithmic decision-making. These efforts are in accordance with affirmative action policies [38], and consequently can help reduce the possibility of legal consequences [49].

There are many notions of fairness in algorithmic decision-making [4]. Most of them are defined over groups of individuals (and are thus called group fairness). The algorithmic decision-making model generates unfair outcomes if members of the privileged group have systematically higher scores than those of the disadvantaged group [45]. In other words, members of the privileged group systematically obtain a better outcome compared to the members of the disadvantaged group. Based on this, a vast number of studies have dealt with the problem of mitigating unfairness in decision-making algorithms. Besides group fairness, another point of view on fairness is called individual fairness. The idea of individual fairness is that similar individuals should receive a similar outcome, or that the outcome should not change a lot with a small perturbation in the parameters of the decision model (e.g. weights of criteria) [3].

In this paper, we study the problem of producing fair scores given one legally protected attribute. However, instead of improving group fairness alone (as most studies do, e.g. [2, 37]), we take into account individual fairness as well. For this purpose, we propose a method based on linear programming (LP). The idea of the proposed method is to change the scores as little as possible so that the fairness constraint is satisfied.

Bearing in mind that a lot of MCDM methods perform additive weighting (AW) to obtain scores, we developed a post-processing methodology for changing criteria weights such that a fairness constraint is satisfied. As stated in [7], post-processing approaches for tackling unfairness are perhaps the most flexible ones. One of the main reasons for this remark is that there is no need to know the decision model, e.g., how a decision-maker obtained the decision matrix or weights associated with criteria. What is needed is access to the utility scores and sensitive attribute information. We achieve fair scores by imposing that the difference between the expected scores of disadvantaged and privileged groups remains at a satisfactory level. More specifically, group fairness is achieved by minimizing the difference between the newly obtained weights and the weights obtained by decision-makers’ preferences. The level of discrimination is controlled by using a hyper-parameter set by a decision-maker. Thus, the decision-maker can choose to have perfect fairness (i.e. both disadvantaged and privileged groups have the same expected score), a small amount of discrimination (i.e. the privileged group will have a slightly higher expected score), or to achieve positive discrimination (i.e. the disadvantaged group will have a slightly higher expected score). In addition, the developed methodology can provide decision-makers an explanation of which criteria have the highest impact on discrimination, as well as which individuals influence discrimination the most. It is worth mentioning that the proposed method is not intended to automate the final decision, but to find the most similar solution so that the fairness constraint is satisfied.

FairAW is tested on synthetic data and real-world applications. The results show satisfactory performance compared to the baseline methods used in practice, namely Disparate Impact Remover [12] and FA*IR [54]. More specifically, a higher group fairness is achieved for a small change in criteria weights. As a limitation, we note that the proposed approach is aimed at tackling unfairness in scores. For other decision-making problems, the proposed method might be unsuitable. For example, for the problem of choice of a single individual, the notions of group fairness are irrelevant. Another limitation is the feasibility of the solution. It might occur that no weight setup achieves satisfactory group fairness.

In Section 2 we review the literature. In Section 3 we present the proposed post-processing technique for changing criteria weights, and explain the data as well as the experimental setup. We present and discuss the results in Section 4. Finally, we conclude the paper and give a plan for further work in Section 5.

2. Related work

In many MCDM methods, decision-makers assign scores to individuals, which are later used for decision-making [53]. Most often, the score is obtained by combining the values of individuals for the criteria. Since decision-makers can be biased towards a specific group of individuals regarding the sensitive attribute [28], the aim is to mitigate this bias, which is nowadays a very popular topic [24].

This section covers notions of fairness, more specifically fairness operationalizations. Further, we cover the existing literature and methods for imposing fairness in decision-making algorithms.

2.1 Notions of fairness

Being fair and making fair decisions has been the subject of discussions for many centuries. Many philosophical theories discuss this matter, thus there are many mathematical notions formalizing fairness [44]. When it comes to fairness in algorithmic decision-making, there are two major points of view: Group fairness [52], and Individual fairness [11].

Group fairness is directed against systematic discrimination in decision-making regarding inherited or appropriated characteristics of an individual. Groups are defined by the highest legal acts, and these are race, gender, religion, disability status, and others [30]. This characteristic is denoted as the sensitive attribute $s$ . If there are differences in the scores based on the $s$ , then group unfairness exists. The group with the lower expected score is called the disadvantaged group (denoted $s=1$ ), while the group with the higher expected score is called the privileged group (denoted $s=0$ ). In the presence of multiple groups, one would observe and compare the expected score to the expected score of the best performing group [50].

One of the most prominent metrics used for expressing group fairness is disparate impact ( $D I$ ). A decision-making process suffers from disparate impact if outcomes disproportionately hurt people with certain sensitive attribute values [52]. Disparate impact is calculated as in the Eq. (1):

$\displaystyle DI=\frac{\mathbb{E}(u|s=1)}{\mathbb{E}(u|s=0)}$ (1)

where $\mathbb{E}(u|s)$ denotes the expected score $u$ of the group with sensitive attribute value $s$ . $D I$ should indicate whether the expected scores are independent of the sensitive attribute. One would prefer $DI=1$ , which indicates perfect fairness, while the values $DI<1$ or $DI>1$ indicate the presence of unfairness. More specifically, if $DI<1$ then the privileged group has a higher expected score. Similarly, if $DI>1$ then the disadvantaged group has a higher expected score, leading to a situation that sometimes can be interpreted as positive discrimination [32]. Another fairness measure is statistical parity ( $S P$ ). Statistical parity is defined as in Eq. (2).

$\displaystyle SP=\mathbb{E}(u|s=1)-\mathbb{E}(u|s=0)$ (2)

The perfect value of $S P$ is zero. In other words, perfect fairness is achieved if the expected scores of both groups are equal. The values $SP<0$ or $SP>0$ indicate the presence of discrimination and are interpreted as deviation or distance of the solution from the perfect fairness.
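As a minimal sketch of Eqs. (1) and (2), both measures can be computed directly from a list of scores and the corresponding sensitive attribute values. The function names and the toy data below are illustrative assumptions, not part of the original method; expectations are estimated by group means.

```python
def group_means(scores, s):
    """Return (mean score of disadvantaged group s=1, mean score of privileged group s=0)."""
    disadvantaged = [u for u, g in zip(scores, s) if g == 1]
    privileged = [u for u, g in zip(scores, s) if g == 0]
    return (sum(disadvantaged) / len(disadvantaged),
            sum(privileged) / len(privileged))


def disparate_impact(scores, s):
    """Eq. (1): ratio of expected scores, E(u|s=1) / E(u|s=0)."""
    mean_d, mean_p = group_means(scores, s)
    return mean_d / mean_p


def statistical_parity(scores, s):
    """Eq. (2): difference of expected scores, E(u|s=1) - E(u|s=0)."""
    mean_d, mean_p = group_means(scores, s)
    return mean_d - mean_p


# Hypothetical toy data: two privileged and two disadvantaged individuals.
scores = [0.8, 0.6, 0.5, 0.3]
s = [0, 0, 1, 1]
di = disparate_impact(scores, s)    # below 1: privileged group scores higher
sp = statistical_parity(scores, s)  # below 0: privileged group scores higher
```

In this toy example the group means are 0.4 and 0.7, so both measures indicate that the privileged group has the higher expected score.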

These measures can be used prior to decision-making. More specifically, the decision-maker can calculate both $D I$ and $S P$ to observe whether the decision-making procedure generates unwanted bias. However, $D I$ and $S P$ are more often calculated after the decision-making process. In the situations where fairness is a hard constraint (e.g. hiring) the decision-maker can intervene and inspect the decision-making procedure if unfairness in the final decision is observed.

Individual fairness, on the other hand, states that the individuals with similar characteristics should have similar scores [11]. In other words, individual fairness requires people to be treated consistently. More formally, if we define distance metric $d(\cdotp,\cdotp)$ , and $i$ and $j$ present individuals that are similar using distance metric $d$ , for example $d(i,j)<\epsilon$ then the scores for individuals $i$ and $j$ should be similar, i.e. $u_{i}\sim u_{j}$ . This should hold for arbitrary distance metric $d$ .

In the context of MCDM, individual fairness is expected to be satisfied at any time. Even in the case of the extreme importance of one or several criteria, the MCDM procedure will yield similar results. However, individual fairness can have a different interpretation, which is adopted in this paper: the new scores should be as similar as possible to the scores obtained by using the initial weights of the decision-maker.

It is worth stating that there is an inherent trade-off between the group and individual fairness. More specifically, an improvement of individual fairness decreases group fairness, and vice versa [35]. In addition, in the literature, it can be found that most studies handle only one of the fairness notions (most often, group fairness) in ranking problems [12, 51, 54].

2.2 Approaches for mitigating unfairness

If decision-makers are aware of unwanted bias in the data, they can try to mitigate the bias and generate a fair decision-making process [24]. Three broad approaches can be used to mitigate unfairness: Pre-processing, In-processing, and Post-processing techniques. We focus only on the methods relevant to our approach.

Data pre-processing tries to adjust data points, or to remove the disparity in the original dataset. The information about individuals has to be saved, but without being able to distinguish between the values of the sensitive attribute based on the values of individuals for criteria. One can save the information about individuals within the groups and adjust the scores between the groups. This is done in the Disparate Impact Remover (DIR) algorithm [12]. DIR performs a geometric repair of the data at hand in such a manner that the data distribution of groups is shifted toward the mean (or median) of the dataset. More formally, for vector of values for criteria $C$ , a new vector $\hat{C}$ is generated such that (Eq. (3)):

$\displaystyle\hat{C}=F_{A}^{-1}(F_{s}(C))$ (3)

where $A$ presents the new distribution, $F_{A}^{-1}$ the inverse distribution function of $A$ , and $F_{s}(C)$ the distribution function of criteria vector $C$ given sensitive attribute $s$ . The goal function minimizes the sum of distances of the original values. As a result, the values in the privileged group are reduced, while the values in the disadvantaged group are increased so that the average value is the same. This procedure is repeated for each criterion.
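The repair in Eq. (3) can be illustrated with a short sketch for a single criterion. This is not the reference DIR implementation: it assumes a full repair, takes the target distribution $A$ to be the pointwise median of the group quantile functions, and uses linear interpolation for quantiles. All function names are hypothetical.

```python
from statistics import median


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an ascending list, with q in [0, 1]."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def repair_criterion(values, groups):
    """Map each value to the target distribution at its within-group rank."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    sorted_by_group = {g: sorted(vs) for g, vs in by_group.items()}

    repaired = []
    for v, g in zip(values, groups):
        vs = sorted_by_group[g]
        # F_s(C): rank of the value within its own group, scaled to [0, 1].
        if len(vs) == 1:
            q = 0.5
        else:
            q = sum(1 for x in vs if x < v) / (len(vs) - 1)
        # F_A^{-1}: median over groups of each group's quantile at rank q
        # (an assumed choice of the target distribution A).
        repaired.append(median(quantile(sv, q) for sv in sorted_by_group.values()))
    return repaired


# Hypothetical criterion values: group 1 is disadvantaged, group 0 privileged.
values = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
groups = [1, 1, 1, 0, 0, 0]
repaired = repair_criterion(values, groups)
```

On this toy input the within-group order is preserved, the privileged values decrease, the disadvantaged values increase, and both groups end up with the same mean.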

Another approach worth mentioning is [22]. This approach learns a set of prototypes, and tries to reconstruct an instance in the dataset using a weighted sum of the prototypes and the probability that the instance belongs to the prototype. The reconstructed examples are similar to the existing ones, and thus can be used for decision-making. However, this approach aims at individual fairness, so that the new representation of individuals is as similar as possible to the original one (utility loss) and that the distance in the original space of individuals is as similar as possible to the distance in the newly created space of individuals (fairness loss). We state that this approach is not comparable to the one presented in this paper, since it does not aim at improving group fairness. In addition to the papers mentioned here, we could mention the approaches that learn fair representations of attributes in a different space such as [58, 34]. These approaches use a supervised approach to learn a fair representation of data that retains as much information as possible about the prediction task at hand. Thus, the representation is aimed to be both fair and accurate.

In-processing techniques adjust the decision-making procedure to generate fair results. To the best of our knowledge, this is seldom done in the MCDM. The following approaches are found in the area of machine learning. Asudeh et al. [2] developed a system that helps users choose the criteria weights that lead to greater fairness. The authors considered ranking functions that compute the score for each individual as a weighted sum of attribute values, and then sort individuals based on the scores. Each ranking function can be expressed as a point in a multi-dimensional space and the authors showed the procedure on how to efficiently identify the regions where fairness criteria are satisfied. In addition, a conceptual and computational framework that allows the formulation of fairness constraints on rankings in terms of exposure allocation along with an efficient algorithm for fair ranking is presented in [40]. Lohaus et al. [25] address the problem of classification under fair constraints and claim that the trade-off between model efficiency and fairness has to be made by relaxing these constraints and finding the solution that can reach satisfactory fair results.

In-processing techniques can be of help in fair MCDM. One can find statistical parity as a constraint [52], as in Eq. (4).

$\displaystyle SP=\frac{1}{|D|}\sum_{i\in D}u_{i}-\frac{1}{|P|}\sum_{i\in P}u_{i}+c,$ (4)

where $D$ presents the set of individuals that belong to the disadvantaged group ( $D\subset\{A|s=1\}$ ), $P$ set of individuals that belong to the privileged group ( $P\subset\{A|s=0\}$ ), $u$ score obtained using aggregation function, for example, $w^{T}D$ for the disadvantaged group or $w^{T}P$ for the privileged group. Finally, $c$ presents a hyper-parameter that controls discrimination. If $c=0$ then perfect fairness is required, while $c<0$ allows a greater average score of the privileged group for parameter $c$ . Similar adaptations can be found in [36].

There are other in-processing approaches, one of which is [55] where learning to rank is used to accommodate for fairness in top- $k$ rankings. It is a supervised approach focused on the top position individuals and it minimizes cross-entropy with exposure being the fairness metric. A similar problem is solved using policy learning in [41]. These approaches are not directly comparable with the approach we propose, since they belong to the learning-to-rank branch of machine learning, while the proposed approach is an example of multiple criteria decision-making. In other words, we are focused on the fairness in the entire set of individuals, while these approaches focus on the top- $k$ individuals.

Finally, post-processing of the scores can be applied as well. These approaches adjust the scores obtained by a decision-making method. Therefore, the decision-maker can derive scores, observe if unfairness exists, and correct them to be fair. One such approach is the FA*IR algorithm [54]. As the input, the FA*IR algorithm takes the size $k$ of the ranking to be returned, the vector of criteria values $a_{i}$ , the sensitive attribute $s_{i}$ , the minimum proportion $p$ of disadvantaged individuals, and the adjusted significance level $\alpha$ . The FA*IR algorithm calculates scores $u_{i}$ for each individual $i$ . Then, it calculates the expected number of individuals from the disadvantaged group in the top $k$ with a proportion of $p$ and statistical significance $\alpha$ . Then, it greedily constructs a ranking using the scores and the minimum expected number of individuals from the disadvantaged group for each iteration. These two are combined in the ranking procedure. More specifically, FA*IR selects the top $k$ individuals by selecting either the best overall ranked individual or the best-ranked individual from the disadvantaged group (even if the score is lower). The latter happens if the minimum expected number of individuals from the disadvantaged group for that iteration is not satisfied. This procedure guarantees in-group monotonicity and group fairness. However, an individual with a lower score can be ranked better than the individual with a higher score (i.e. an individual from the disadvantaged group is selected to satisfy minimum proportion $p$ ).
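The greedy construction described above can be sketched as follows. This is a simplified illustration rather than the reference FA*IR implementation: the minimum number of disadvantaged individuals at each position is taken as a binomial quantile (an assumed convention), `alpha` is assumed to be already adjusted, and the function names are hypothetical.

```python
from math import comb


def min_protected(k, p, alpha):
    """Smallest t such that the binomial CDF(t; k, p) exceeds alpha."""
    cdf = 0.0
    for t in range(k + 1):
        cdf += comb(k, t) * p**t * (1 - p) ** (k - t)
        if cdf > alpha:
            return t
    return k


def fair_top_k(scores, s, k, p, alpha):
    """Greedy top-k ranking; returns indices of the selected individuals."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    protected = [i for i in order if s[i] == 1]  # disadvantaged, best first
    others = [i for i in order if s[i] == 0]     # privileged, best first
    ranking, n_protected = [], 0
    for position in range(1, k + 1):
        need = min_protected(position, p, alpha)
        take_protected = bool(protected) and (
            n_protected < need                                # quota not met
            or not others                                     # nobody else left
            or scores[protected[0]] >= scores[others[0]]      # best overall
        )
        if take_protected:
            ranking.append(protected.pop(0))
            n_protected += 1
        elif others:
            ranking.append(others.pop(0))
        else:
            break  # fewer than k individuals available
    return ranking


# Hypothetical toy data: the two disadvantaged individuals have the lowest scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
s = [0, 0, 0, 0, 1, 1]
top4 = fair_top_k(scores, s, k=4, p=0.5, alpha=0.1)
```

In this toy example the fourth position goes to the best disadvantaged individual (index 4) instead of the higher-scoring privileged individual (index 3), which illustrates the last remark of the paragraph above.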

Another paper worth mentioning is [56]. It interpolates between the views “what you see is what you get” and “we are all equal”, which map to individual fairness and anti-discrimination law (statistical parity), respectively. This is done with a hyper-parameter using the optimal transport approach. The method scales well with large datasets where both privileged and disadvantaged groups are well represented. For a more detailed discussion on fairness in machine learning based ranking methods, we refer to [57], and for a more detailed discussion on fairness methods in machine learning we refer to [7].

The contributions of this paper are:

Incorporation of group fairness constraints that satisfy disparate impact (statistical parity),

Satisfaction of individual fairness, as we defined it, and

An interpretable solution produced by the proposed model.

Group fairness is based on the $D I$ metric, found in [12]. This means that we aim at satisfying a certain level of $D I$ using hyper-parameter $c$ as in Eq. (4). In order to have a guarantee for group fairness, we introduce the group fairness constraint in the mathematical model. Therefore, if there is a feasible solution, the proposed method would find one and return it to the decision-maker.

Since individual and group fairness imply different points of view, i.e. satisfying one can violate the other [56], we find it necessary to make a compromise. This compromise is defined in the optimization model we propose. More specifically, we propose adding an additional constraint regarding the individual fairness.

Finally, due to the importance of model interpretation, we adopt a linear decision-making model. Linear models can be optimized efficiently and have the advantage of dual model analysis. To the best of our knowledge, this is seldom done in the area of fair machine learning.

We incorporate both group fairness and individual fairness requirements using the post-processing technique for changing weights of criteria and consequently changing scores. More specifically, we derive a linear mathematical model with a goal function of minimal change in criteria weights subject to a fairness constraint. The goal function generates new weights that are as close as possible to the initial ones, thus enforcing individual fairness. On the other hand, the statistical parity constraint does not allow privileged group individuals to have much higher scores. The initial criteria weights are assigned by decision-makers directly (or indirectly).

In contrast to [54] we adjust criteria weights, thus making the decision model interpretable. FA*IR performs as a closed-box algorithm where scores are inserted and rankings are produced. The decision-maker is not intuitively shown how the rankings are obtained.

The flexibility of our mathematical model allows the use of other individual or group fairness metrics, which makes it more favorable to apply. Moreover, while the models available in literature change only final ranks (such as [54], and [55]), the FairAW changes scores (through changing weights) that affect the ranks as well.

3. Methodology

This section consists of four parts. First, we motivate the mathematical model. Then, we provide the process of the mathematical model derivation. This part of the methodology links justice theory and the proposed mathematical model by a step-by-step explanation of the goal function and individual and group fairness constraints. Then, we briefly explain the data used in this research. The mathematical model was tested on three datasets with a different number of alternatives (individuals) and criteria. Finally, we describe the experimental setup.

3.1 Motivation for the FairAW

In classical decision theory, some of the tasks (that this paper aims at) are finding the score of alternatives (individuals), and consequently ranking. These are performed by aggregating criteria weights and values of individuals. For this purpose, commonly-used MCDM methods, such as Simple Additive Weighting (SAW) can be exploited [47].

However, with the development of the theory of social justice, fairness, and equality, there is a rising need to create fair decision-making methods [31]. In order to achieve fair MCDM models, one first needs to define fairness. Commonly, MCDM methods employ the utilitarianism principle. More specifically, a morally right course of action is the one that produces the maximum benefit for the decision-maker [26]. As a consequence of such an approach, one tries to maximize utility. While some find utilitarianism a very popular framework for decision-making, it fails to take into account the considerations of fairness. Although decisions produce greater benefits for the system, they can be very unfair towards a specific subgroup of people. Today, the examples of cultural and historical biases exist in science, technology, engineering, and mathematics education. Male students are dominant in enrolling for software engineering education [14]. Another example is COMPAS [10]. By observing the decision-making criteria, African Americans are obtaining higher scores (and thus sentenced). However, this is arguably due to the bias that “young and black” are criminals [23]. Further, algorithms are known to amplify the existing biases [29], and although unwanted discrimination cannot be eliminated, it can be reduced to a satisfactory level. Utilitarianism does not take into account historical and existing biases, and if one wants to act with affirmative actions and mitigate the possibility that the decision-making process has a disparate impact, utilitarianism should not be the sole principle.

In order to introduce fairness, equality of outcomes or equality of opportunity has to be taken into account. Equality of outcomes assumes that all individuals are equal (we are all equal), and consequently should have the same outcome. This means that the outcome should be independent of the personal characteristics of an individual, either inherited or appropriated, both directly and indirectly. This point of view on equality corresponds to the egalitarian point of view on fairness. Equal opportunity, in the original definition, requires that individuals with the same set of skills (or scores in the decision-making terminology) have the same opportunities regardless of their inherited or appropriated characteristics. In terms of algorithmic decision-making, the percentage of those getting the desired outcome should be approximately the same in each group. This point of view on equality corresponds to the Rawlsian point of view on fairness [5].

However, it is not reasonable to expect that scores are the same for both genders (or for every race, religion, etc.). Some small discrimination is expected and allowed. Therefore, the common fairness threshold is the “80% rule”, which states that the allowable disparate impact is such that the disadvantaged group should achieve at least 80% of the privileged group’s expected score in the same population [12].

3.2 FairAW

Our idea is to incorporate both utilitarianism and fairness into the FairAW method. Utilitarianism requires that the scores stay as close as possible to the ones given by the decision-maker, while fairness ensures that unwanted discrimination does not occur.

Weights present preferences of decision-makers regarding criteria. Therefore, weights show decision-makers’ expert opinion on how influential each criterion is in decision-making. If the initial weights given by the decision-maker result in disparate impact, then this decision-making process can be subject to legal consequences [4]. To mitigate this issue, the weights can be changed so that legal constraints are satisfied. If weights are changed, the resulting scores should not differ much from the original. This is aligned with the notion of individual fairness [11]. More specifically, the model should be constrained to obtain scores similar to the original ones, or at most $\tau$ different. The value of $\tau$ can be selected in either absolute terms (i.e. the score can differ by 0.1) or in relative terms (i.e. at most 10% different than the original score). Since the relative difference can be overly rigid (e.g. a 10% difference for the score 0.02 practically does not allow a change in scores for that alternative, while a 10% difference for the score 0.9 allows a large change in score values), we employ the absolute difference (Eq. (5)).

$\displaystyle|u^{\textit{new}}-u^{\textit{old}}|\leqslant\tau$ (5)

Table 1

Decision matrix

	$C_{1}$	$C_{2}$	…	$C_{n}$
$A_{1}$	$a_{1,1}$	$a_{1,2}$	…	$a_{1,n}$
$A_{2}$	$a_{2,1}$	$a_{2,2}$	…	$a_{2,n}$
…	…	…	…	…
$A_{m}$	$a_{m,1}$	$a_{m,2}$	…	$a_{m,n}$
	$w_{1}$	$w_{2}$	…	$w_{n}$

We chose simple additive weighting (SAW) because it is widely used [47] and because many MCDM methods use additive weighting as an integral part of utility calculation. Thus, the proposed method can be used in different MCDM methods (such as AHP, VIKOR, etc.) with minor (or major) adjustments. Although the application of the proposed method is not limited to SAW, this paper focuses solely on SAW. The basic concept of the SAW method is to find the score for each alternative using the weighted sum. A decision matrix is given in Table 1, where $C=\{C_{1},C_{2},\ldots,C_{n}\}$ presents a set of criteria, $A=\{A_{1},A_{2},\ldots,A_{m}\}$ a set of individuals (or alternatives in terms of MCDM), and $a_{i,j}$ the value of individual $i$ for criterion $j$ . Also, the weight associated with criterion $j$ is denoted $w_{j}$ . It is worth mentioning that the sum of weights should be one. Before calculating the scores, the decision matrix needs to be prepared. First, every criterion should be expressed as a benefit criterion (higher value is better). A cost criterion can be converted to a benefit criterion using the inverse value. Second, every criterion must be on the same scale. In the original SAW, one employs $L_{\infty}$ normalization (as presented in Eq. (6)):

$\displaystyle a_{i,j}^{N}=\frac{a_{i,j}}{\max_{i}a_{i,j}}$ (6)

where $a_{i,j}^{N}$ denotes the normalized value of the individual $i$ for criterion $j$ , and $\max_{i}a_{i,j}$ the maximum value over all individuals for criterion $j$ . More specifically, each value in the decision matrix is divided by the maximum value of that criterion. Data normalization makes every criterion comparable. Values are between zero and one, with the best value equal to one. The resulting decision matrix is shown in Table 2.

Table 2

Normalized decision matrix

	$C_{1}$	$C_{2}$	…	$C_{n}$	Score
$A_{1}$	$a_{1,1}^{N}$	$a_{1,2}^{N}$	…	$a_{1,n}^{N}$	$u_{1}$
$A_{2}$	$a_{2,1}^{N}$	$a_{2,2}^{N}$	…	$a_{2,n}^{N}$	$u_{2}$
…	…	…	…	…	…
$A_{m}$	$a_{m,1}^{N}$	$a_{m,2}^{N}$	…	$a_{m,n}^{N}$	$u_{m}$
	$w_{1}$	$w_{2}$	…	$w_{n}$
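The normalization of Eq. (6) and the weighted sum that fills the Score column of Table 2 can be sketched in a few lines. This is a minimal illustration that assumes all criteria are already benefit criteria with positive values; the function names and the toy decision matrix are hypothetical.

```python
def normalize(matrix):
    """Eq. (6): divide every value by the maximum of its criterion (column)."""
    col_max = [max(col) for col in zip(*matrix)]
    return [[v / m for v, m in zip(row, col_max)] for row in matrix]


def saw_scores(matrix, weights):
    """Weighted sum of normalized values for every individual (row)."""
    normalized = normalize(matrix)
    return [sum(w * v for w, v in zip(weights, row)) for row in normalized]


# Hypothetical decision matrix: two individuals, two benefit criteria.
matrix = [[10.0, 2.0],
          [5.0, 4.0]]
weights = [0.5, 0.5]  # weights sum to one
scores = saw_scores(matrix, weights)

# Indices of individuals ranked by score in descending order.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Each individual is best on one criterion here, so with equal weights both receive the same score.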

Having the data prepared, the score $u$ presents the weighted sum as presented in Eq. (7):

$\displaystyle u_{i}=\sum_{j=1}^{n}w_{j}a^{N}_{i,j}$ (7)

where $w_{j}$ presents the weight of criterion $j$ , and $a_{i,j}^{N}$ presents the normalized value of individual $i$ for criterion $j$ . Finally, scores $u_{i}$ are ranked in descending order, and presented to the decision-maker. We propose a mathematical model for achieving individual and group fairness. Initially, the mathematical model is defined as in Eq. (8):

$\displaystyle\begin{aligned}
\min\quad&\sum_{i=1}^{m}|u_{i}-u^{\textit{old}}_{i}|=\sum_{i=1}^{m}\sum_{j=1}^{n}|w_{j}-w_{j}^{\textit{old}}|a^{N}_{i,j}\\
&\quad=\sum_{j=1}^{n}|w_{j}-w_{j}^{\textit{old}}|\sum_{i=1}^{m}a^{N}_{i,j}=\sum_{j=1}^{n}|w_{j}-w_{j}^{\textit{old}}|A_{j}\\
\text{s.t.}\quad&\sum_{j=1}^{n}w_{j}=1\\
&\frac{1}{|D|}\sum_{i\in D}\sum_{j=1}^{n}w_{j}a_{i,j}^{N}-c\frac{1}{|P|}\sum_{i\in P}\sum_{j=1}^{n}w_{j}a_{i,j}^{N}\geqslant 0\\
&\left\lvert\sum_{j=1}^{n}w_{j}a_{i,j}^{N}-\sum_{j=1}^{n}w_{j}^{\textit{old}}a_{i,j}^{N}\right\rvert\leqslant\tau,\quad i=1,\ldots,m\\
&w_{j}\geqslant 0,\quad j=1,\ldots,n
\end{aligned}$ (8)

The intuition is as follows. We would like to create a decision model that is as similar to the original one as possible, thus satisfying the initial decision-maker’s judgement. This is achieved by imposing similar weights ( $w$ presents new weights, while $w^{\textit{old}}$ presents original weights provided by a decision-maker). Thus, the goal function searches for the smallest change in the criteria weights. A smaller difference in weights yields similar scores, and thus a high value of individual fairness. However, this does not provide a guarantee that the scores are similar to the original ones. To provide a guarantee, we introduce constraints. First, the sum of weights must be equal to one (i.e. weights are distributed across criteria such that they represent 100% of the decision). Additionally, we set weights to be non-negative numbers.

The second constraint introduces group fairness (Eq. (8)). We employ statistical parity, which is presented in Eq. (2). More specifically, the expected score of the disadvantaged group ( $D\subset A|s=1$ ) must be at least $c$ times the expected score of the privileged group ( $P\subset A|s=0$ ), where $c$ is a hyper-parameter. For example, if decision-makers want to create a fair decision model on the group level based on the “80% rule” [12], then they would set $c=0.8$ . However, decision-makers can use different disparate impact levels if needed.

Finally, a third constraint requires the absolute difference between the new score and the original score to be less than or equal to $\tau$ . This constraint is evaluated for every individual $i$ . Therefore, there are $m$ constraints regarding individual fairness.

The initial model was further modified due to the presence of absolute values in the goal function and constraints. More specifically, removing the absolute values from the mathematical model requires a transformation, which is presented in Eq. (9).

$\displaystyle\begin{aligned}
\min\quad &\sum_{j=1}^{n}A_{j}\gamma_{j}\\
\text{s.t.}\quad &\sum_{j=1}^{n}w_{j}=1\\
&w_{j}-\gamma_{j}\leqslant w_{j}^{\textit{old}},\quad j=1,\ldots,n\\
&-w_{j}-\gamma_{j}\leqslant-w_{j}^{\textit{old}},\quad j=1,\ldots,n\\
&\frac{1}{|D|}\sum_{i\in D}\sum_{j=1}^{n}w_{j}a_{i,j}^{N}-c\frac{1}{|P|}\sum_{i\in P}\sum_{j=1}^{n}w_{j}a_{i,j}^{N}\geqslant 0\\
&\sum_{j=1}^{n}w_{j}a_{i,j}^{N}-\sum_{j=1}^{n}w_{j}^{\textit{old}}a_{i,j}^{N}\leqslant\tau,\quad i=1,\ldots,m\\
&-\sum_{j=1}^{n}w_{j}a_{i,j}^{N}+\sum_{j=1}^{n}w_{j}^{\textit{old}}a_{i,j}^{N}\leqslant\tau,\quad i=1,\ldots,m\\
&w_{j}\geqslant 0,\quad j=1,\ldots,n\\
&\gamma_{j}\geqslant 0,\quad j=1,\ldots,n
\end{aligned}$ (9)

where $\gamma_{j}$ are new auxiliary variables that bound the absolute weight differences $|w_{j}-w_{j}^{\textit{old}}|$ through $w-w^{\textit{old}}\leqslant\gamma$ and $-w+w^{\textit{old}}\leqslant\gamma$ (second and third constraints). Additionally, the absolute difference in scores is split into two constraints (fifth and sixth constraints). This model is completely linear, and thus can be solved using linear optimization procedures such as the simplex or interior-point method.
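The linear model in Eq. (9) can be sketched with an off-the-shelf LP solver. The following is a minimal illustration, not the authors' implementation: the function name `fair_aw_weights`, the use of SciPy's `linprog` with the HiGHS backend, and the toy data are all assumptions made for this example. The decision vector stacks the weights $w$ and the auxiliary variables $\gamma$ .

```python
import numpy as np
from scipy.optimize import linprog


def fair_aw_weights(a_norm, s, w_old, c=0.8, tau=0.1):
    """Solve the linear model of Eq. (9); return new weights, or None if infeasible.

    a_norm: (m, n) normalized decision matrix
    s:      length-m sensitive attribute, 1 = disadvantaged (D), 0 = privileged (P)
    """
    a_norm = np.asarray(a_norm, dtype=float)
    s = np.asarray(s)
    w_old = np.asarray(w_old, dtype=float)
    m, n = a_norm.shape
    eye, pad = np.eye(n), np.zeros((m, n))
    u_old = a_norm @ w_old

    # Decision vector x = [w_1..w_n, gamma_1..gamma_n]; objective sum_j A_j * gamma_j.
    cost = np.concatenate([np.zeros(n), a_norm.sum(axis=0)])

    # Group fairness rewritten in "<= 0" form: c * mean_P . w - mean_D . w <= 0.
    group = c * a_norm[s == 0].mean(axis=0) - a_norm[s == 1].mean(axis=0)

    A_ub = np.vstack([
        np.hstack([eye, -eye]),                         #  w - gamma <=  w_old
        np.hstack([-eye, -eye]),                        # -w - gamma <= -w_old
        np.concatenate([group, np.zeros(n)])[None, :],  # group fairness
        np.hstack([a_norm, pad]),                       #  u_new - u_old <= tau
        np.hstack([-a_norm, pad]),                      # -u_new + u_old <= tau
    ])
    b_ub = np.concatenate([w_old, -w_old, [0.0], tau + u_old, tau - u_old])
    A_eq = np.concatenate([np.ones(n), np.zeros(n)])[None, :]  # sum_j w_j = 1

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=(0, None), method="highs")
    return res.x[:n] if res.success else None


# Toy data (made up): two privileged and two disadvantaged individuals.
a = [[1.0, 0.2], [0.8, 0.4], [0.4, 1.0], [0.2, 0.8]]
s = [0, 0, 1, 1]
print(fair_aw_weights(a, s, w_old=[0.8, 0.2], c=0.8, tau=0.2))
print(fair_aw_weights(a, s, w_old=[0.8, 0.2], c=0.8, tau=0.05))  # None: tau too tight
```

Returning `None` on failure mirrors the infeasible case discussed in the text: when no weight vector satisfies both $c$ and $\tau$ , the hyper-parameters have to be relaxed.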

We are aware that other mathematical formulations for the goal function can be used. For example, instead of an absolute difference between new and original criteria weights, quadratic difference or any other difference function can be used. In Appendix A, we propose an additional mathematical model that solves a similar problem (both group and individual fair decision-making model) using quadratic optimization.

The linear formulation of the problem at hand allows us to get a better insight into the sources of unfairness. More specifically, we can answer questions such as: Which criteria are unfair? Can we increase (or decrease) a weight? Which individuals are bound by fairness constraints (either individual or group fairness constraints)? These questions are answered by solving the dual mathematical model. It is worth stating that the duality gap in linear programming is zero whenever an optimal solution exists, so the answers obtained from the dual model can be interpreted reliably. The dual mathematical model is presented in Appendix B.
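As a rough illustration of this kind of inspection, the sketch below solves a small instance of the linearized model and reads the solver's sensitivity information. It assumes SciPy's `linprog` with the HiGHS backend, which exposes `ineqlin.marginals` (sensitivity of the objective to each inequality's right-hand side) and `ineqlin.residual` (slack). The toy data are made up, and treating the group-constraint marginal as the counterpart of the dual variable of the group fairness constraint is an assumption; Appendix B remains the authoritative dual formulation.

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (hypothetical): 4 individuals, 2 criteria, s = 1 marks the disadvantaged group.
a = np.array([[1.0, 0.2], [0.8, 0.4], [0.4, 1.0], [0.2, 0.8]])
s = np.array([0, 0, 1, 1])
w_old, c, tau = np.array([0.8, 0.2]), 0.8, 0.2
m, n = a.shape
eye, pad, u_old = np.eye(n), np.zeros((m, n)), a @ w_old

# Rows of A_ub, in order: 2n weight-change rows, 1 group row, 2m individual rows.
group = c * a[s == 0].mean(axis=0) - a[s == 1].mean(axis=0)
A_ub = np.vstack([np.hstack([eye, -eye]), np.hstack([-eye, -eye]),
                  np.hstack([group, np.zeros(n)])[None, :],
                  np.hstack([a, pad]), np.hstack([-a, pad])])
b_ub = np.concatenate([w_old, -w_old, [0.0], tau + u_old, tau - u_old])
cost = np.concatenate([np.zeros(n), a.sum(axis=0)])
A_eq = np.hstack([np.ones(n), np.zeros(n)])[None, :]

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=(0, None), method="highs")

duals, slack = res.ineqlin.marginals, res.ineqlin.residual
group_dual = duals[2 * n]   # non-zero only if the group constraint is binding
binding_individuals = [i for i in range(m)
                       if min(slack[2 * n + 1 + i], slack[2 * n + 1 + m + i]) < 1e-9]

print("group constraint dual:", group_dual)
print("individuals at the tau limit:", binding_individuals)
```

A zero slack identifies an active constraint, and the corresponding marginal indicates how much the objective would change if that constraint were relaxed by one unit.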

To summarize, the proposed method applies linear programming to adjust criteria weights such that:

statistical parity is satisfied (and consequently disparate impact), thus making the model satisfy group fairness, and

the difference between initial scores and the new scores is small (constrained with parameter $\tau$ ), making the model individually fair. In addition, individual fairness is inserted into a goal function indirectly by stating that the difference between original and new weights should be minimized.

Due to constrained optimization, there is a guarantee that the obtained solution satisfies both notions of fairness if a feasible solution exists. In addition, the proposed method can be used to inspect what criteria, as well as what individuals are creating unfairness in scores.

We argue that the proposed method can be used in the situations where fairness is a hard constraint, or when the decision-maker wants to alleviate legal consequences. Even in applications where fairness is not a necessity, decision-makers can use the proposed method to create positive discrimination, e.g. affirmative actions. However, newly obtained criteria weights and scores should be returned to the decision-maker before making the final decision. If the decision-maker is satisfied with the proposed solution, the decision can be made. If not, the decision-maker can adjust criteria weights to match his/her preferences.

The proposed methodology utilizes the $L_{\infty}$ norm as defined in the original SAW method. Without loss of generality, other linear norms can be used ( $L_{1}$ , $L_{2}$ , $\max-\min$ , etc.). This, however, does not mean that the results will be the same regardless of the normalization employed. It is known from the literature that the choice of normalization can affect the final results. This is known as the rank reversal problem [46, 13] and it is out of the scope of the paper. We note that the proposed methodology would not change, since the normalization of data is a linear transformation (or in the general case, an affine transformation) of the data at hand, and as such does not affect the optimization procedure. More specifically, as presented in Eq. (8), the absolute difference between the original and the new weight of criterion $j$ is multiplied by the sum of normalized values of criterion $j$ , which acts as a constant in the mathematical model. Therefore, the mathematical model is the same regardless of the normalization being used. The results, however, may differ. This is due to the different properties of each normalization. Finally, the proposed method seeks a setup of weights that satisfies the given constraints. Since the proposed mathematical model aims at reducing the absolute difference between new and original weights, the optimization procedure seeks a solution where the smallest number of weights is adjusted (similar to lasso penalization in linear regression).
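To make the role of normalization concrete, the following sketch computes SAW scores as in Eq. (7) and the constants $A_{j}$ that enter the goal function. It assumes that $L_{\infty}$ normalization divides each benefit criterion by its column maximum; the function names and the toy decision matrix are hypothetical.

```python
def normalize_linf(matrix):
    """Divide every column by its maximum (positive benefit criteria assumed)."""
    col_max = [max(col) for col in zip(*matrix)]
    return [[value / top for value, top in zip(row, col_max)] for row in matrix]


def saw_scores(a_norm, weights):
    """Weighted sum of normalized values for each individual, as in Eq. (7)."""
    return [sum(w * v for w, v in zip(weights, row)) for row in a_norm]


def column_sums(a_norm):
    """A_j: sum of normalized values of criterion j over all individuals."""
    return [sum(col) for col in zip(*a_norm)]


raw = [[50, 4], [100, 2], [75, 8]]          # 3 individuals, 2 criteria (made up)
a_norm = normalize_linf(raw)
print(saw_scores(a_norm, [0.5, 0.5]))       # scores under uniform weights
print(column_sums(a_norm))                  # constants A_j in the goal function
```

Swapping `normalize_linf` for another normalization changes only the values of `a_norm` and hence of $A_{j}$ , while the structure of the optimization model stays the same.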

3.3 Datasets

The datasets used in this research correspond to real-life decision-making scenarios, where individuals are described using a sensitive attribute and qualities on a numerical scale (criteria). We must remark that the choice of the privileged group and the disadvantaged group is determined by legal documents, such as the constitution, laws, or voluntary commitment. To evaluate the proposed FairAW method, we test different scenarios (different choices of a sensitive attribute in different datasets), but in a real application there is no ambiguity about what the disadvantaged group is and what the allowable disparate impact (statistical parity) is.

We used four datasets. Two datasets are synthetically generated, and the remaining two are publicly available real-world examples. We synthetically generated a dataset regarding student college placements, while student performance data [21] presents synthetically generated data regarding scores on the test. The publicly available datasets are COMPAS [1] and LSAT [56].

Student placement data consists of 215 students. Students are described using nine criteria: secondary school final exam points, high school final exam points, a numerical adaptation of high school specialization, bachelor degree score, a numerical adaptation of degree title, work experience, a numerical adaptation of specialization, the number of points obtained on the admission test, and the number of points obtained at interview. The dataset was intentionally generated so that a disparate impact in scores exists. More specifically, gender score disparity can be observed, where male students performed better than female students.

Student performance is a dataset about scores obtained at a standard admission test. The dataset consists of three criteria, namely mathematics score, reading score, and writing score. Gender is selected as the sensitive attribute. There are 1,000 students in total, and the dataset is almost completely fair. More specifically, the disparate impact observed in the data suggests that male students perform slightly better than female students.

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a decision support system that provides a signal to the decision-maker (i.e. judge) about the potential recidivism of an offender. Originally, the questionnaire consists of 137 questions based on which predictions are made. It is used in several jurisdictions in the US, but it has been criticized for potential racial discrimination. More specifically, it is indicated that COMPAS produces a higher probability of recidivism for African-Americans [4]. In our experiment, we filtered the dataset to contain only African-American and White-Caucasian individuals. As a result, the dataset consists of 6,150 individuals (accounting for $\sim$ 85% of all individuals). Further, since the majority of answers in the questionnaire were categorical, we selected the most informative numerical attributes for deciding on recidivism that are used in different papers [10], including age, number of juvenile crimes, decile score from the COMPAS, number of juvenile misdemeanor crimes, number of prior crimes, and days of screening before an arrest.

LSAT (Law School Admission Test) is a dataset used initially for the study of differential bar exam passage rates between different ethnic groups. The concern is that admission policies favor the majority group (White-Caucasian students). The dataset contains 21,792 students, each described by three attributes: law school admission test score, undergraduate grade-point average, and a standardized score of the average grade at the end of the first year of law school. A major disparity exists between White-Caucasian students and African-American students [56]. However, we consider White-Caucasian students as the privileged group, and all other ethnic groups as the disadvantaged group.

Both COMPAS and LSAT are commonly used to test for unfairness in supervised algorithmic decision making and ranking. It is worth stating that none of the datasets have any missing values or visible outliers.

3.4 Experimental setup

For each of the previously described datasets, we tested the proposed FairAW method and compared the results with baseline methods. The research question we wanted to answer is: Can we reach a satisfactory fair solution? A satisfactory fair solution requires our model to reach a solution that is both group and individually fair given the data at hand. Having this in mind, our hypothesis is that the proposed approach finds a solution that is both group and individually fair. Since the proposed method solves a constrained optimization problem, it yields a solution that satisfies both notions of fairness if such a solution exists. The results are compared with:

Baseline 1: Fairness unaware scores. The scoring procedure that utilizes SAW. More specifically, we generated scores using only the values of alternatives for criteria.

Baseline 2: Disparate Impact Remover (DIR) [12]. Instead of scores, data can be adjusted to be fair. DIR shifts the data distribution of both the privileged and the disadvantaged group for each criterion separately, in such a manner that both groups follow the same distribution. The implementation we used in this research completely removes disparate impact in the data. Therefore, the resulting data is said to be fair. We then applied SAW on the corrected data.

Baseline 3: FA*IR [54]. This procedure adjusts rankings of individuals by using a statistical test for calculating the expected number of individuals from a disadvantaged group in the top $k$ ranks. This is done by setting the parameter $p$ that represents the percentage of individuals from the disadvantaged group in the top $k$ ranks. It works in iterations, thus at each iteration $t$ this procedure selects the best ranked individual from the disadvantaged group if the number of selected individuals from the disadvantaged group at iteration $t$ is less than the expected number. As a result, a new ranking of the individuals is obtained, but the scores remain the same. Due to constraints in the FA*IR algorithm for selection of parameter $k$ , we set this parameter to the lower of the number of rows in the dataset or 400.
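The iteration described for FA*IR can be sketched roughly as follows. This is a simplified illustration and not the reference implementation of [54]: the function names are hypothetical, the required number of disadvantaged individuals in each prefix is taken as the $\alpha$ -quantile of a binomial distribution, the default $\alpha=0.1$ is an assumption, and the multiple-testing adjustment of the significance level used by the original algorithm is omitted.

```python
from math import comb


def min_protected(t, p, alpha):
    """Smallest count m such that the Binomial(t, p) CDF at m exceeds alpha."""
    cdf = 0.0
    for count in range(t + 1):
        cdf += comb(t, count) * p**count * (1 - p) ** (t - count)
        if cdf > alpha:
            return count
    return t


def fair_top_k(scores, protected, k, p, alpha=0.1):
    """Return indices of the top-k ranking; scores themselves are never modified."""
    by_score = lambda i: -scores[i]
    prot = sorted((i for i in range(len(scores)) if protected[i]), key=by_score)
    rest = sorted((i for i in range(len(scores)) if not protected[i]), key=by_score)
    ranking, n_prot = [], 0
    for t in range(1, k + 1):
        need = min_protected(t, p, alpha)
        # Take from the disadvantaged queue if the prefix requires it,
        # or if its head is the best remaining individual overall.
        take_prot = bool(prot) and (
            n_prot < need or not rest or scores[prot[0]] >= scores[rest[0]])
        if take_prot:
            ranking.append(prot.pop(0))
            n_prot += 1
        elif rest:
            ranking.append(rest.pop(0))
        else:
            break
    return ranking


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
protected = [False, False, False, False, True, True]
print(fair_top_k(scores, protected, k=4, p=0.5))
```

In this toy run the fourth position goes to a disadvantaged individual with a lower score than an unselected privileged one, which is the source of the interpretability concern raised about rank-only adjustments.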

Fairness unaware scores represent a situation where a decision-maker ignores the potential unfairness or disparities in the decision-making. Therefore, this approach acts as a utilitarian baseline. If there are disparities in the underlying data, those disparities are reflected in the scores that are obtained. Another assumption that is common in the algorithmic decision-making community is that fair data should lead to fair decisions [12]. Effort can be invested in “cleaning” the data from disparities. DIR is an approach for removing such disparities in data, and we use it as a benchmark for that reason. Finally, we use FA*IR as an approach that belongs to the same family of unfairness mitigation approaches. More specifically, once the scores are obtained, they (or in the case of FA*IR, the ranks) can be adjusted to obtain fair results. Due to the nature of the DIR algorithm, we assume that DIR would generate group fair results. However, it is highly probable that individual fairness would be violated. This is expected since DIR is designed to correct the input data to be group fair, not to generate individually fair results. On the other hand, as FA*IR is a statistical approach that does not alter scores, one can obtain group fair results, but without a proper explanation. This is because the ranking of alternatives changes while all alternatives keep the same score as before the procedure.

Measures. To test if the proposed methodology performed well in terms of group and individual fairness, we report several measures. The first group of measures concerns changes in scores. More specifically, we report the average difference in scores overall ( $\widehat{\Delta u}$ ), as well as for the privileged ( $\widehat{\Delta u^{P}}$ ) and disadvantaged group ( $\widehat{\Delta u^{D}}$ ) separately, calculated as in Eq. (10).

$\displaystyle\begin{aligned}
\widehat{\Delta u}&=\frac{1}{m}\sum_{i=1}^{m}(u_{i}^{\textit{new}}-u_{i}^{\textit{old}})\\
\widehat{\Delta u^{P}}&=\frac{1}{|P|}\sum_{i\in P}(u_{i}^{\textit{new}}-u_{i}^{\textit{old}})\\
\widehat{\Delta u^{D}}&=\frac{1}{|D|}\sum_{i\in D}(u_{i}^{\textit{new}}-u_{i}^{\textit{old}})
\end{aligned}$ (10)

where $u_{i}^{\textit{new}}$ is the score of individual $i$ obtained using the new weights $w$ (for the FairAW method) or using the new values $a_{i,j}$ (for Disparate Impact Remover).

Further, we report the maximal and minimal change in scores, presented in Eq. (11):

$\displaystyle\begin{aligned}
\max(\Delta u)&=\max(u_{i}^{\textit{new}}-u_{i}^{\textit{old}})\\
\min(\Delta u)&=\min(u_{i}^{\textit{new}}-u_{i}^{\textit{old}})
\end{aligned}$ (11)

These measures indicate the level of individual fairness, as the definition of individual fairness adopted in this paper is the deviation from the original scores. Besides changes in scores, we report changes in rankings. We interpret ranking measures with caution because the proposed methodology is not aimed at ranking directly. They are presented because FA*IR is aimed at improving fairness in ranking, and we use them to complement the previously described individual fairness metrics. Since we would like to see similar ranking changes for both privileged and disadvantaged groups, we report the average rank change for both the privileged ( $\widehat{\Delta R^{P}}$ ) and the disadvantaged group ( $\widehat{\Delta R^{D}}$ ) using Eq. (12).

$\displaystyle\begin{aligned}
\widehat{\Delta R^{P}}&=\frac{1}{|P|}\sum_{i\in P}(R_{i}^{\textit{new}}-R_{i}^{\textit{old}})\\
\widehat{\Delta R^{D}}&=\frac{1}{|D|}\sum_{i\in D}(R_{i}^{\textit{new}}-R_{i}^{\textit{old}})
\end{aligned}$ (12)

However, average rank change for both the privileged and disadvantaged group changes due to intervention in the optimization. Therefore, we calculate the difference in average ranks (where $R_{i}$ denotes the rank of an alternative $i$ ) between the disadvantaged and privileged group using the Eq. (13).

$\displaystyle R^{\textit{diff}}=\widehat{R^{D}}-\widehat{R^{P}}=\frac{1}{|D|}\sum_{i\in D}R_{i}-\frac{1}{|P|}\sum_{i\in P}R_{i}$ (13)

Since the goal of the paper is to improve fairness in scores, we calculate disparate impact ( $D I$ ) and statistical parity ( $S P$ ), as presented in Eqs. (1) and (2), respectively.
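The reported measures can be computed along the following lines. This is a sketch under stated assumptions: rank 1 is the highest score, ties are broken by original order, and $D I$ and $S P$ are taken as the ratio and the difference of the group mean scores (disadvantaged relative to privileged), which is consistent with the group constraint in the optimization model; Eqs. (1) and (2) remain the authoritative definitions. The function names and the toy values are hypothetical.

```python
from statistics import mean


def ranks(scores):
    """Rank 1 = highest score; ties broken by original order (an assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def fairness_report(u_old, u_new, s, r_new=None):
    """s[i] = 1 for the disadvantaged group D, 0 for the privileged group P.

    Pass r_new explicitly for rank-only methods (scores unchanged, ranks adjusted).
    """
    D = [i for i, g in enumerate(s) if g == 1]
    P = [i for i, g in enumerate(s) if g == 0]
    du = [new - old for new, old in zip(u_new, u_old)]
    r_old = ranks(u_old)
    r_new = ranks(u_new) if r_new is None else r_new
    dr = [new - old for new, old in zip(r_new, r_old)]
    mean_d = mean(u_new[i] for i in D)
    mean_p = mean(u_new[i] for i in P)
    return {
        "du": mean(du),
        "du_P": mean(du[i] for i in P),
        "du_D": mean(du[i] for i in D),
        "max_du": max(du),
        "min_du": min(du),
        "dR_P": mean(dr[i] for i in P),
        "dR_D": mean(dr[i] for i in D),
        "R_diff": mean(r_new[i] for i in D) - mean(r_new[i] for i in P),
        "DI": mean_d / mean_p,
        "SP": mean_d - mean_p,
    }


report = fairness_report(u_old=[0.9, 0.8, 0.5, 0.4],
                         u_new=[0.85, 0.75, 0.6, 0.5],
                         s=[0, 0, 1, 1])
print(report)
```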

Setup. For each dataset, due to different levels of initial unwanted discrimination, a different level of fairness is required. For the Student Placement dataset we required disparate impact to be $DI=0.8$ . To achieve this level of disparate impact, the FairAW hyper-parameter $c$ is set to 0.8. An additional parameter $\tau$ is needed for FairAW. This value represents the allowable difference in scores, i.e. the allowable level of individual fairness. We believe that scores should not differ by more than 0.1. Since the $L_{\infty}$ norm is used, the score is interpreted relative to the ideal possible solution. In other words, we do not allow the score to change by more than 10% in comparison to the ideal solution. Since it is a hyper-parameter, a different value for $\tau$ can be selected. The parameter $p$ in FA*IR, which represents the proportion of discriminated alternatives in the top $k$ , is set to 0.4 (corresponding to $DI=0.8$ ). FA*IR also has the parameter $k$ , which for this dataset covers the entire dataset. More specifically, $k=215$ .

The second dataset, Student Performance, had almost no discrimination in the data. Therefore, we wanted to achieve perfect disparate impact. For FairAW we set $c=1$ and $\tau=0.1$ , while for FA*IR we set $p=0.5$ and $k=400$ .

Finally, for the COMPAS and LSAT datasets we wanted to achieve disparate impact of 0.9. To achieve this, we set for FairAW $c=0.9$ and $\tau=0.1$ , while for FA*IR we set $p=0.45$ and $k=400$ .

For each dataset, we used the same strategy for selecting the weight vector $w$ . We used a uniform setup of weights $w$ . More specifically, $w_{j}=\frac{1}{n},j=1,\ldots,n$ . We are aware that the uniform setup of weights is a downside of the paper. However, this setup is sufficient to show that the proposed method works.
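For reference, the experimental setup described above can be collected in a small configuration sketch. The dictionary layout and helper names are hypothetical; the values are those stated in the text, with $k$ taken as the lower of the number of rows and 400.

```python
# Hyper-parameters per dataset as stated in the setup (layout is illustrative).
SETUP = {
    "Student Placement":   {"rows": 215,   "c": 0.8, "tau": 0.1, "p": 0.40},
    "Student Performance": {"rows": 1000,  "c": 1.0, "tau": 0.1, "p": 0.50},
    "COMPAS":              {"rows": 6150,  "c": 0.9, "tau": 0.1, "p": 0.45},
    "LSAT":                {"rows": 21792, "c": 0.9, "tau": 0.1, "p": 0.45},
}
K_CAP = 400  # upper limit on the FA*IR parameter k used in the experiments


def fair_k(rows):
    """FA*IR parameter k: the lower of the number of rows and the cap."""
    return min(rows, K_CAP)


def uniform_weights(n):
    """Initial weights w_j = 1/n for n criteria."""
    return [1.0 / n] * n


for name, cfg in SETUP.items():
    print(name, cfg["c"], cfg["tau"], cfg["p"], fair_k(cfg["rows"]))
```

In all four configurations the FA*IR proportion happens to equal half of the FairAW level $c$ ; this is an observation about the listed values, not a rule stated by the method.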

4. Results and discussion

The performance of the proposed FairAW method and of the other methods used for comparison is reported per dataset. We first present the performance on the Student Placement dataset, followed by the Student Performance, COMPAS and LSAT datasets.

Student Placement. This dataset has a very high initial unwanted discrimination (which can be observed in the results). The results are presented in Tables 3 and 4. The best performances in measures regarding rankings and fairness are presented in bold letters.

Table 3

Student Placement Results ( $n=215$ ) – utility change

Algorithm	$\widehat{\Delta u}$	$\widehat{\Delta u^{P}}$	$\widehat{\Delta u^{D}}$	$\max(\Delta u)$	$\min(\Delta u)$
SAW	–	–	–	–	–
DIR	$-0.058$	$-0.206$	0.023	0.262	$-0.342$
FA*IR	0	0	0	0	0
FairAW	0.037	$-0.011$	0.063	0.100	$-0.079$

Table 4

Student Placement Results ( $n=215$ ) – rank change and fairness

Algorithm	$R^{\textit{diff}}$	$\widehat{\Delta R^{P}}$	$\widehat{\Delta R^{D}}$	$D I$	$S P$
SAW	91.34	–	–	0.693	$-0.204$
DIR	9.09	64.93	$-35.50$	1.054	0.025
FA*IR	70.24	7.11	$-3.88$	0.766	$-0.168$
FairAW	8.70	61.11	$-41.54$	0.800	$-0.131$

The dataset initially had a high level of group unfairness with $DI=0.693$ . This level is below the threshold given by the “80% rule”. It can be observed from the results that each of the methods improved the fairness in scores and rankings. Algorithm DIR corrected the data to be perfectly fair, which helped the SAW procedure to achieve a fair model. After data preprocessing, the values of $D I$ and $S P$ are the closest to the perfect values, namely one and zero. However, the cost of achieving a perfectly fair decision model is a large change in scores (individual fairness). On average, the privileged individuals have their score reduced by 0.206. Also, the minimum difference (the greatest reduction in score) is 0.342. This indicates that the scores of privileged group individuals were greatly reduced just to achieve fairness. In other words, the cost of group fairness is individual fairness.

The FA*IR algorithm did not change the scores, but it ranked the individuals differently. Disparate impact and statistical parity did increase to an approximately satisfactory solution. However, the interpretability of the model is very low. The decision-maker should expect envy from the privileged group individuals, i.e. individuals with a lower score are ranked better than the individuals with a higher score.

Our approach satisfied the intuition we set. More specifically, we wanted to satisfy both individual and group fairness. Individual fairness measures show that the average overall difference in scores is just 0.037. Further, the privileged group individuals have their scores reduced by 0.011 on average, while disadvantaged group individuals have their scores increased by 0.063 on average. However, it can be observed that individual fairness is an active constraint, since the maximum increase achieved with the new weights is equal to $\tau$ . In addition, group fairness is satisfied as well. By inspecting $D I$ and $S P$ , it can be observed that the new weights reach the lower bound of the $D I$ constraint. Therefore, both individual and group fairness are satisfied at a satisfactory level.

We can observe that there are differences in ranking as well. Our approach FairAW has the best $R^{\textit{diff}}$ , although DIR also reduced the difference in average ranks substantially (from $\sim$ 91 to $\sim$ 9). Also, privileged group individuals have a worse average rank, while disadvantaged group individuals have a better average rank.

Our approach has an additional advantage. Since weights are changed in such a manner as to satisfy both individual and group fairness, we can observe which criteria were unfair and how the change in weights affected individual and group fairness. The latter can be observed by inspecting dual variables as presented in Appendix B. For this dataset, we observe that six criteria have disparities in values (which is exactly the number we implemented during the creation of the dataset). The weights from five of the six criteria were transferred to the remaining three (that have disparity toward female individuals). However, none of the criteria had a weight equal to zero. The most discriminative criterion was the one that was intentionally created to have the most gender disparities (the number of points obtained at the admission test), and its weight is 0.011. It is worth mentioning that FairAW cannot achieve perfect fairness (i.e. $DI=1$ ) with the same level of individual fairness: the optimization problem becomes infeasible. To mitigate this, one needs to relax individual fairness (i.e. increase parameter $\tau$ ).

Student Performance. This dataset has only three criteria, and it is very fair. Our idea was to make it completely fair and to see the cost of achieving fairness. The results are presented in Tables 5 and 6.

Table 5

Student Performance Results ( $n=1,000$ ) – utility change

Algorithm	$\widehat{\Delta u}$	$\widehat{\Delta u^{P}}$	$\widehat{\Delta u^{D}}$	$\max(\Delta u)$	$\min(\Delta u)$
SAW	–	–	–	–	–
DIR	$-0.052$	$-0.048$	$-0.055$	0.000	$-0.173$
FA*IR	0	0	0	0	0
FairAW	$-0.005$	$-0.023$	0.014	0.060	$-0.073$

Table 6

Student Performance Results ( $n=1,000$ ) – ranking change and fairness

Algorithm	$R^{\textit{diff}}$	$\widehat{\Delta R^{P}}$	$\widehat{\Delta R^{D}}$	$D I$	$S P$
SAW	79.73	–	–	0.946	$-0.037$
DIR	76.60	0.06	$-0.07$	0.932	$-0.043$
FA*IR	79.22	0.25	$-0.26$	0.947	$-0.037$
FairAW	4.92	36.06	$-38.75$	1.000	0.000

Although this dataset is relatively simple (only three criteria), our approach performed the best regarding the difference in scores and the other fairness metrics as well. Our approach achieved perfect fairness, as it was given as a constraint of the model. As a consequence, the ranking improved as well. The average difference in ranks between the disadvantaged and privileged group is just 4.92 (from an initial 79.73).

There is a reason why such results are obtained. During the creation of the dataset, disparate impact could not be observed directly because both the mean and the median values for each criterion were similar, but the joint data distribution was different. Because of this, DIR did not correct the data significantly. As a result of the data correction, a slightly greater discrepancy between the privileged and disadvantaged groups was created. On the other hand, algorithm FA*IR used a statistical approach in selecting individuals from the privileged and disadvantaged groups. More specifically, it selected either the best overall individual or, if the expected number of individuals from the disadvantaged group had not been reached, an individual from the disadvantaged group. Since the scores were already almost fair, it barely adjusted the ranking (thus $R^{\textit{diff}}$ remains large, while $\widehat{\Delta R^{P}}$ and $\widehat{\Delta R^{D}}$ are close to zero). In other words, the statistical significance of the difference in ranking did not force perfect fairness.

The proposed FairAW method achieved perfect group fairness with $DI=1$ and $SP=0$ , with a small cost in individual fairness. The scores changed on average by $-0.005$ , and the biggest difference was observed for one individual whose score was lowered by 0.073. This is a large drop in score, but it is lower than the predefined individual fairness constraint $\tau=0.1$ . This is achieved by increasing the weight of the reading score criterion by 0.278 (to 0.611) and reducing the weight of the mathematics score criterion by 0.178 (to 0.155), while the writing score criterion weight was 0.234.
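As a consistency check on these weights, assuming the uniform initial weights $w_{j}^{\textit{old}}=1/3$ described in the setup (the subscript labels below are introduced only for this check):

$\displaystyle\begin{aligned}
w_{\textit{reading}}&=\tfrac{1}{3}+0.278\approx 0.611\\
w_{\textit{math}}&=\tfrac{1}{3}-0.178\approx 0.155\\
w_{\textit{writing}}&=1-0.611-0.155=0.234
\end{aligned}$

The writing weight therefore dropped by roughly 0.1, and the three changes sum to zero, as required by the constraint that the weights sum to one.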

COMPAS. We present the results for this dataset in Tables 7 and 8. This dataset presents the difference in scores and rankings between African Americans and White Caucasians for potential recidivism. More specifically, White-Caucasian individuals have lower scores than African Americans. Since the goal of this decision support system is to help a judge in sentencing, this means that African Americans are more likely to be sentenced to jail. For presentation, $s=0$ will represent African Americans, while $s=1$ will represent White Caucasians. This means that African Americans are “privileged” in the sense of receiving a higher score for the unwanted decision, and are thus discriminated against in decision making.

Table 7

COMPAS results ( $n=6,150$ ) – utility change

Algorithm	$\widehat{\Delta u}$	$\widehat{\Delta u^{P}}$	$\widehat{\Delta u^{D}}$	$\max(\Delta u)$	$\min(\Delta u)$
SAW	–	–	–	–	–
DIR	$-0.026$	$-0.026$	$-0.026$	0.070	$-0.060$
FA*IR	0	0	0	0	0
FairAW	0.032	0.035	0.030	0.074	$-0.032$

Table 8

COMPAS results ( $n=6,150$ ) – ranking change and fairness

Algorithm	$R^{\textit{diff}}$	$\widehat{\Delta R^{P}}$	$\widehat{\Delta R^{D}}$	$D I$	$S P$
SAW	719.83	–	–	0.843	$-0.021$
DIR	673.30	$-27.97$	18.57	0.810	$-0.021$
FA*IR	519.18	$-39.20$	26.03	0.845	$-0.021$
FairAW	400.41	$-131.86$	87.56	0.900	$-0.017$

On the real-world dataset, our method outperformed both DIR and FA*IR in terms of fairness. Individual and group fairness constraints did not allow a higher difference in scores, both between the original scores and new scores, and between the scores of privileged and disadvantaged groups. If the changes in scores are inspected, it can be noticed that the proposed approach stayed within the given limits ( $\max(\Delta u)$ and $\min(\Delta u)$ are in absolute values less than 0.1), thus making the scores individually fair. The ranking is also fairer. The difference in the average values of rank is much lower, and the individuals from the privileged group are ranked lower, while the individuals from the disadvantaged group are ranked higher. As a result of fair ranking, ranking fairness measures are better as well. Based on the dual and slack variables, the criteria that introduced unfairness in the scores can be observed. Those are the number of juvenile crimes and the number of juvenile misdemeanor crimes, whose weights are reduced in favor of age and days of screening before the arrest. Finally, the cost of achieving fairness can be observed by inspecting the dual variable $\psi$ , which has a value of $-0.12$ . This means that raising the required group fairness level to 0.91 would cost an additional 0.12 units of change in the criteria weights.

LSAT. The results for this dataset are presented in Tables 9 and 10. Again, the dataset concerns potential racial bias in the standard admission test. As observed in the results, White-Caucasian individuals have better scores in the additive weighting procedure, resulting in $DI=0.878$ and $SP=-0.085$ , meaning that the scores of White-Caucasian individuals are on average higher by 0.085 than those of other ethnic groups.

Table 9

LSAT results ( $n=21,792$ ) – utility change

Algorithm	$\widehat{\Delta u}$	$\widehat{\Delta u^{P}}$	$\widehat{\Delta u^{D}}$	$\max(\Delta u)$	$\min(\Delta u)$
SAW	–	–	–	–	–
DIR	$-0.065$	$-0.072$	$-0.026$	0.031	$-0.166$
FA*IR	0	0	0	0	0
FairAW	0.059	0.058	0.059	0.100	$-0.050$

Table 10

LSAT results ( $n=21,792$ ) – ranking change and fairness

Algorithm	$R^{\textit{diff}}$	$\widehat{\Delta R^{P}}$	$\widehat{\Delta R^{D}}$	$D I$	$S P$
SAW	4994.04	–	–	0.878	$-0.085$
DIR	2343.43	415.26	$-2235.35$	0.927	$-0.046$
FA*IR	4970.73	36.50	$-196.52$	0.872	$-0.090$
FairAW	4516.92	74.75	$-402.38$	0.900	$-0.082$

After performing the fairness mitigation strategies, it can be observed that DIR obtained the best fairness in terms of rank differences, as well as the fairness metrics $D I$ and $S P$ . This came at a cost in individual fairness as, on average, individuals had their scores reduced by 0.065, and the greatest decrease in scores is 0.166. FA*IR did not manage to satisfy $D I$ and $S P$ on the entire dataset, which is the result of the constraint in the methodology. The proposed FairAW approach found a solution that increased the scores overall. In addition, it can be noticed that both group fairness and individual fairness constraints were active. More specifically, the biggest rise in score is equal to $\tau$ and $D I$ is 0.9. By inspecting the dual variables as well as the criteria weights, it can be observed that two criteria are creating a disparity in scores, while the remaining one equalizes them. The attribute with the greatest associated disparity is the law school admission test score, which had its weight reduced by more than 0.1. The criterion that compensated for it is the undergraduate grade-point average. More specifically, the disparity caused by admission tests is compensated by the grade-point average. Additionally, one can inspect the cost of achieving fairness by inspecting the dual variable $\psi$ , which has a value of $-0.09$ . This means that raising the required group fairness level to 0.91 would cost an additional 0.09 units of change in the criteria weights.

Discussion. FA*IR method utilizes a statistical approach to detect the number of individuals of the disadvantaged group in the top $k$ scores. For that purpose, it uses statistical significance, which can be very strict in determining the number of individuals from the disadvantaged group. We approach this problem from the optimization point of view and do not want to observe whether or not the discriminated individuals are represented in the top $k$ , but to equalize the expected scores. By doing this, the group fairness improves. In addition, FairAW has the power of interpretation, where dual variables can be inspected to see which one of them is active. This provides a signal to the decision-maker to inspect which individual is affected by individual fairness constraint, how the group fairness affects the scoring, and the change of criteria weights. However, our approach has lost one of the good properties of FA*IR. FA*IR can work with an arbitrary number of individuals, regardless of the dataset size. One just needs to select a parameter $k$ . Therefore, it will not consider the entire dataset, but the top $k$ only. Also, FA*IR preserves in-group monotonicity, while FairAW does not. More specifically, in-group monotonicity can be added as a set of constraints to FairAW, but the mathematical model would need to evaluate a large number of constraints – $|D|$ and $|P|$ additional constraints. Intuitively, it is questionable whether the feasible solution can be found in FairAW with in-group monotonicity constraints.
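One possible way to express in-group monotonicity as linear constraints is sketched below. This is a hypothetical construction, not part of the proposed model: within each group, individuals are ordered by their original score, and each consecutive pair contributes one row $g$ with the requirement $g\cdot w\leqslant 0$ . Chaining consecutive pairs yields $|D|-1$ and $|P|-1$ rows, which is of the same order as the count mentioned above.

```python
def monotonicity_rows(a_norm, s, w_old):
    """Rows g with g . w <= 0 that preserve the original within-group order.

    For consecutive individuals (hi, lo) in a group's original ranking,
    u_hi >= u_lo is rewritten as (a_lo - a_hi) . w <= 0.
    """
    u_old = [sum(w * v for w, v in zip(w_old, row)) for row in a_norm]
    rows = []
    for group in (0, 1):
        members = sorted((i for i, g in enumerate(s) if g == group),
                         key=lambda i: -u_old[i])
        for hi, lo in zip(members, members[1:]):
            rows.append([l - h for h, l in zip(a_norm[hi], a_norm[lo])])
    return rows


# Toy data (made up): two privileged (s = 0) and two disadvantaged (s = 1) individuals.
a = [[1.0, 0.2], [0.8, 0.4], [0.4, 1.0], [0.2, 0.8]]
s = [0, 0, 1, 1]
rows = monotonicity_rows(a, s, w_old=[0.8, 0.2])
print(len(rows), rows)
```

To use such rows in a model with auxiliary variables, each row would be padded with zeros for those variables and appended to the inequality constraints with a right-hand side of zero. Every added row shrinks the feasible region, which is why feasibility becomes questionable for large groups.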

The use of pre-processing techniques is believed to have the greatest potential to mitigate unfairness in algorithmic decision-making [16]. However, by changing the data, the originality and accuracy of the data are also lost. Additionally, fair data can still lead to unfair decisions [24]. For example, a specific combination of criteria may be unfair, and the decision-maker, accidentally or not, may employ such a practice.

Our approach, besides giving satisfactory results in terms of individual and group fairness, is interpretable. Decision-makers can see which criterion was unfair and reduce its influence in the decision-making process. They can even use the results of our method to examine the whole decision-making process, from data collection to weight estimation, and try to identify where the bias came from [15]. An additional benefit of our approach lies in its mathematical formulation. Using linear programming allows us to inspect dual variables and see which alternatives contributed to unfairness. However, our approach does have limitations. First, it works on the entire dataset. Since weights are changed in the optimization procedure, each alternative is re-evaluated and thus receives a new score. Second, our procedure can turn out to be infeasible. If it is not possible to achieve the required level of group fairness $c$ together with the required level of individual fairness $\tau$, the procedure will result in an error. The decision-maker then needs to loosen the hyper-parameters. For example, if disparate impact is a hard constraint that needs to be satisfied, then $\tau$ must be higher. In that case, individual fairness is satisfied to a lesser degree.
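The interplay between infeasibility and the hyper-parameters can be illustrated on a deliberately small case. The sketch below is not the general linear program; it assumes only two criteria, so the weights are $(t, 1-t)$ and every constraint becomes an interval bound on $t$. It further assumes that individual fairness is expressed as $|u_i - u_i^{old}| \leqslant \tau$ and that the objective is the total absolute change in weights. The function name and all data are hypothetical.

```python
def fair_weights_two_criteria(matrix, groups, old_t, c, tau):
    """Solve a two-criteria re-weighting problem exactly.

    Weights are (t, 1 - t). Every constraint is linear in t, so the feasible
    set is an interval [lo, hi]; the point closest to old_t minimizes the
    total absolute weight change 2 * |t - old_t|.
    Returns the new t, or None when the interval is empty (infeasible).
    """
    lo, hi = 0.0, 1.0                      # non-negativity of both weights

    # Individual fairness: |(t - old_t) * (a_i1 - a_i2)| <= tau for all i.
    spread = max(abs(a1 - a2) for a1, a2 in matrix)
    if spread > 0:
        lo = max(lo, old_t - tau / spread)
        hi = min(hi, old_t + tau / spread)

    # Group fairness: mean_D(score) - c * mean_P(score) >= 0.
    def col_mean(rows, j):
        return sum(matrix[i][j] for i in rows) / len(rows)

    d1, d2 = col_mean(groups["D"], 0), col_mean(groups["D"], 1)
    p1, p2 = col_mean(groups["P"], 0), col_mean(groups["P"], 1)
    intercept = d2 - c * p2                # constraint value at t = 0
    slope = (d1 - d2) - c * (p1 - p2)      # change per unit of t
    if slope > 0:
        lo = max(lo, -intercept / slope)
    elif slope < 0:
        hi = min(hi, -intercept / slope)
    elif intercept < 0:
        return None                        # constraint can never hold

    if lo > hi:
        return None                        # empty feasible interval
    return min(max(old_t, lo), hi)         # project old_t onto [lo, hi]


# Hypothetical data: rows 0-1 are disadvantaged (D), rows 2-3 privileged (P).
matrix = [(0.2, 0.8), (0.4, 0.6), (0.9, 0.5), (0.7, 0.7)]
groups = {"D": [0, 1], "P": [2, 3]}

# Start with a strict tau and loosen it until the problem becomes feasible.
tau, t_new = 0.10, None
while t_new is None and tau <= 1.0:
    t_new = fair_weights_two_criteria(matrix, groups, old_t=0.8, c=0.9, tau=tau)
    if t_new is None:
        tau += 0.05

print("first feasible tau:", round(tau, 2), "new weights:", (t_new, 1 - t_new))
```

With these numbers the strict setting $\tau = 0.10$ leaves no feasible weights, and the loop has to raise $\tau$ several times before the group constraint with $c = 0.9$ can be met, which mirrors the trade-off described above.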

We highlight that the role of the decision-maker remains the most important one in the decision-making process. The decision-maker is responsible for designing the process: selecting the most important criteria, finding alternatives that can solve the problem at hand, and deriving utility functions (values in the decision matrix) and criteria weights. Domain knowledge is of immense importance in the proposed approach, as it can guide the process by proposing additional constraints in the optimization, for example upper and lower bounds on the criteria weights (either for weights in general or for a specific weight). A recommended way to use the proposed method is the human-in-the-loop approach (as in [48]). The proposed approach returns: 1) adjusted weights, which the decision-maker can interpret in terms of which criteria become more or less important, or which criteria discriminate with respect to the sensitive attribute; once aware of the possible discrimination, the decision-maker can adjust the decision-making process by eliminating the criteria that pose a problem; and 2) dual variables, which help the decision-maker identify which alternatives limit group or individual fairness. By removing such an alternative from the decision matrix (with a proper non-discriminatory explanation of why it is removed), the decision-making process could yield a fair solution.
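As an illustration of such domain-knowledge constraints, bounds on the criteria weights could be added to the model as extra linear constraints. The symbols $w_{j}^{\min}$ and $w_{j}^{\max}$ are hypothetical bounds chosen by the decision-maker and do not appear in the original model:

$$
w_{j}^{\min}\leqslant w_{j}\leqslant w_{j}^{\max},\quad j=1,\ldots,n
$$

Since these constraints are linear in $w_{j}$, the model would remain a linear program, although tighter bounds shrink the feasible region and may make the fairness constraints harder to satisfy.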

5. Conclusions

This paper presents FairAW, a method that provides fair additive weighting by considering both group and individual fairness. We propose a linear mathematical model that changes criteria weights such that the statistical parity constraint and the individual fairness constraints are satisfied. The goal of the proposed model is to change criteria weights by the smallest possible value (thus changing scores by the smallest possible value) so that both group and individual fairness are satisfied. Group fairness is represented by statistical parity and is controlled with a hyper-parameter. Similarly, individual fairness, which is defined as the difference between the original scores and the new scores, is controlled with a hyper-parameter.

The results showed that the proposed approach performs well regardless of the shape of the data, i.e. of how many criteria and individuals there are. The results were compared with the Disparate Impact Remover algorithm, which corrects the data at hand to be fair, and with the FA*IR algorithm. Disparate Impact Remover managed to improve fairness on a group level, but at the cost of individual fairness. Compared to the FA*IR algorithm, the proposed method performed better in terms of the required level of group fairness, but with a loss in individual fairness. This is due to the statistical nature of FA*IR and its constraint on the expected minimum number of individuals from the disadvantaged group in the top $k$ ranks. However, our approach does have limitations. It requires the entire dataset. Also, the optimization procedure can be infeasible. The latter can be mitigated by relaxing the hyper-parameters for group and individual fairness.

As part of future work, we identify the problem of multiple sensitive groups (i.e. intersectional fairness). If the groups were vaguer and the direction and intensity of discrimination were not known, we would need to make our mathematical model more complex by adding multiple constraints, i.e. a comparison for each group, restricted from both the lower and the upper side. However, one of the main challenges is to alter the decision-making algorithm so that the group fairness constraint is satisfied.

Another line of future work is aimed at implementing the proposed FairAW in other MCDM models, namely VIKOR [33] and AHP [39]. For VIKOR, we plan to define an interval guarantee solution that will present a pessimistic estimate of the score or rank of the individual. A similar mathematical model can be used for this task as well. However, enforcing fairness in the VIKOR method is harder than in SAW. To estimate the interval guarantee solution of an individual, one needs to sort values and calculate the acceptable advantage and the acceptable strong solution, which is non-linear and even non-differentiable [33]. In the case of AHP, enforcing the FairAW method is slightly easier. Fairness could be enforced in the pairwise comparison matrix. Individual fairness can be defined in the same manner as in FairAW, while group fairness can be represented as statistical parity. However, an additional level of complexity should be considered. It would be beneficial for the decision-maker to resolve the possible inconsistency of the pairwise comparison matrix. While individual and group fairness can be handled with efficient and tractable optimization tools, consistency cannot (both the maximum eigenvalue and the Rayleigh coefficient formulations are non-convex). Additionally, changing values in the pairwise comparison matrix presents a discrete optimization problem, since the values in the pairwise matrix are limited to a predefined set of values ( $\{\frac{1}{9},\frac{1}{8},\ldots,\frac{1}{2},1,2,\ldots,8,9\}$ ).
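To make the AHP consistency issue concrete, the sketch below computes Saaty's consistency index $CI = (\lambda_{\max} - n)/(n-1)$ for a pairwise comparison matrix whose entries come from the discrete scale above. The principal eigenvalue is approximated with plain power iteration, which is a choice made here for self-containment and is not part of the proposed method. Both example matrices are hypothetical.

```python
from fractions import Fraction as F

# Discrete comparison scale: 1/9, 1/8, ..., 1/2, 1, 2, ..., 9.
SAATY_SCALE = [F(1, k) for k in range(9, 1, -1)] + [F(k) for k in range(1, 10)]


def principal_eigenvalue(matrix, iterations=200):
    """Approximate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        mv = [sum(float(matrix[i][j]) * v[j] for j in range(n)) for i in range(n)]
        lam = sum(mv)              # v sums to 1, so sum(M v) estimates lambda_max
        v = [x / lam for x in mv]  # renormalize so the entries sum to 1 again
    return lam


def consistency_index(matrix):
    """Saaty's consistency index CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    return (principal_eigenvalue(matrix) - n) / (n - 1)


# Perfectly consistent judgments: a_ik = a_ij * a_jk for all i, j, k.
consistent = [
    [F(1), F(2), F(4)],
    [F(1, 2), F(1), F(2)],
    [F(1, 4), F(1, 2), F(1)],
]

# Reciprocal but intransitive judgments (1 beats 2, 2 beats 3, 3 beats 1).
inconsistent = [
    [F(1), F(2), F(1, 4)],
    [F(1, 2), F(1), F(2)],
    [F(4), F(1, 2), F(1)],
]

print("CI consistent:  ", consistency_index(consistent))
print("CI inconsistent:", consistency_index(inconsistent))
```

Any fairness-driven adjustment of a pairwise comparison matrix would have to move entries only within `SAATY_SCALE`, keep the matrix reciprocal, and keep this index acceptably small, which is what makes the problem discrete and harder than the additive-weighting case.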

Acknowledgments

This research was funded by Office of Naval Research grant number ONR N62909-19-1-2008, Project title: “Aggregating computational algorithms and human decision-making preferences in multi-agent settings”.

Appendix A. Quadratic model

Instead of the proposed loss function, one can use other loss functions, for example a quadratic loss function. This formulation of the problem requires the use of convex optimization methods, such as gradient descent [6]. The model is presented in Eq. (14).

$$
\begin{aligned}
\min\ & \sum_{j=1}^{n}\left(w_{j}-w_{j}^{\textit{old}}\right)^{2} \\
\text{s.t.}\ & \sum_{j=1}^{n}w_{j}=1 \\
& \frac{1}{|D|}\sum_{i\in D}\sum_{j=1}^{n}w_{j}a_{i,j}^{N}-c\,\frac{1}{|P|}\sum_{i\in P}\sum_{j=1}^{n}w_{j}a_{i,j}^{N}\geqslant 0 \\
& \left(\sum_{j=1}^{n}w_{j}a_{i,j}^{N}-\sum_{j=1}^{n}w_{j}^{\textit{old}}a_{i,j}^{N}\right)^{2}\leqslant\tau,\quad i=1,\ldots,m \\
& w_{j}\geqslant 0,\quad j=1,\ldots,n
\end{aligned}
\tag{14}
$$

This formulation corresponds to the $L_{2}$ loss function, which is common in machine learning applications. It is intuitive and easy to optimize using gradient-based optimization methods.

The model we presented in the paper corresponds to the $L_{1}$ loss function. The two models will yield similar solutions, but with one key difference. The quadratic model will aim to change all weights with small intensity, while the linear model will aim to change only some weights, but with higher intensity. In other words, the same intuition applies as with the coefficients of a linear regression under lasso ( $L_{1}$ ) regularization and ridge ( $L_{2}$ ) regularization [42].
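The sparse-versus-dense behaviour can be seen on a simplified version of the problem. The sketch below assumes only two requirements on the weight change $\Delta w$: the weights must still sum to one ( $\sum_j \Delta w_j = 0$ ), and a single linear fairness expression with hypothetical per-criterion coefficients `d` must increase by a hypothetical amount `delta`. Non-negativity and the individual fairness constraints are deliberately ignored, so this is an illustration of the intuition rather than the full models.

```python
def l1_minimal_change(d, delta):
    """Minimize sum |dw_j| s.t. sum dw_j = 0 and sum d_j * dw_j = delta.

    The optimum moves weight from the criterion with the smallest d_j to the
    one with the largest d_j and leaves all other weights untouched.
    """
    hi, lo = d.index(max(d)), d.index(min(d))
    step = delta / (d[hi] - d[lo])
    dw = [0.0] * len(d)
    dw[hi], dw[lo] = step, -step
    return dw


def l2_minimal_change(d, delta):
    """Minimize sum dw_j**2 under the same two equality constraints.

    The optimum is proportional to the centred vector d - mean(d), so every
    criterion with d_j != mean(d) receives some change.
    """
    mean = sum(d) / len(d)
    centred = [x - mean for x in d]
    scale = delta / sum(c * c for c in centred)
    return [scale * c for c in centred]


# Hypothetical coefficients of the group fairness expression per criterion,
# and the hypothetical amount by which that expression has to increase.
d = [-0.4, 0.1, 0.3, 0.2]
delta = 0.07

dw_l1 = l1_minimal_change(d, delta)
dw_l2 = l2_minimal_change(d, delta)
print("L1 change:", dw_l1)
print("L2 change:", dw_l2)
```

In this toy setting the $L_{1}$ solution touches only two weights, while the $L_{2}$ solution spreads a smaller change over all four, which matches the lasso and ridge analogy.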

The authors are aware that other loss functions can be applied, such as the Huber loss or the logarithm of the hyperbolic cosine [18].

Appendix B. Dual model

A linear programming model makes it easy to define and interpret the dual model. Dual variables enable better interpretability of the model and allow a richer discussion of the decision-making model. The dual model is presented in Eq. (15).

$$
\begin{aligned}
\max\ & \left(\mu-\sum_{j=1}^{n}w_{j}^{\textit{old}}\sigma_{j}^{l}+\sum_{j=1}^{n}w_{j}^{\textit{old}}\sigma_{j}^{u}-\sum_{i=1}^{m}\left(\tau+u_{i}^{\textit{old}}\right)\upsilon_{i}^{l}+\sum_{i=1}^{m}\left(\tau-u_{i}^{\textit{old}}\right)\upsilon_{i}^{u}\right) \\
\text{s.t.}\ & -\sigma_{j}^{l}+\sigma_{j}^{u}\leqslant A_{j},\quad j=1,\ldots,n \\
& \mu+\sigma_{j}^{l}+\sigma_{j}^{u}+\left(\frac{1}{|D|}\sum_{i\in D}a_{i,j}^{N}-c\,\frac{1}{|P|}\sum_{i\in P}a_{i,j}^{N}\right)\psi \\
& \quad{}+\sum_{i=1}^{m}a_{i,j}^{N}\upsilon_{i}^{l}+\sum_{i=1}^{m}a_{i,j}^{N}\upsilon_{i}^{u}\leqslant 0,\quad j=1,\ldots,n \\
& \sigma_{j}^{l},\sigma_{j}^{u},\upsilon_{i}^{l},\upsilon_{i}^{u}\geqslant 0
\end{aligned}
\tag{15}
$$

The dual model maximizes an expression in the dual variables: $\mu$, which corresponds to the equality constraint on the weights; $\sigma_{j}^{l}$ and $\sigma_{j}^{u}$, which correspond to the absolute difference in weights (namely, the lower and upper slack of the new weights); and $\upsilon_{i}^{l}$ and $\upsilon_{i}^{u}$, which correspond to the individual fairness constraints (namely, the lower and upper slack of individual fairness). We also have a dual variable $\psi$ that explains statistical parity. Interpreting $\psi$ allows us to see how statistical parity increased the objective function. More specifically, $\psi$ represents the unit cost of meeting the statistical parity constraint. Larger values of $\psi$ indicate a higher cost of achieving group fairness. The values of $\upsilon_{i}^{l}$ and $\upsilon_{i}^{u}$ give the value of allowing an additional discrepancy between the original and new scores.

There are two sets of constraints. The first one concerns the transformation of the absolute value in the goal function, and thus its dual variables are $\sigma_{j}^{l}$ and $\sigma_{j}^{u}$ . The second set of constraints concerns the entire mathematical model. More specifically, for each criterion, the combination of the dual variable for the weights $\mu$ , the dual variables obtained from the absolute values in the goal function $\sigma_{j}^{l}$ and $\sigma_{j}^{u}$ , the statistical parity dual variable $\psi$ , and the dual variables for individual fairness $\upsilon_{i}^{l}$ and $\upsilon_{i}^{u}$ must be at most zero.
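One practical way to read the dual variables relies on complementary slackness in linear programming: a dual variable can be non-zero only if its constraint holds with equality. The sketch below therefore reports which fairness constraints are active for a given pair of old and new weights. It assumes the individual fairness constraint has the form $|u_i - u_i^{old}| \leqslant \tau$ ; the function name, the data, and the weights are hypothetical and are not outputs of the model.

```python
def saw_scores(weights, matrix):
    """Simple additive weighting: score_i = sum_j w_j * a_ij."""
    return [sum(w * a for w, a in zip(weights, row)) for row in matrix]


def active_constraints(matrix, groups, old_w, new_w, c, tau, tol=1e-9):
    """List the fairness constraints that hold with equality at new_w."""
    old_s = saw_scores(old_w, matrix)
    new_s = saw_scores(new_w, matrix)

    def group_mean(rows):
        return sum(new_s[i] for i in rows) / len(rows)

    # Slack of the group constraint: mean_D(score) - c * mean_P(score) >= 0.
    group_slack = group_mean(groups["D"]) - c * group_mean(groups["P"])

    report = {"group": abs(group_slack) <= tol, "upper": [], "lower": []}
    for i, (o, s) in enumerate(zip(old_s, new_s)):
        if abs((s - o) - tau) <= tol:
            report["upper"].append(i)   # score rose by exactly tau
        if abs((o - s) - tau) <= tol:
            report["lower"].append(i)   # score fell by exactly tau
    return report


# Hypothetical data: rows 0-1 are disadvantaged (D), rows 2-3 privileged (P).
matrix = [(0.2, 0.8), (0.4, 0.6), (0.9, 0.5), (0.7, 0.7)]
groups = {"D": [0, 1], "P": [2, 3]}

report = active_constraints(
    matrix, groups, old_w=(0.8, 0.2), new_w=(0.5, 0.5), c=0.7, tau=0.18
)
print(report)
```

Here only the upper individual fairness bound of the first individual is active, so under the stated assumptions $\upsilon_{0}^{u}$ is the only fairness-related dual variable that could be non-zero, while $\psi$ would be zero because the group constraint has positive slack.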

References

1. Angwin, Larson, Mattu and Kirchner, Machine bias, ProPublica, May 23 (2016), 2016.

2. Asudeh, Jagadish, Stoyanovich and Das, Designing fair ranking schemes, in: Proceedings of the 2019 International Conference on Management of Data, 2019, pp. 1259–1276.

3. Biega A.J., Gummadi K.P. and Weikum, Equity of attention: Amortizing individual fairness in rankings, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, pp. 405–414.

4. Binns, Fairness in machine learning: Lessons from political philosophy, in: Conference on Fairness, Accountability and Transparency, 2018, pp. 149–159.

5. Bowers and Robinson P.H., Perceptions of fairness and justice: The shared aims & occasional conflicts of legitimacy and moral credibility, Wake Forest Law Review 47 (2012), 11–13.

6. Boyd S.P. and Vandenberghe, Convex optimization, Cambridge University Press, 2004.

7. Caton and Haas, Fairness in machine learning: A survey, arXiv preprint arXiv:2010.04053, 2020.

8. Corbett-Davies, Pierson, Feller, Goel and Huq, Algorithmic decision making and the cost of fairness, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 797–806.

9. Di Stasio and Larsen E.N., The racialized and gendered workplace: Applying an intersectional lens to a field experiment on hiring discrimination in five European labor markets, Social Psychology Quarterly 83(3) (2020), 229–250.

10. Dressel and Farid, The accuracy, fairness, and limits of predicting recidivism, Science Advances 4(1) (2018), eaao5580.

11. Dwork, Hardt, Pitassi, Reingold and Zemel, Fairness through awareness, in: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 2012, pp. 214–226.

12. Feldman, Friedler S.A., Moeller, Scheidegger and Venkatasubramanian, Certifying and removing disparate impact, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 259–268.

13. García-Cascales M.S. and Lamata M.T., On rank reversal and TOPSIS method, Mathematical and Computer Modelling 56(5–6) (2012), 123–132.

14. Greenberg, Bogaard, Jin and Zhang, Increasing female enrollment and retention for computing degrees, in: Proceedings of the 20th Annual SIG Conference on Information Technology Education, 2019, pp. 61–62.

15. Grgić-Hlača, Zafar M.B., Gummadi K.P. and Weller, Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.

16. Heidari, Loi, Gummadi K.P. and Krause, A moral framework for understanding fair ML through economic models of equality of opportunity, in: Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 181–190.

17. Hu and Chen, A short-term intervention for long-term fairness in the labor market, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1389–1398.

18. Janocha and Czarnecki W.M., On loss functions for deep neural networks in classification, arXiv preprint arXiv:1702.05659, 2017.

19. Karam, Nagahi, Dayarathna V.L., Jaradat and Hamilton, Integrating systems thinking skills with multi-criteria decision-making technology to recruit employee candidates, Expert Systems with Applications, 2020, 113585.

20. Kaya, Colak and Terzi, A comprehensive review of fuzzy multi criteria decision making methodologies for energy policy making, Energy Strategy Reviews 24 (2019), 207–228.

21. Kimmons, Students performance in exams, 2020.

22. Lahoti, Gummadi K.P. and Weikum, iFair: Learning individually fair data representations for algorithmic decision making, in: 2019 IEEE 35th International Conference on Data Engineering (ICDE), IEEE, 2019, pp. 1334–1345.

23. Leiber M.J. and Johnson J.D., Being young and black: What are their effects on juvenile justice decision making? Crime & Delinquency 54(4) (2008), 560–581.

24. Lepri, Oliver, Letouzé, Pentland and Vinck, Fair, transparent, and accountable algorithmic decision-making processes, Philosophy & Technology 31(4) (2018), 611–627.

25. Lohaus, Perrot and von Luxburg, Too relaxed to be fair, in: International Conference on Machine Learning, 2020.

26. Maskin, A theorem on utilitarianism, The Review of Economic Studies 45(1) (1978), 93–96.

27. Mathioudakis, Castillo, Barnabo and Celis, Affirmative action policies for top-k candidates selection: with an application to the design of policies for university admissions, in: Proceedings of the 35th Annual ACM Symposium on Applied Computing, 2020, pp. 440–449.

28. Mehrabi, Morstatter, Saxena, Lerman and Galstyan, A survey on bias and fairness in machine learning, arXiv preprint arXiv:1908.09635, 2019.

29. Mehrabi, Morstatter, Saxena, Lerman and Galstyan, A survey on bias and fairness in machine learning, ACM Computing Surveys (CSUR) 54(6) (2021), 1–35.

30. Mitchell, Potash, Barocas, D’Amour and Lum, Algorithmic fairness: Choices, assumptions, and definitions, Annual Review of Statistics and Its Application 8 (2021), 141–163.

31. Mouzannar, Ohannessian M.I. and Srebro, From fair decision making to social equality, in: Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 359–368.

32. Noriega-Campero, Garcia-Bulle, Cantu L.F., Bakker M.A., Tejerina and Pentland, Algorithmic targeting of social policies: fairness, accuracy, and distributed governance, in: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, pp. 241–251.

33. Opricovic and Tzeng G.-H., Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS, European Journal of Operational Research 156(2) (2004), 445–455.

34. Petrović, Nikolić, Radovanović, Delibašić and Jovanović, FAIR: Fair adversarial instance re-weighting, Neurocomputing, 2022.

35. Pleiss, Raghavan, Kleinberg and Weinberger K.Q., On fairness and calibration, arXiv preprint arXiv:1709.02012, 2017.

36. Radovanović, Savić, Delibašić and Suknović, FairDEA – removing disparate impact from efficiency scores, European Journal of Operational Research, 2021.

37. Raghavan, Barocas, Kleinberg and Levy, Mitigating bias in algorithmic hiring: Evaluating claims and practices, in: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, pp. 469–481.

38. Rose D.L., Twenty-five years later: Where do we stand on equal employment opportunity law enforcement, Vand. L. Rev. 42 (1989), 1121.

39. Saaty T.L., Decision making – the analytic hierarchy and network processes (AHP/ANP), Journal of Systems Science and Systems Engineering 13(1) (2004), 1–35.

40. Singh and Joachims, Fairness of exposure in rankings, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2219–2228.

41. Singh and Joachims, Policy learning for fairness in ranking, Advances in Neural Information Processing Systems 32 (2019).

42. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological) 58(1) (1996), 267–288.

43. Van der Put C.E., Assink and van Solinge N.F.B., Predicting child maltreatment: A meta-analysis of the predictive validity of risk assessment instruments, Child Abuse & Neglect 73 (2017), 71–88.

44. Varian H.R., Distributive justice, welfare economics, and the theory of fairness, Philosophy & Public Affairs, 1975, 223–247.

45. Vavryčuk, Fair ranking of researchers and research teams, PloS One 13(4) (2018), e0195509.

46. Wang Y.-M. and Luo, On rank reversal in decision analysis, Mathematical and Computer Modelling 49(5–6) (2009), 1221–1229.

47. Wątróbski, Jankowski, Ziemba, Karczmarczyk and Zioło, Generalised framework for multi-criteria method selection, Omega 86 (2019), 107–124.

48. Xiao, Sun, Zhang and He, A survey of human-in-the-loop for machine learning, Future Generation Computer Systems, 2022.

49. Xiang and Raji I.D., On the legal compatibility of fairness definitions, arXiv preprint arXiv:1912.00761, 2019.

50. Yang, Loftus J.R. and Stoyanovich, Causal intersectionality for fair ranking, arXiv preprint arXiv:2006.08688, 2020.

51. Yang and Stoyanovich, Measuring fairness in ranked outputs, in: Proceedings of the 29th International Conference on Scientific and Statistical Database Management, 2017, pp. 1–6.

52. Zafar M.B., Valera, Gomez-Rodriguez and Gummadi K.P., Fairness constraints: A flexible approach for fair classification, J. Mach. Learn. Res. 20(75) (2019), 1–42.

53. Zavadskas E.K., Turskis and Kildiene, State of art surveys of overviews on MCDM/MADM methods, Technological and Economic Development of Economy 20(1) (2014), 165–179.

54. Zehlike, Bonchi, Castillo, Hajian, Megahed and Baeza-Yates, FA*IR: A fair top-k ranking algorithm, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1569–1578.

55. Zehlike and Castillo, Reducing disparate exposure in ranking: A learning to rank approach, in: Proceedings of The Web Conference 2020, 2020, pp. 2849–2855.

56. Zehlike, Hacker and Wiedemann, Matching code and law: achieving algorithmic fairness with optimal transport, Data Mining and Knowledge Discovery 34(1) (2020), 163–200.

57. Zehlike, Yang and Stoyanovich, Fairness in ranking: A survey, arXiv preprint arXiv:2103.14000, 2021.

58. Zemel, Swersky, Pitassi and Dwork, Learning fair representations, in: International Conference on Machine Learning, PMLR, 2013, pp. 325–333.


Table 3
Student Placement Results ( $n=215$ ) – utility change