On the Upward Bias of the Dissimilarity Index and Its Corrections

Abstract

The dissimilarity index of Duncan and Duncan is widely used in a broad range of contexts to assess the overall extent of segregation in the allocation of two groups in two or more units. Its sensitivity to random allocation implies an upward bias with respect to the unknown amount of systematic segregation. In this article, following a multinomial framework based on the assumption that individuals allocate themselves independently and that unit sizes are not fixed, we provide (1) a mathematical proof of the nonnegativity of the bias, (2) an analytic way of obtaining the same results of a recent bootstrap-based bias correction but without using resampling, and (3) a new bias correction that outperforms, in terms of both bias and mean square error, those based on grouped jackknife, bootstrap, and double bootstrap.

Keywords

bias correction, bootstrap, dissimilarity index, random allocation, segregation

Introduction

The study of segregation of demographic groups, often related to ethnicity or gender, is a major topic of research for economists, demographers, and other social scientists. In almost all applications, the assessment of the amount of segregation within a community is based on the proportions of demographic groups belonging to some kind of allocation unit, such as residential areas, workplaces, or schools.

Many segregation indexes have been suggested, with different formulations referring to different definitions of segregation (for an overview, see Massey and Denton 1988; White 1986). Among these, the dissimilarity index D, proposed by Duncan and Duncan (1955), is widely used to assess the differential distribution of two groups among allocation units. This index has been used in a broad range of contexts, such as gender segregation (see, e.g., Deutsch and Silber 2005; Kakwani 1994; Karmel and Maclachlan 1988), labor force segregation (for a survey, see Flückiger and Silber 1999; also see Silber 1989, where D is compared to alternative indexes of segregation, Carrington and Troske 1997, and Silber, Flückiger, and Reardon 2009), and residential segregation (see Duncan and Duncan 1955; Farley 1975; Taeuber and Taeuber 1965, who also address school segregation; and Massey and Denton 1987, 1988).

The observed allocation pattern is one of the possible outcomes of a random process that is characterized by a certain level of “systematic” segregation, resulting from a mix of behavior-based forces. In this view, the observed dissimilarity $\hat{D}$ is merely the natural estimator of the true but unknown level of D, and the randomness of the allocation process holds even if the index is computed on full-count census data.

Within a multinomial framework based on the assumption that individuals allocate themselves independently and that unit sizes are not fixed (see the second section), Allen, Burgess, and Windmeijer (2009) demonstrate, using simulations, that random allocation generates substantial unevenness, and hence an upward bias, especially when dealing with small units, a small minority proportion, and a low level of segregation. Other authors, using different frameworks, arrive at the same conclusions (see, e.g., Carrington and Troske 1997; Cortese, Falk, and Cohen 1976; Farley and Johnson 1985; Ransom 2000). In the third section, we first outline, using simulations, the effect of some relevant factors on the magnitude of the bias, and then we provide an analytical proof that $\hat{D}$ is an upward-biased estimator of D, which confirms the simulation results of Allen et al. (2009). The same authors propose a bootstrap-based estimator that partially reduces the bias; in the fourth section, we demonstrate that their results can be obtained without using resampling, and in the fifth section, we introduce a new bias-corrected estimator that we show to outperform their proposal, as well as some other resampling-based bias corrections, in terms of both bias and mean square error (MSE).

Inferential Framework and Notation

Consider an area subdivided into k units, denoted by j = 1, …, k, and populated by n individuals belonging to two groups according to a dichotomous characteristic c = 0, 1. Examples of common dichotomous characteristics are black or white ethnicity, male or female gender, and so on. The number of individuals with status c is denoted by $n^{c}$, c = 0, 1, with $n = n^{0} + n^{1}$.

The observed allocation—characterized by the two sets denoted by ${\hat{n}}_{1}^{0}, \dots, {\hat{n}}_{k}^{0}$ for status 0 and ${\hat{n}}_{1}^{1}, \dots, {\hat{n}}_{k}^{1}$ for status 1—is, however, only one of the possible realizations of an underlying allocation process $P$ . If it is plausible to assume that the two groups allocate themselves independently and that unit sizes are not fixed, then $P$ will be governed by the conditional probabilities:

$$ p_{j}^{c} = P(\text{unit of membership} = j \mid c), \quad j = 1, \dots, k \ \text{and} \ c = 0, 1 $$

that an individual i will belong to the unit j, given his or her status c.

Social scientists are usually interested in making inferences on a particular function of these probabilities; this function, commonly called a segregation index, should express the degree of segregation that characterizes $P$, the so-called systematic segregation. The latter occurs when there is at least one unit in which individuals belonging to the two groups have different allocation probabilities; in mathematical terms, this means that:

$$ \exists\, j : p_{j}^{1} \neq p_{j}^{0}. $$

Among the many segregation indexes in the literature (see Duncan and Duncan 1955; Hutchens 1991; James and Taeuber 1985; Jerby, Semyonov, and Lewin-Epstein 2005; Massey and Denton 1988; Massey, White, and Phua 1996; White 1986), the most popular one is without doubt the dissimilarity index (Duncan and Duncan 1955):

$$ D = \frac{1}{2} \sum_{j = 1}^{k} \left| p_{j}^{1} - p_{j}^{0} \right|. \tag{2} $$

Obviously, the index in equation (2) takes values in [0, 1], and it increases as systematic segregation grows. Furthermore, it is straightforward to note that the case D = 0 (absence of systematic segregation) is achievable if and only if

$$ p_{j}^{1} = p_{j}^{0} \quad \forall j. $$

Unfortunately, we can only observe the crude counterpart of D

$$ \hat{D} = \frac{1}{2} \sum_{j = 1}^{k} \left| \frac{{\hat{n}}_{j}^{1}}{n^{1}} - \frac{{\hat{n}}_{j}^{0}}{n^{0}} \right| = \frac{1}{2} \sum_{j = 1}^{k} \left| {\hat{p}}_{j}^{1} - {\hat{p}}_{j}^{0} \right|, $$

where ${\hat{p}}_{j}^{c}$ is the plug-in estimator of $p_{j}^{c}$. The word “unfortunately” is justified if one thinks that the observed allocation is only one of the numerous possible patterns arising from $P$, each of them with probability given by the product of two independent multinomial distributions, one for c = 0 and the other for c = 1:

$$ P(n_{1}^{0}, \dots, n_{k}^{0}, n_{1}^{1}, \dots, n_{k}^{1}; p_{1}^{0}, \dots, p_{k}^{0}, p_{1}^{1}, \dots, p_{k}^{1}, n^{0}, n^{1}) = \prod_{c = 0}^{1} \left[ n^{c}! \prod_{j = 1}^{k} \frac{{(p_{j}^{c})}^{n_{j}^{c}}}{n_{j}^{c}!} \right]. \tag{4} $$

As mentioned already, this framework, introduced by Allen et al. (2009), assumes that individuals allocate themselves independently and that the quantities $n^{c}$, c = 0, 1, are fixed after sampling. The latter assumption distinguishes this framework from that of Ransom (2000), who assumes that only the overall size n is fixed after sampling. As an example, consider an area subdivided into k = 50 units populated by $n^{0} = 100$ black and $n^{1} = 900$ white individuals (resulting in an overall size of n = 1,000). The underlying allocation process of the framework of Ransom (2000) only guarantees that, at the end of the allocation, the area will be populated by 1,000 individuals. In contrast, the underlying allocation process of the more restrictive, although more realistic, framework of Allen et al. (2009) ensures that, at the end of the allocation, the considered area will be populated by 100 blacks and 900 whites. Note that more restrictive frameworks also exist where, for instance, the unit sizes $n_{j}$ are also assumed fixed in the allocation process (Carrington and Troske 1997).

Conceptually, according to the framework of Allen et al. (2009), it is straightforward to calculate the exact sampling distribution of $\hat{D}$ by enumerating all possible combinations of $n_{1}^{0}, \dots, n_{k}^{0}$ and $n_{1}^{1}, \dots, n_{k}^{1}$, evaluating the corresponding index values, and calculating the probability of each combination using the probability model in equation (4).
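The enumeration just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`dissimilarity`, `compositions`, `multinomial_pmf`, `exact_expected_d`) are hypothetical and not part of the original work, and the brute-force loop is feasible only for very small k, $n^{0}$, and $n^{1}$.

```python
from math import factorial, prod


def dissimilarity(counts0, counts1):
    """Observed dissimilarity index from two vectors of unit counts."""
    n0, n1 = sum(counts0), sum(counts1)
    return 0.5 * sum(abs(b / n1 - a / n0) for a, b in zip(counts0, counts1))


def compositions(n, k):
    """Yield every way of placing n individuals into k ordered units."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Probability of one count vector under a multinomial distribution."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an exact integer at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))


def exact_expected_d(probs0, probs1, n0, n1):
    """E(D_hat) by enumerating all allocations of both groups."""
    k = len(probs0)
    total = 0.0
    for c0 in compositions(n0, k):
        w0 = multinomial_pmf(c0, probs0)
        for c1 in compositions(n1, k):
            total += dissimilarity(c0, c1) * w0 * multinomial_pmf(c1, probs1)
    return total
```

For instance, with k = 2 units, equal probabilities (0.5, 0.5) for both groups (so that D = 0), and one individual per group, `exact_expected_d` returns 0.5; with two individuals per group it returns 0.375. The expectation stays above the true value of zero and shrinks as the group sizes grow.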

Bias of $\hat{D}$

As $\hat{D}$ is an estimator for D, it is possible to define its bias as

$$ \mathrm{Bias}(\hat{D}) = E(\hat{D}) - D. \tag{5} $$

The expectation in equation (5), taken over the independent multinomial distributions with probabilities $p_{j}^{c}$, j = 1, …, k, and c = 0, 1, can be written explicitly as follows:

$$ E(\hat{D}) = \frac{1}{2} \sum_{(n_{1}^{0}, \dots, n_{k}^{0}) : n^{0}} \ \sum_{(n_{1}^{1}, \dots, n_{k}^{1}) : n^{1}} \left[ \left( \sum_{j = 1}^{k} \left| \frac{n_{j}^{1}}{n^{1}} - \frac{n_{j}^{0}}{n^{0}} \right| \right) \prod_{c = 0}^{1} n^{c}! \prod_{j = 1}^{k} \frac{{(p_{j}^{c})}^{n_{j}^{c}}}{n_{j}^{c}!} \right], \tag{6} $$

where the first two summations run across all possible patterns $n_{1}^{c}, \dots, n_{k}^{c}$ , satisfying the constraint $\sum_{j} n_{j}^{c} = n^{c}$ , c = 0, 1.

Determining Factors

As has been shown through simulations (see Allen et al. 2009; Carrington and Troske 1997), $\mathrm{Bias}(\hat{D})$ tends to be positive, that is, $\hat{D}$ tends to overestimate systematic segregation D. The bias results from the fact that the index is based on absolute values, and this causes a higher bias when D is lower. If, for instance, systematic segregation were 0, any sampling variation that causes ${\hat{p}}_{j}^{1}$ to differ from ${\hat{p}}_{j}^{0}$ would result in an upward bias. Furthermore, the size of the bias depends on the variability of the differences ${\hat{p}}_{j}^{1} - {\hat{p}}_{j}^{0}$ which, within our setup, declines if the sizes of the two groups increase and/or the minority group is larger.

Figure 1.

Behavior of $\mathrm{Bias}(\hat{D})$ as a function of p, $E(n_{j})$, and D. Results are obtained by Monte Carlo simulations using the parabolic segregation curves of Duncan and Duncan (1955) to generate the conditional probabilities $p_{j}^{c}$, j = 1, …, 50, and c = 0, 1.

Plots in Figure 1 show the behavior of $\mathrm{Bias}(\hat{D})$ as a function of three factors: the minority group proportion p, the expected unit size $E(n_{j})$, assumed to be equal across all units, and the value of $D$. The setup of the simulations is similar to the one adopted by Allen et al. (2009). The sets of conditional probabilities $p_{1}^{0}, \dots, p_{k}^{0}$ and $p_{1}^{1}, \dots, p_{k}^{1}$, with k = 50, were obtained with the formula

$$ P(\text{unit} \leq j \mid c = 1) = \frac{(1 - q) P(\text{unit} \leq j \mid c = 0)}{1 - q P(\text{unit} \leq j \mid c = 0)} \tag{7} $$

proposed in Duncan and Duncan (1955); each value of q corresponds to one value of D. Although this set of segregation curves cannot represent all distributions of segregation, it is a sufficient set to examine different levels of systematic segregation for the purposes of this article. Equation (7), combined with the constraint of equal expected unit sizes, fixes the conditional allocation probabilities for both groups. An allocation is then generated by assigning $n^{1}$ and $n^{0}$ individuals to the k units, sampling from two multinomial distributions whose parameters are the two sets of conditional probabilities.
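One possible way to build such probabilities and draw an allocation is sketched below in Python. Several choices here are assumptions rather than details given in the text: group 1 is taken to be the minority with population share `p_minority`, the equal-size constraint is imposed by requiring the expected cumulative share of the first j units to equal j/k, and the cumulative probability for group 0 is found by bisection. All function names are hypothetical.

```python
import random
from collections import Counter


def duncan_curve(x, q):
    """Equation (7): cumulative share of group 1 given that of group 0."""
    return (1.0 - q) * x / (1.0 - q * x)


def conditional_probabilities(k, q, p_minority, tol=1e-12):
    """Return (probs0, probs1) for k units with equal expected unit sizes.

    Assumption: group 1 is the minority, so the expected cumulative share of
    units 1..j is (1 - p) * F0 + p * F1, which is forced to equal j / k.
    Requires 0 <= q < 1.
    """
    cum0, cum1 = [0.0], [0.0]
    for j in range(1, k):
        target = j / k
        lo, hi = 0.0, 1.0
        while hi - lo > tol:  # bisection: the mixture is increasing in F0
            mid = 0.5 * (lo + hi)
            mix = (1.0 - p_minority) * mid + p_minority * duncan_curve(mid, q)
            if mix < target:
                lo = mid
            else:
                hi = mid
        f0 = 0.5 * (lo + hi)
        cum0.append(f0)
        cum1.append(duncan_curve(f0, q))
    cum0.append(1.0)
    cum1.append(1.0)
    probs0 = [cum0[j] - cum0[j - 1] for j in range(1, k + 1)]
    probs1 = [cum1[j] - cum1[j - 1] for j in range(1, k + 1)]
    return probs0, probs1


def true_d(probs0, probs1):
    """Systematic segregation D implied by the two probability vectors."""
    return 0.5 * sum(abs(b - a) for a, b in zip(probs0, probs1))


def sample_counts(probs, n, rng):
    """One multinomial draw: allocate n individuals to the units."""
    tally = Counter(rng.choices(range(len(probs)), weights=probs, k=n))
    return [tally[j] for j in range(len(probs))]
```

A typical use would be `probs0, probs1 = conditional_probabilities(50, 0.8, 0.5)` followed by `sample_counts(probs1, n1, random.Random(1))` and `sample_counts(probs0, n0, random.Random(2))` for chosen group sizes; with q = 0 the two probability vectors coincide and `true_d` is zero.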

Nonnegativity

In the following, we prove the nonnegativity of the bias in equation (5) by reasoning unit by unit.

Lemma 1: $\mathrm{Bias}(\hat{D})$ assumes only nonnegative values.

Proof. The allocation mechanism $P$, in each unit j, j = 1, …, k, is governed by a binomial distribution with parameters $n^{c}$ and $p_{j}^{c}$, denoted by $\mathrm{Bin}(n^{c}, p_{j}^{c})$; this is the jth marginal distribution of the multinomial adopted in equation (4); see Johnson, Kotz, and Balakrishnan (1997).

Now, let ${\hat{d}}_{j} = {\hat{p}}_{j}^{1} - {\hat{p}}_{j}^{0}$ be the plug-in estimator of the true but unknown difference $d_{j} = p_{j}^{1} - p_{j}^{0}$. Since we know that ${\hat{n}}_{j}^{c} \sim \mathrm{Bin}(n^{c}, p_{j}^{c})$, it is easy to show that ${\hat{d}}_{j}$ is an unbiased estimator of $d_{j}$; in fact, we have:

$$ E({\hat{d}}_{j}) = \frac{E({\hat{n}}_{j}^{1})}{n^{1}} - \frac{E({\hat{n}}_{j}^{0})}{n^{0}} = \frac{n^{1} p_{j}^{1}}{n^{1}} - \frac{n^{0} p_{j}^{0}}{n^{0}} = d_{j}. \tag{8} $$

Thus, using the result in equation (8), equation (5) can be expressed as follows:

$$ \begin{aligned} \mathrm{Bias}(\hat{D}) &= \frac{1}{2} \sum_{j = 1}^{k} E(|{\hat{d}}_{j}|) - \frac{1}{2} \sum_{j = 1}^{k} |d_{j}| \\ &= \frac{1}{2} \sum_{j = 1}^{k} E(|{\hat{d}}_{j}|) - \frac{1}{2} \sum_{j = 1}^{k} |E({\hat{d}}_{j})| \\ &= \frac{1}{2} \sum_{j = 1}^{k} \left[ E(|{\hat{d}}_{j}|) - |E({\hat{d}}_{j})| \right]. \end{aligned} \tag{9} $$

The quantity in square brackets can be considered as the contribution to the bias given by the jth unit. Since the absolute value is a convex function, Jensen's inequality gives

$$ E(|{\hat{d}}_{j}|) \geq |E({\hat{d}}_{j})|. \tag{10} $$

Thus, each term of the summation in equation (9) is nonnegative, and the lemma follows.
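A minimal worked case, added here purely for illustration, shows the unit-level inequality at work. Assume the simplest possible setting of one individual per group and a common allocation probability a for unit j (the symbol a is introduced only for this example):

$$
\begin{aligned}
& n^{0} = n^{1} = 1, \quad p_{j}^{0} = p_{j}^{1} = a \ \Rightarrow \ d_{j} = 0, \\
& {\hat{d}}_{j} \in \{-1, 0, 1\}, \quad P({\hat{d}}_{j} = 1) = P({\hat{d}}_{j} = -1) = a (1 - a), \\
& E(|{\hat{d}}_{j}|) = 2 a (1 - a) \ \geq \ 0 = |E({\hat{d}}_{j})|.
\end{aligned}
$$

Under these assumptions the contribution of the unit to the bias is strictly positive unless a equals 0 or 1, that is, unless the unit is either always or never chosen.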

A Bootstrap Bias Correction

With the aim of eliminating, or at least reducing, the upward bias of $\hat{D}$, Allen et al. (2009) adopt a bootstrap-based bias correction. It is based on the idea that

$$ D - {\hat{D}}_{\mathrm{obs}} \approx {\hat{D}}_{\mathrm{obs}} - E(\hat{D} \mid {\hat{p}}_{1}^{0}, \dots, {\hat{p}}_{k}^{0}, {\hat{p}}_{1}^{1}, \dots, {\hat{p}}_{k}^{1}, n^{0}, n^{1}), \tag{11} $$

where ${\hat{D}}_{\mathrm{obs}}$ denotes the observed value of $\hat{D}$. The observed conditional probabilities ${\hat{p}}_{j}^{0}$ and ${\hat{p}}_{j}^{1}$, j = 1, …, k, are used to generate, by multinomial sampling, B bootstrap allocations with the same group sizes $n^{0}$ and $n^{1}$. Denoting by $D_{b}$ the value of the dissimilarity index on the bth bootstrap sample, b = 1, …, B, we can compute their average as

$$ {\overline{D}}_{\mathrm{Boot}} = \frac{1}{B} \sum_{b = 1}^{B} D_{b}. $$

This quantity replaces the expectation in equation (11). Then, an estimate of $\mathrm{Bias}(\hat{D})$ is given by ${\overline{D}}_{\mathrm{Boot}} - {\hat{D}}_{\mathrm{obs}}$, and the bootstrap bias-corrected estimate of D can be obtained as

$$ {\hat{D}}_{\mathrm{Boot}} = {\hat{D}}_{\mathrm{obs}} - ({\overline{D}}_{\mathrm{Boot}} - {\hat{D}}_{\mathrm{obs}}) = 2 {\hat{D}}_{\mathrm{obs}} - {\overline{D}}_{\mathrm{Boot}}. \tag{12} $$

This type of bias correction would work well if the bias were constant for different values of D. This is not the case here, as is clearly shown in Figure 1c and 1f. This bias correction is therefore expected not to “eliminate” but only to “reduce” the existing bias.
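The bootstrap correction described in this section can be sketched as follows. This is a minimal Python illustration, not the implementation of Allen et al. (2009): the function names and the `seed` argument are hypothetical, and multinomial resampling is done with the standard library rather than a dedicated statistical package.

```python
import random
from collections import Counter


def dissimilarity(counts0, counts1):
    """Observed dissimilarity index from two vectors of unit counts."""
    n0, n1 = sum(counts0), sum(counts1)
    return 0.5 * sum(abs(b / n1 - a / n0) for a, b in zip(counts0, counts1))


def resample(counts, rng):
    """Multinomial draw of sum(counts) individuals using the observed shares."""
    k = len(counts)
    tally = Counter(rng.choices(range(k), weights=counts, k=sum(counts)))
    return [tally[j] for j in range(k)]


def bootstrap_corrected_d(counts0, counts1, n_boot=100, seed=0):
    """Bias-corrected estimate 2 * D_obs - mean of the bootstrap indexes."""
    rng = random.Random(seed)
    d_obs = dissimilarity(counts0, counts1)
    d_bar = sum(
        dissimilarity(resample(counts0, rng), resample(counts1, rng))
        for _ in range(n_boot)
    ) / n_boot
    return 2.0 * d_obs - d_bar
```

Note that nothing in this formula keeps the corrected value inside [0, 1]: when the two observed count vectors are identical, the observed index is 0 and the sketch returns a value at or below zero.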

Equivalent Analytical Formulations

Instead of bootstrapping $E (\hat{D} | {\hat{p}}_{1}^{0}, \dots, {\hat{p}}_{k}^{0}, {\hat{p}}_{1}^{1}, \dots, {\hat{p}}_{k}^{1}, n^{0}, n^{1})$ , we might compute it analytically using the formula in (6); but, as Allen et al. (2009) note, this is conceivable only for a small number of units with small sizes. In the following, we introduce an alternative formulation based on the unit-by-unit approach adopted in the proof of Lemma 1.

Recall that, considering the generic jth unit,

$$ {\hat{d}}_{j} = \frac{{\hat{n}}_{j}^{1}}{n^{1}} - \frac{{\hat{n}}_{j}^{0}}{n^{0}}, $$

with ${\hat{n}}_{j}^{c} \sim \mathrm{Bin}(n^{c}, p_{j}^{c})$, c = 0, 1. Then we can write

$$ \begin{aligned} E_{\mathrm{BinBin}}(|{\hat{d}}_{j}|) = {} & \sum_{u = 0}^{n^{0}} \sum_{v = 0}^{n^{1}} \left| \frac{v}{n^{1}} - \frac{u}{n^{0}} \right| \\ & \times \binom{n^{0}}{u} {(p_{j}^{0})}^{u} {(1 - p_{j}^{0})}^{n^{0} - u} \binom{n^{1}}{v} {(p_{j}^{1})}^{v} {(1 - p_{j}^{1})}^{n^{1} - v}. \end{aligned} $$

Thus, by using the rationale of equation (12), we can define the estimator

$$ {\hat{D}}_{\mathrm{BinBin}} = 2 \hat{D} - \frac{1}{2} \sum_{j = 1}^{k} E_{\mathrm{BinBin}}(|{\hat{d}}_{j}| \mid {\hat{p}}_{j}^{0}, {\hat{p}}_{j}^{1}). \tag{14} $$
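The double binomial sum and the resulting estimator can be sketched in Python as below. The function names are hypothetical, and the direct evaluation of the binomial probabilities is an illustrative simplification that suits small group sizes only; for large $n^{c}$ the binomial coefficients become too large to convert to floating point and a log-scale computation would be needed.

```python
from math import comb


def binom_pmf(n, p):
    """List of Bin(n, p) probabilities for the outcomes 0..n."""
    return [comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(n + 1)]


def expected_abs_diff(n0, p0, n1, p1):
    """Exact E|v/n1 - u/n0| with u ~ Bin(n0, p0), v ~ Bin(n1, p1) independent."""
    pmf0, pmf1 = binom_pmf(n0, p0), binom_pmf(n1, p1)
    return sum(
        abs(v / n1 - u / n0) * pmf0[u] * pmf1[v]
        for u in range(n0 + 1)
        for v in range(n1 + 1)
    )


def binbin_corrected_d(counts0, counts1):
    """Analytic counterpart of the bootstrap correction, unit by unit."""
    n0, n1 = sum(counts0), sum(counts1)
    d_obs = 0.5 * sum(abs(b / n1 - a / n0) for a, b in zip(counts0, counts1))
    expected = 0.5 * sum(
        expected_abs_diff(n0, a / n0, n1, b / n1)
        for a, b in zip(counts0, counts1)
    )
    return 2.0 * d_obs - expected
```

The cost of this sketch is of order $k \, n^{0} n^{1}$ operations, instead of the enumeration over all joint allocations required by equation (6).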

Computation required by equation (14) is easier than equation (6), but it may still be demanding when group size n^c , c = 0, 1, is large. To handle these cases, we propose the following approach.

Lemma 2: If $n^{c}$, c = 0, 1, is sufficiently large, then $|{\hat{d}}_{j}|$ has approximately a folded normal distribution.

Proof. As previously said, ${\hat{n}}_{j}^{c} \sim \mathrm{Bin}(n^{c}, p_{j}^{c})$, c = 0, 1. If $n^{c}$, c = 0, 1, is sufficiently large, then by the well-known normal approximation to the binomial distribution (based on the de Moivre–Laplace theorem; see Johnson, Kemp, and Kotz 2005:116), ${\hat{n}}_{j}^{c}$ has approximately a normal distribution with mean $n^{c} p_{j}^{c}$ and variance $n^{c} p_{j}^{c} (1 - p_{j}^{c})$. Consequently, the rescaled variable ${\hat{n}}_{j}^{c} / n^{c}$, c = 0, 1, is again normally distributed with mean $p_{j}^{c}$ and variance $p_{j}^{c} (1 - p_{j}^{c}) / n^{c}$, while the difference ${\hat{d}}_{j}$ is also normally distributed with mean $p_{j}^{1} - p_{j}^{0}$ and variance $p_{j}^{1} (1 - p_{j}^{1}) / n^{1} + p_{j}^{0} (1 - p_{j}^{0}) / n^{0}$. So, the variable $|{\hat{d}}_{j}|$ has a folded normal distribution (Leone, Nelson, and Nottingham 1961) with parameters

$$ \mu_{j} = p_{j}^{1} - p_{j}^{0} \quad \text{and} \quad \sigma_{j}^{2} = \frac{p_{j}^{1} (1 - p_{j}^{1})}{n^{1}} + \frac{p_{j}^{0} (1 - p_{j}^{0})}{n^{0}}, $$

whose mean is

$$ E_{\mathrm{Folded}}(|{\hat{d}}_{j}|) = \sigma_{j} \sqrt{\frac{2}{\pi}} \exp\left( - \frac{\mu_{j}^{2}}{2 \sigma_{j}^{2}} \right) + \mu_{j} \left[ 1 - 2 \Phi\left( - \frac{\mu_{j}}{\sigma_{j}} \right) \right], \tag{15} $$

where $\Phi(\cdot)$ denotes the cumulative distribution function of a standard normal distribution.

According to the rationale of equations (12) and (14) and based on equation (15), we can define the estimator

$$ {\hat{D}}_{\mathrm{Folded}} = 2 \hat{D} - \frac{1}{2} \sum_{j = 1}^{k} E_{\mathrm{Folded}}(|{\hat{d}}_{j}| \mid {\hat{\mu}}_{j}, {\hat{\sigma}}_{j}), $$

to be used as an alternative to ${\hat{D}}_{\mathrm{BinBin}}$ when $n^{c}$, c = 0, 1, is sufficiently large.
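A compact Python sketch of the folded normal variant is given below. The function names are hypothetical; the standard normal distribution function is computed through `math.erf`, and the handling of units with zero estimated variance (returning the absolute mean) is an assumption of this sketch rather than something specified in the text.

```python
from math import erf, exp, pi, sqrt


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def folded_mean(mu, sigma):
    """Mean of |X| for X ~ N(mu, sigma^2), as in equation (15)."""
    if sigma == 0.0:  # degenerate unit: no sampling variability (assumption)
        return abs(mu)
    return (sigma * sqrt(2.0 / pi) * exp(-mu ** 2 / (2.0 * sigma ** 2))
            + mu * (1.0 - 2.0 * std_normal_cdf(-mu / sigma)))


def folded_corrected_d(counts0, counts1):
    """Bias-corrected estimate using the folded normal approximation."""
    n0, n1 = sum(counts0), sum(counts1)
    d_obs, expected = 0.0, 0.0
    for a, b in zip(counts0, counts1):
        p0, p1 = a / n0, b / n1
        mu = p1 - p0  # plug-in estimate of mu_j
        sigma = sqrt(p1 * (1.0 - p1) / n1 + p0 * (1.0 - p0) / n0)
        d_obs += 0.5 * abs(mu)
        expected += 0.5 * folded_mean(mu, sigma)
    return 2.0 * d_obs - expected
```

Each unit now requires a constant number of operations, so the cost no longer depends on the group sizes; the price is the approximation error of the normal limit when the counts are small.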

Comparison

In this section, we use Monte Carlo simulations to compare the bias of the three estimators ${\hat{D}}_{B o o t}$ , ${\hat{D}}_{B i n B i n}$ , and ${\hat{D}}_{F o l d e d}$ and to evaluate the speed of convergence of ${\hat{D}}_{F o l d e d}$ to ${\hat{D}}_{B i n B i n}$ .

Following the setup used for the simulations in the Determining Factors subsection, the factors considered are p, $E(n_{j})$, and D. For each of them, a grid of values is chosen as follows: 0.01, 0.05, 0.10, 0.30, and 0.50 for p; 6, 10, 20, 30, 40, and 50 for $E(n_{j})$; and 0, 0.056, 0.127, 0.225, 0.292, 0.382, 0.634, and 0.818 for D. The values chosen for D correspond, respectively, to the values 0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.95, and 0.99 of the parameter q in equation (7). The number of units is fixed at k = 50, their sizes $n_{j}$ are equal in expectation and, for ${\hat{D}}_{\mathrm{Boot}}$, the number of bootstrap replications is fixed at B = 100. For each combination of the considered simulation factors, 1,000 samples are randomly generated.
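The correspondence between q and D can be checked with a short calculation. The closed form used below is a derivation added here for illustration, not a formula stated in the text: it assumes a continuum of units, in which case D equals the largest vertical gap between the diagonal and the curve of equation (7), giving $D = (1 - \sqrt{1 - q}) / (1 + \sqrt{1 - q})$. The function name `d_from_q` is hypothetical.

```python
from math import sqrt


def d_from_q(q):
    """Continuous-limit D implied by the curve in equation (7), 0 <= q < 1."""
    s = sqrt(1.0 - q)
    return (1.0 - s) / (1.0 + s)


# Print the implied D for each value of q used in the simulation grid.
for q in (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.95, 0.99):
    print(q, round(d_from_q(q), 3))
```

Under this assumption the computed values agree with the grid listed above up to about one unit in the third decimal place; the small remaining differences presumably reflect the discretization into k = 50 units.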

The mean simulated biases of $\hat{D}$ , ${\hat{D}}_{B o o t}$ , ${\hat{D}}_{B i n B i n}$ , and ${\hat{D}}_{F o l d e d}$ , for the five values of p considered, are reported in Tables 1 to 5. With regard to the behavior of ${\hat{D}}_{B o o t}$ , by a joint analysis of these results (see also the results in Allen et al. 2009), we can note that:

Table 5.

Mean Bias, Over 1,000 Replications, for $\hat{D}$ , ${\hat{D}}_{B o o t}$ , ${\hat{D}}_{B i n B i n}$ , and ${\hat{D}}_{F o l d e d}$ at the Varying of E(n_j ) and D, Fixed k = 50 and p = .5.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.317	0.263	0.209	0.150	0.122	0.094	0.042	0.021
	${\hat{D}}_{B o o t}$	0.194	0.141	0.093	0.047	0.029	0.017	0.000	0.001
	${\hat{D}}_{B i n B i n}$	0.194	0.141	0.093	0.047	0.029	0.017	0.000	0.001
	${\hat{D}}_{F o l d e d}$	0.186	0.133	0.086	0.038	0.023	0.010	−0.010	−0.015
10	$\hat{D}$	0.247	0.196	0.145	0.099	0.078	0.058	0.024	0.012
	${\hat{D}}_{B o o t}$	0.148	0.098	0.054	0.023	0.012	0.005	−0.005	−0.001
	${\hat{D}}_{B i n B i n}$	0.148	0.098	0.054	0.022	0.012	0.005	−0.004	−0.001
	${\hat{D}}_{F o l d e d}$	0.144	0.094	0.051	0.019	0.009	0.002	−0.008	−0.004
20	$\hat{D}$	0.175	0.126	0.084	0.051	0.040	0.030	0.012	0.005
	${\hat{D}}_{B o o t}$	0.103	0.056	0.022	0.004	0.002	0.001	−0.001	−0.001
	${\hat{D}}_{B i n B i n}$	0.103	0.056	0.022	0.004	0.002	0.001	−0.001	−0.001
	${\hat{D}}_{F o l d e d}$	0.102	0.055	0.021	0.003	0.001	0.000	−0.002	−0.001
30	$\hat{D}$	0.144	0.097	0.060	0.033	0.026	0.018	0.008	0.004
	${\hat{D}}_{B o o t}$	0.086	0.041	0.013	0.000	0.000	−0.001	−0.001	0.000
	${\hat{D}}_{B i n B i n}$	0.086	0.041	0.013	0.000	0.000	−0.001	−0.001	0.000
	${\hat{D}}_{F o l d e d}$	0.085	0.040	0.012	−0.001	−0.001	−0.001	−0.001	0.000
40	$\hat{D}$	0.124	0.079	0.046	0.025	0.020	0.014	0.007	0.002
	${\hat{D}}_{B o o t}$	0.073	0.031	0.007	−0.001	0.000	0.000	0.000	0.000
	${\hat{D}}_{B i n B i n}$	0.073	0.031	0.007	−0.001	0.000	0.000	0.000	0.000
	${\hat{D}}_{F o l d e d}$	0.073	0.031	0.007	−0.001	0.000	−0.001	0.000	0.000
50	$\hat{D}$	0.111	0.067	0.037	0.021	0.016	0.011	0.004	0.003
	${\hat{D}}_{B o o t}$	0.065	0.025	0.004	0.000	0.000	0.000	−0.001	0.001
	${\hat{D}}_{B i n B i n}$	0.065	0.025	0.004	0.000	0.000	0.000	−0.001	0.001
	${\hat{D}}_{F o l d e d}$	0.065	0.025	0.004	0.000	−0.001	0.000	−0.001	0.001

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 1.

Mean Bias, Over 1,000 Replications, for $\hat{D}$ , ${\hat{D}}_{B o o t}$ , ${\hat{D}}_{B i n B i n}$ , and ${\hat{D}}_{F o l d e d}$ at the Varying of E(n_j ) and D, Fixed k = 50 and p = .01.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.942	0.885	0.815	0.717	0.649	0.560	0.312	0.145
	${\hat{D}}_{B o o t}$	0.925	0.868	0.798	0.700	0.632	0.543	0.298	0.138
	${\hat{D}}_{B i n B i n}$	0.925	0.868	0.798	0.700	0.632	0.543	0.298	0.138
	${\hat{D}}_{F o l d e d}$	0.892	0.835	0.765	0.668	0.599	0.511	0.271	0.124
10	$\hat{D}$	0.904	0.848	0.778	0.679	0.614	0.525	0.287	0.134
	${\hat{D}}_{B o o t}$	0.874	0.817	0.748	0.648	0.584	0.496	0.266	0.126
	${\hat{D}}_{B i n B i n}$	0.874	0.817	0.748	0.648	0.584	0.496	0.266	0.126
	${\hat{D}}_{F o l d e d}$	0.835	0.778	0.709	0.610	0.547	0.460	0.239	0.114
20	$\hat{D}$	0.818	0.761	0.690	0.596	0.533	0.450	0.239	0.112
	${\hat{D}}_{B o o t}$	0.759	0.702	0.631	0.539	0.478	0.398	0.207	0.098
	${\hat{D}}_{B i n B i n}$	0.759	0.702	0.631	0.539	0.478	0.398	0.207	0.098
	${\hat{D}}_{F o l d e d}$	0.727	0.670	0.599	0.508	0.448	0.369	0.188	0.090
30	$\hat{D}$	0.738	0.685	0.614	0.523	0.461	0.386	0.204	0.096
	${\hat{D}}_{B o o t}$	0.654	0.602	0.532	0.445	0.385	0.317	0.165	0.078
	${\hat{D}}_{B i n B i n}$	0.654	0.602	0.532	0.444	0.385	0.317	0.165	0.078
	${\hat{D}}_{F o l d e d}$	0.631	0.579	0.509	0.422	0.363	0.297	0.153	0.072
40	$\hat{D}$	0.668	0.612	0.546	0.457	0.400	0.333	0.174	0.083
	${\hat{D}}_{B o o t}$	0.563	0.507	0.445	0.361	0.309	0.251	0.129	0.062
	${\hat{D}}_{B i n B i n}$	0.563	0.507	0.445	0.361	0.309	0.251	0.129	0.062
	${\hat{D}}_{F o l d e d}$	0.547	0.491	0.429	0.345	0.293	0.237	0.121	0.059
50	$\hat{D}$	0.602	0.549	0.485	0.399	0.348	0.285	0.148	0.070
	${\hat{D}}_{B o o t}$	0.480	0.428	0.368	0.289	0.246	0.194	0.099	0.047
	${\hat{D}}_{B i n B i n}$	0.480	0.428	0.368	0.289	0.246	0.194	0.099	0.047
	${\hat{D}}_{F o l d e d}$	0.467	0.416	0.356	0.276	0.234	0.183	0.092	0.044

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

(1) when p, $E(n_{j})$, and D present low values, the observed segregation $\hat{D}$ incorrectly suggests that a highly segregating process underlies the allocation, and ${\hat{D}}_{\mathrm{Boot}}$ does little to correct this bias;

(2) in the opposite situation of high values for p, $E(n_{j})$, and D, no correction is needed because $E(\hat{D})$ is not different from the true value D;

(3) for moderate values of D (e.g., from 0.1 to 0.4), provided that p and $E(n_{j})$ are not both very small, ${\hat{D}}_{\mathrm{Boot}}$ works well enough.

Comparing ${\hat{D}}_{\mathrm{BinBin}}$ with ${\hat{D}}_{\mathrm{Boot}}$, we can see that 100 bootstrap replications are enough to obtain, in all the considered scenarios, practically the same results. The convergence of ${\hat{D}}_{\mathrm{Folded}}$ to ${\hat{D}}_{\mathrm{BinBin}}$ improves as the simulation factors increase. However, very small values of p and $E(n_{j})$ seem to hinder this convergence (see Tables 1 and 2). Furthermore, it is interesting to note that the approximation error included in ${\hat{D}}_{\mathrm{Folded}}$ tends to improve on the performance of ${\hat{D}}_{\mathrm{Boot}}$. Finally, note that the negative values in Tables 2 to 5 arise because, when $\mathrm{Bias}(\hat{D})$ is close to zero, the mean bias over Monte Carlo simulations fluctuates around zero.

Table 2.

Mean Bias, Over 1,000 Replications, for $\hat{D}$ , ${\hat{D}}_{B o o t}$ , ${\hat{D}}_{B i n B i n}$ , and ${\hat{D}}_{F o l d e d}$ at the Varying of E(n_j ) and D, Fixed k = 50 and p = .05.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.739	0.684	0.613	0.522	0.461	0.386	0.203	0.096
	${\hat{D}}_{B o o t}$	0.654	0.600	0.529	0.442	0.385	0.318	0.163	0.077
	${\hat{D}}_{B i n B i n}$	0.655	0.600	0.529	0.442	0.385	0.318	0.163	0.077
	${\hat{D}}_{F o l d e d}$	0.628	0.572	0.503	0.416	0.360	0.294	0.148	0.069
10	$\hat{D}$	0.604	0.547	0.482	0.399	0.350	0.287	0.148	0.071
	${\hat{D}}_{B o o t}$	0.482	0.425	0.363	0.288	0.247	0.197	0.098	0.047
	${\hat{D}}_{B i n B i n}$	0.482	0.425	0.363	0.288	0.247	0.197	0.099	0.047
	${\hat{D}}_{F o l d e d}$	0.465	0.408	0.346	0.272	0.231	0.182	0.090	0.043
20	$\hat{D}$	0.396	0.346	0.285	0.220	0.180	0.141	0.064	0.029
	${\hat{D}}_{B o o t}$	0.241	0.194	0.136	0.085	0.056	0.034	0.002	−0.001
	${\hat{D}}_{B i n B i n}$	0.241	0.194	0.136	0.085	0.056	0.034	0.002	−0.002
	${\hat{D}}_{F o l d e d}$	0.230	0.183	0.125	0.075	0.046	0.025	−0.002	−0.003
30	$\hat{D}$	0.334	0.283	0.226	0.161	0.134	0.099	0.045	0.018
	${\hat{D}}_{B o o t}$	0.209	0.159	0.107	0.052	0.036	0.015	−0.002	−0.005
	${\hat{D}}_{B i n B i n}$	0.209	0.159	0.107	0.052	0.036	0.015	−0.002	−0.005
	${\hat{D}}_{F o l d e d}$	0.206	0.156	0.104	0.049	0.034	0.013	−0.002	−0.005
40	$\hat{D}$	0.286	0.234	0.180	0.126	0.101	0.074	0.031	0.014
	${\hat{D}}_{B o o t}$	0.172	0.121	0.073	0.032	0.019	0.005	−0.007	−0.004
	${\hat{D}}_{B i n B i n}$	0.172	0.121	0.073	0.032	0.019	0.005	−0.007	−0.004
	${\hat{D}}_{F o l d e d}$	0.170	0.120	0.072	0.031	0.018	0.005	−0.006	−0.003
50	$\hat{D}$	0.258	0.205	0.155	0.105	0.080	0.058	0.026	0.012
	${\hat{D}}_{B o o t}$	0.156	0.105	0.061	0.024	0.009	0.001	−0.003	−0.001
	${\hat{D}}_{B i n B i n}$	0.156	0.105	0.061	0.024	0.009	0.001	−0.003	−0.001
	${\hat{D}}_{F o l d e d}$	0.156	0.104	0.060	0.024	0.009	0.001	−0.002	−0.001

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

A New Bias Correction

In this section, we introduce a new estimator of D, which further reduces the bias with respect to ${\hat{D}}_{B o o t}$ . The rationale of our approach consists in choosing, as estimator of D, a value $\tilde{D}$ that makes $E (\hat{D} | {\tilde{p}}_{1}^{0}, \dots, {\tilde{p}}_{k}^{0}, {\tilde{p}}_{1}^{1}, \dots, {\tilde{p}}_{k}^{1}, n^{0}, n^{1})$ as close as possible to ${\hat{D}}_{o b s}$ , with

\tilde{D} = \frac{1}{2} \sum_{j = 1}^{k} |{\tilde{p}}_{j}^{1} - {\tilde{p}}_{j}^{0}| .

There may be different criteria for choosing $\tilde{D}$. However, according to the proof of Lemma 1, the contribution of each unit to the bias is nonnegative; hence, we have chosen to require the sequence of differences $|{\tilde{p}}_{j}^{0} - {\tilde{p}}_{j}^{1}|$ to be a flattened variant of its observed counterpart. For the sake of identifiability, flattening is obtained by spreading the difference $\Delta = {\hat{D}}_{\mathrm{obs}} - \tilde{D} \geq 0$ among the k differences $|{\tilde{p}}_{j}^{0} - {\tilde{p}}_{j}^{1}|$, proportionally to the observed absolute differences $|{\hat{d}}_{j}| = |{\hat{p}}_{j}^{1} - {\hat{p}}_{j}^{0}|$.

Operationally, an optimization procedure has been implemented in the R computing environment (R Core Team 2013)—available from the authors upon request—that can be summarized as follows. Let

$$ {\hat{\delta}}_{j} = \frac{|{\hat{d}}_{j}|}{\sum_{h = 1}^{k} |{\hat{d}}_{h}|}, $$

be the relative absolute difference in unit j, j = 1, …, k. For each unit j, we also define the modified probabilities of our estimator as

$$ \begin{aligned} {\tilde{p}}_{j}^{0} &= {\hat{p}}_{j}^{0} + \mathrm{sign}({\hat{d}}_{j}) \, \Delta \, {\hat{\delta}}_{j} \\ {\tilde{p}}_{j}^{1} &= {\hat{p}}_{j}^{1} - \mathrm{sign}({\hat{d}}_{j}) \, \Delta \, {\hat{\delta}}_{j}, \end{aligned} $$

where $\mathrm{sign}(\cdot)$ denotes the sign function. The value of $\Delta$ is obtained by minimizing the objective function

$$ f_{\mathrm{obj}}(\Delta) = \left| {\hat{D}}_{\mathrm{obs}} - E(\hat{D} \mid {\tilde{p}}_{1}^{0}, \dots, {\tilde{p}}_{k}^{0}, {\tilde{p}}_{1}^{1}, \dots, {\tilde{p}}_{k}^{1}, n^{0}, n^{1}) \right|, $$

in the range $[0, {\hat{D}}_{o b s}]$ where, analogously to the fourth section, the expectation can be computed following the bootstrap, the “BinBin,” or the “Folded” approach; the resulting value of $\tilde{D} = {\hat{D}}_{o b s} - Δ$ is accordingly denoted by ${\tilde{D}}_{B o o t}$ , ${\tilde{D}}_{B i n B i n}$ , and ${\tilde{D}}_{F o l d e d}$ . The optimize() function, of the stats package for R, is used for the constrained minimization; it adopts a combination of golden section search and successive parabolic interpolation. Convergence is never much slower than that for a Fibonacci search (Brent 1973).
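The procedure of this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the expectation is computed with the folded normal approximation, the minimization uses a plain golden-section search in place of R's `optimize()` (which additionally uses parabolic interpolation), and all function names and the tolerance `tol` are hypothetical.

```python
from math import erf, exp, pi, sqrt


def folded_mean(mu, sigma):
    """Mean of |X| for X ~ N(mu, sigma^2); |mu| when sigma is zero."""
    if sigma == 0.0:
        return abs(mu)
    cdf = 0.5 * (1.0 + erf(-mu / (sigma * sqrt(2.0))))  # Phi(-mu / sigma)
    return (sigma * sqrt(2.0 / pi) * exp(-mu ** 2 / (2.0 * sigma ** 2))
            + mu * (1.0 - 2.0 * cdf))


def expected_d(probs0, probs1, n0, n1):
    """E(D_hat | probabilities) under the folded normal approximation."""
    total = 0.0
    for p0, p1 in zip(probs0, probs1):
        var = p1 * (1.0 - p1) / n1 + p0 * (1.0 - p0) / n0
        total += 0.5 * folded_mean(p1 - p0, sqrt(max(var, 0.0)))
    return total


def flattened_probs(counts0, counts1, delta):
    """Modified probabilities: each observed gap shrinks in proportion to its size."""
    n0, n1 = sum(counts0), sum(counts1)
    p0 = [a / n0 for a in counts0]
    p1 = [b / n1 for b in counts1]
    gaps = [b - a for a, b in zip(p0, p1)]
    total_gap = sum(abs(g) for g in gaps)
    if total_gap == 0.0:
        return p0, p1
    shift = [((g > 0) - (g < 0)) * delta * abs(g) / total_gap for g in gaps]
    return ([a + s for a, s in zip(p0, shift)],
            [b - s for b, s in zip(p1, shift)])


def new_estimator(counts0, counts1, tol=1e-8):
    """D_obs minus the Delta that minimizes the objective on [0, D_obs]."""
    n0, n1 = sum(counts0), sum(counts1)
    d_obs = 0.5 * sum(abs(b / n1 - a / n0) for a, b in zip(counts0, counts1))

    def objective(delta):
        q0, q1 = flattened_probs(counts0, counts1, delta)
        return abs(d_obs - expected_d(q0, q1, n0, n1))

    ratio = (sqrt(5.0) - 1.0) / 2.0  # golden-section search (swapped-in method)
    lo, hi = 0.0, d_obs
    while hi - lo > tol:
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if objective(x1) < objective(x2):
            hi = x2
        else:
            lo = x1
    return d_obs - 0.5 * (lo + hi)
```

By construction the returned value lies between 0 and the observed index. Replacing `expected_d` with a bootstrap or exact binomial expectation would give the other two variants, at a higher cost per evaluation of the objective.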

Performance Evaluation

To evaluate the performance of our estimator, we have performed a simulation study having the same design described in the Comparison subsection, but with the addition of the values 100 and 200 for the simulation factor E(n_j ). For the sake of brevity, and without loss of generality, only the bootstrap variants ${\hat{D}}_{B o o t}$ and ${\tilde{D}}_{B o o t}$ are reported here. The overall results can be obtained from the authors upon request.

The mean simulated biases of $\hat{D}$, ${\hat{D}}_{\mathrm{Boot}}$, and ${\tilde{D}}_{\mathrm{Boot}}$, for the values 0.01, 0.05, 0.1, 0.3, and 0.5 of p, are reported in Tables 6 to 10. From these results, we can note that ${\tilde{D}}_{\mathrm{Boot}}$ most often outperforms ${\hat{D}}_{\mathrm{Boot}}$ in reducing the bias. The plots in Figures 2 to 4 allow for graphical inspection of some of the obtained results. These plots further emphasize the very good performance of our estimator.

Figure 4.

Comparison between $B i a s (\hat{D})$ , $B i a s ({\hat{D}}_{B o o t})$ , and $B i a s ({\tilde{D}}_{B o o t})$ , at the varying of p, fixed E(n_j ) = 6 and D = 0. For ${\hat{D}}_{B o o t}$ and ${\tilde{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Figure 2.

Comparison between $B i a s (\hat{D})$ , $B i a s ({\hat{D}}_{B o o t})$ , and $B i a s ({\tilde{D}}_{B o o t})$ , at the varying of D, fixed p = 0.38 and E(n_j ) = 6. For ${\hat{D}}_{B o o t}$ and ${\tilde{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 10.

Mean Bias, Over 1,000 Replications, for $\hat{D}$ , ${\hat{D}}_{B o o t}$ , and ${\tilde{D}}_{B o o t}$ at the Varying of E(n_j ) and D, Fixed k = 50 and p = .5.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.317	0.263	0.209	0.150	0.122	0.094	0.042	0.021
	${\hat{D}}_{B o o t}$	0.194	0.141	0.093	0.047	0.029	0.017	0.000	0.001
	${\tilde{D}}_{B o o t}$	0.098	0.047	0.013	−0.005	−0.005	−0.007	−0.002	−0.003
10	$\hat{D}$	0.247	0.196	0.145	0.099	0.078	0.058	0.024	0.012
	${\hat{D}}_{B o o t}$	0.148	0.098	0.054	0.023	0.012	0.005	−0.005	−0.001
	${\tilde{D}}_{B o o t}$	0.071	0.025	−0.007	−0.011	−0.007	−0.005	−0.003	−0.002
20	$\hat{D}$	0.175	0.126	0.084	0.051	0.040	0.030	0.012	0.005
	${\hat{D}}_{B o o t}$	0.103	0.056	0.022	0.004	0.002	0.001	−0.001	−0.001
	${\tilde{D}}_{B o o t}$	0.048	0.006	−0.011	−0.003	−0.005	−0.004	−0.003	−0.001
30	$\hat{D}$	0.144	0.097	0.060	0.033	0.026	0.018	0.008	0.004
	${\hat{D}}_{B o o t}$	0.086	0.041	0.013	0.000	0.000	−0.001	−0.001	0.000
	${\tilde{D}}_{B o o t}$	0.039	0.000	−0.009	−0.005	−0.004	−0.002	−0.001	−0.002
40	$\hat{D}$	0.124	0.079	0.046	0.025	0.020	0.014	0.007	0.002
	${\hat{D}}_{B o o t}$	0.073	0.031	0.007	−0.001	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.033	−0.003	−0.007	−0.003	−0.001	−0.002	−0.002	−0.001
50	$\hat{D}$	0.111	0.067	0.037	0.021	0.016	0.011	0.004	0.003
	${\hat{D}}_{B o o t}$	0.065	0.025	0.004	0.000	0.000	0.000	−0.001	0.001
	${\tilde{D}}_{B o o t}$	0.030	−0.006	−0.004	−0.003	−0.003	−0.001	−0.001	−0.001
100	$\hat{D}$	0.079	0.039	0.020	0.011	0.008	0.005	0.003	0.001
	${\hat{D}}_{B o o t}$	0.046	0.010	0.001	0.000	0.001	0.000	0.001	0.000
	${\tilde{D}}_{B o o t}$	0.021	−0.005	−0.003	0.000	0.000	0.000	−0.001	−0.001
200	$\hat{D}$	0.056	0.021	0.010	0.006	0.004	0.003	0.001	0.001
	${\hat{D}}_{B o o t}$	0.033	0.004	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.015	−0.003	−0.001	0.000	0.000	−0.001	−0.001	−0.001

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 6.

Mean Bias, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .01 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.942	0.885	0.815	0.717	0.649	0.560	0.312	0.145
	${\hat{D}}_{B o o t}$	0.925	0.868	0.798	0.700	0.632	0.543	0.298	0.138
	${\tilde{D}}_{B o o t}$	0.895	0.839	0.765	0.671	0.602	0.515	0.277	0.128
10	$\hat{D}$	0.904	0.848	0.778	0.679	0.614	0.525	0.287	0.134
	${\hat{D}}_{B o o t}$	0.874	0.817	0.748	0.648	0.584	0.496	0.266	0.126
	${\tilde{D}}_{B o o t}$	0.822	0.764	0.694	0.599	0.535	0.448	0.232	0.112
20	$\hat{D}$	0.818	0.761	0.690	0.596	0.533	0.450	0.239	0.112
	${\hat{D}}_{B o o t}$	0.759	0.702	0.631	0.539	0.478	0.398	0.207	0.098
	${\tilde{D}}_{B o o t}$	0.662	0.604	0.541	0.450	0.392	0.325	0.173	0.086
30	$\hat{D}$	0.738	0.685	0.614	0.523	0.461	0.386	0.204	0.096
	${\hat{D}}_{B o o t}$	0.654	0.602	0.532	0.445	0.385	0.317	0.165	0.078
	${\tilde{D}}_{B o o t}$	0.538	0.480	0.411	0.335	0.283	0.233	0.129	0.067
40	$\hat{D}$	0.668	0.612	0.546	0.457	0.400	0.333	0.174	0.083
	${\hat{D}}_{B o o t}$	0.563	0.507	0.445	0.361	0.309	0.251	0.129	0.062
	${\tilde{D}}_{B o o t}$	0.427	0.370	0.313	0.251	0.211	0.172	0.103	0.051
50	$\hat{D}$	0.602	0.549	0.485	0.399	0.348	0.285	0.148	0.070
	${\hat{D}}_{B o o t}$	0.480	0.428	0.368	0.289	0.246	0.194	0.099	0.047
	${\tilde{D}}_{B o o t}$	0.337	0.287	0.236	0.174	0.144	0.126	0.075	0.039
100	$\hat{D}$	0.378	0.327	0.266	0.203	0.165	0.128	0.058	0.026
	${\hat{D}}_{B o o t}$	0.219	0.170	0.112	0.064	0.037	0.017	−0.005	−0.004
	${\tilde{D}}_{B o o t}$	0.101	0.052	−0.001	−0.033	−0.026	−0.020	−0.010	−0.007
200	$\hat{D}$	0.277	0.225	0.173	0.120	0.095	0.067	0.030	0.013
	${\hat{D}}_{B o o t}$	0.163	0.113	0.066	0.027	0.013	−0.002	−0.006	−0.004
	${\tilde{D}}_{B o o t}$	0.072	0.029	−0.012	−0.021	−0.017	−0.013	−0.008	−0.004

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

To evaluate the estimation accuracy of the competing estimators, Tables 11 to 15 report the MSEs under the same conditions as Tables 6 to 10. As with the bias, $\tilde{D}_{Boot}$ is clearly the best performer in terms of estimation accuracy, while $\hat{D}$ is the worst. This result is important and not trivial, because a reduction in bias may well be more than offset by an increase in the variance of the estimator, resulting in a higher MSE.
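The trade-off between bias and variance mentioned here follows from the standard decomposition of the mean square error. It is stated below for a generic estimator of D, written $\check{D}$; this symbol is introduced only for the illustration and stands for any of the estimators compared in the tables.

$$
\mathrm{MSE}(\check{D}) = E\left[(\check{D} - D)^2\right] = \mathrm{Var}(\check{D}) + \left[\mathrm{Bias}(\check{D})\right]^2, \qquad \mathrm{Bias}(\check{D}) = E(\check{D}) - D.
$$

A bias correction therefore lowers the MSE only when the decrease in the squared bias exceeds any increase in the variance that the correction introduces.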

Table 15.

MSE, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .5 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.100	0.071	0.046	0.024	0.016	0.010	0.004	0.001
	${\hat{D}}_{B o o t}$	0.039	0.022	0.011	0.005	0.004	0.003	0.002	0.001
	${\tilde{D}}_{B o o t}$	0.014	0.007	0.006	0.006	0.005	0.003	0.002	0.001
10	$\hat{D}$	0.062	0.039	0.022	0.011	0.007	0.004	0.002	0.001
	${\hat{D}}_{B o o t}$	0.023	0.011	0.005	0.002	0.002	0.002	0.001	0.001
	${\tilde{D}}_{B o o t}$	0.008	0.003	0.004	0.003	0.002	0.002	0.001	0.001
20	$\hat{D}$	0.031	0.016	0.008	0.003	0.002	0.001	0.001	0.000
	${\hat{D}}_{B o o t}$	0.011	0.004	0.001	0.001	0.001	0.001	0.001	0.000
	${\tilde{D}}_{B o o t}$	0.003	0.001	0.002	0.001	0.001	0.001	0.001	0.000
30	$\hat{D}$	0.021	0.010	0.004	0.002	0.001	0.001	0.000	0.000
	${\hat{D}}_{B o o t}$	0.008	0.002	0.001	0.001	0.001	0.001	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.003	0.001	0.001	0.001	0.001	0.001	0.000	0.000
40	$\hat{D}$	0.016	0.006	0.002	0.001	0.001	0.001	0.000	0.000
	${\hat{D}}_{B o o t}$	0.006	0.001	0.001	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.002	0.001	0.001	0.001	0.001	0.000	0.000	0.000
50	$\hat{D}$	0.013	0.005	0.002	0.001	0.001	0.000	0.000	0.000
	${\hat{D}}_{B o o t}$	0.005	0.001	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.001	0.001	0.001	0.000	0.000	0.000	0.000	0.000
100	$\hat{D}$	0.006	0.002	0.000	0.000	0.000	0.000	0.000	0.000
	${\hat{D}}_{B o o t}$	0.002	0.000	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.001	0.000	0.000	0.000	0.000	0.000	0.000	0.000
200	$\hat{D}$	0.003	0.001	0.000	0.000	0.000	0.000	0.000	0.000
	${\hat{D}}_{B o o t}$	0.001	0.000	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000

Note: MSE = mean square error. For $\hat{D}_{Boot}$, B = 100 bootstrap replications are considered.

Table 11.

MSE, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .01 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.886	0.784	0.664	0.513	0.423	0.314	0.099	0.021
	${\hat{D}}_{B o o t}$	0.854	0.754	0.637	0.489	0.402	0.295	0.091	0.020
	${\tilde{D}}_{B o o t}$	0.801	0.704	0.592	0.449	0.367	0.266	0.078	0.017
10	$\hat{D}$	0.818	0.721	0.604	0.463	0.376	0.276	0.084	0.018
	${\hat{D}}_{B o o t}$	0.765	0.671	0.557	0.423	0.341	0.247	0.073	0.016
	${\tilde{D}}_{B o o t}$	0.676	0.589	0.482	0.360	0.285	0.203	0.059	0.014
20	$\hat{D}$	0.670	0.580	0.479	0.358	0.284	0.203	0.058	0.013
	${\hat{D}}_{B o o t}$	0.577	0.493	0.402	0.294	0.229	0.160	0.045	0.011
	${\tilde{D}}_{B o o t}$	0.445	0.370	0.294	0.209	0.158	0.108	0.034	0.009
30	$\hat{D}$	0.546	0.465	0.377	0.274	0.216	0.150	0.043	0.010
	${\hat{D}}_{B o o t}$	0.429	0.358	0.282	0.198	0.153	0.103	0.029	0.007
	${\tilde{D}}_{B o o t}$	0.286	0.229	0.174	0.117	0.090	0.061	0.021	0.006
40	$\hat{D}$	0.444	0.375	0.299	0.211	0.164	0.111	0.032	0.007
	${\hat{D}}_{B o o t}$	0.315	0.260	0.200	0.133	0.101	0.065	0.019	0.005
	${\tilde{D}}_{B o o t}$	0.184	0.146	0.110	0.069	0.054	0.036	0.014	0.004
50	$\hat{D}$	0.369	0.301	0.234	0.159	0.123	0.084	0.023	0.006
	${\hat{D}}_{B o o t}$	0.240	0.185	0.136	0.084	0.064	0.042	0.012	0.004
	${\tilde{D}}_{B o o t}$	0.132	0.092	0.064	0.039	0.032	0.023	0.009	0.003
100	$\hat{D}$	0.144	0.108	0.073	0.043	0.030	0.018	0.005	0.002
	${\hat{D}}_{B o o t}$	0.051	0.033	0.017	0.008	0.006	0.005	0.003	0.002
	${\tilde{D}}_{B o o t}$	0.015	0.009	0.008	0.011	0.011	0.008	0.004	0.002
200	$\hat{D}$	0.077	0.051	0.031	0.015	0.010	0.006	0.002	0.001
	${\hat{D}}_{B o o t}$	0.028	0.014	0.007	0.003	0.002	0.002	0.002	0.001
	${\tilde{D}}_{B o o t}$	0.008	0.004	0.005	0.005	0.004	0.003	0.002	0.001

Note: MSE = mean square error. For $\hat{D}_{Boot}$, B = 100 bootstrap replications are considered.

We also evaluated a grouped jackknife estimator and a double bootstrap estimator; the rationale of these methods is documented in Efron (1982, section 2.2) and Davison and Hinkley (1997, section 3.9), respectively. For brevity, tables with the mean bias and MSE of these estimators are not reported here but can be obtained from the authors upon request. In all the simulation scenarios considered, the grouped jackknife estimator showed only a negligible improvement in mean bias and MSE with respect to $\hat{D}$, with $\hat{D}_{Boot}$ performing much better. As to the double bootstrap approach, the added level of bootstrap did improve the performance of $\hat{D}_{Boot}$ in terms of mean bias and MSE; however, these improvements were only marginal and fell far short of justifying the additional computational burden. More importantly, they were smaller than those obtained with $\tilde{D}_{Boot}$.
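For readers unfamiliar with resampling-based corrections, the Python sketch below shows the generic single-level bootstrap bias correction, in which the bootstrap estimate of the bias (the mean of the resampled indexes minus $\hat{D}$) is subtracted from $\hat{D}$. It is an illustration under stated assumptions rather than the exact procedure behind $\hat{D}_{Boot}$: the function names are hypothetical, and the resampling scheme (redrawing each group's individuals across units with the observed proportions as probabilities) is an assumption in the spirit of the multinomial framework.

```python
import random


def dissimilarity(a, b):
    """Duncan and Duncan's index: 0.5 * sum_j |a_j / A - b_j / B|."""
    total_a, total_b = sum(a), sum(b)
    return 0.5 * sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))


def resample_counts(counts, rng):
    """Redraw sum(counts) individuals across units, using the observed
    proportions counts / sum(counts) as allocation probabilities."""
    new_counts = [0] * len(counts)
    units = range(len(counts))
    for j in rng.choices(units, weights=counts, k=sum(counts)):
        new_counts[j] += 1
    return new_counts


def bootstrap_bias_corrected(a, b, n_boot=100, seed=0):
    """Generic bootstrap bias correction of D-hat (assumed scheme).

    The bias is estimated as mean(D*) - D-hat, so the corrected value is
    D-hat - (mean(D*) - D-hat) = 2 * D-hat - mean(D*). The result is not
    truncated at zero and can therefore be negative.
    """
    rng = random.Random(seed)
    d_hat = dissimilarity(a, b)
    d_star_mean = sum(
        dissimilarity(resample_counts(a, rng), resample_counts(b, rng))
        for _ in range(n_boot)
    ) / n_boot
    return 2.0 * d_hat - d_star_mean
```

A double bootstrap would repeat the resampling inside each bootstrap replicate, multiplying the number of index evaluations; this is the source of the computational burden noted above.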

Conclusions

It has long been recognized that the sensitivity of the dissimilarity index of Duncan and Duncan (1955) to random allocation implies an upward bias, which is particularly evident with small unit sizes, small minority proportions, and low levels of segregation (see, e.g., Carrington and Troske 1997; Cortese et al. 1976; Ransom 2000). In this article, following the multinomial framework of Allen et al. (2009), we have demonstrated analytically the nonnegativity of this bias. Furthermore, we have shown that the bootstrap-based bias correction introduced by Allen et al. (2009) can be obtained analytically, without resorting to resampling techniques. Finally, we have introduced a new bias correction that, in our simulations, performed better than the previous one in terms of both mean bias and MSE in most of the scenarios considered; nevertheless, for reliable estimation, neither the minority proportion nor the unit sizes should be very small. Alternative estimators, based on the grouped jackknife and the double bootstrap, have also been evaluated in terms of both mean bias and MSE. The grouped jackknife bias-corrected estimator exhibited only a small improvement over the natural estimator, and so did the double bootstrap estimator with respect to the bootstrap bias-corrected one. Our bias correction procedure has been implemented in the R language and is available from the authors upon request.

Figure 3.

Comparison between $\mathrm{Bias}(\hat{D})$, $\mathrm{Bias}(\hat{D}_{Boot})$, and $\mathrm{Bias}(\tilde{D}_{Boot})$ for varying $E(n_j)$, with p = .38 and D = 0 fixed. For $\hat{D}_{Boot}$ and $\tilde{D}_{Boot}$, B = 100 bootstrap replications are considered.

Table 3.

Mean Bias, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, $\hat{D}_{BinBin}$, and $\hat{D}_{Folded}$ for Varying $E(n_j)$ and D, With k = 50 and p = .1 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.550	0.494	0.431	0.352	0.304	0.248	0.127	0.059
	${\hat{D}}_{B o o t}$	0.412	0.356	0.297	0.228	0.189	0.147	0.072	0.033
	${\hat{D}}_{B i n B i n}$	0.412	0.356	0.297	0.228	0.189	0.148	0.072	0.032
	${\hat{D}}_{F o l d e d}$	0.395	0.338	0.279	0.210	0.172	0.133	0.063	0.027
10	$\hat{D}$	0.414	0.359	0.298	0.228	0.194	0.151	0.074	0.033
	${\hat{D}}_{B o o t}$	0.263	0.208	0.151	0.093	0.072	0.045	0.015	0.004
	${\hat{D}}_{B i n B i n}$	0.263	0.208	0.151	0.093	0.072	0.045	0.015	0.004
	${\hat{D}}_{F o l d e d}$	0.253	0.198	0.141	0.083	0.063	0.037	0.011	0.002
20	$\hat{D}$	0.294	0.241	0.188	0.132	0.105	0.081	0.032	0.016
	${\hat{D}}_{B o o t}$	0.178	0.127	0.080	0.036	0.020	0.010	−0.007	−0.002
	${\hat{D}}_{B i n B i n}$	0.178	0.127	0.079	0.036	0.020	0.010	−0.007	−0.002
	${\hat{D}}_{F o l d e d}$	0.177	0.125	0.078	0.035	0.019	0.010	−0.006	−0.001
30	$\hat{D}$	0.240	0.190	0.140	0.091	0.071	0.055	0.022	0.010
	${\hat{D}}_{B o o t}$	0.144	0.095	0.052	0.016	0.007	0.004	−0.004	−0.002
	${\hat{D}}_{B i n B i n}$	0.144	0.095	0.052	0.016	0.007	0.004	−0.004	−0.002
	${\hat{D}}_{F o l d e d}$	0.143	0.094	0.051	0.016	0.007	0.005	−0.003	−0.001
40	$\hat{D}$	0.207	0.159	0.110	0.072	0.055	0.040	0.016	0.008
	${\hat{D}}_{B o o t}$	0.123	0.077	0.035	0.011	0.004	0.001	−0.003	−0.001
	${\hat{D}}_{B i n B i n}$	0.123	0.077	0.035	0.011	0.004	0.001	−0.003	−0.001
	${\hat{D}}_{F o l d e d}$	0.122	0.077	0.034	0.011	0.004	0.001	−0.002	0.000
50	$\hat{D}$	0.185	0.138	0.094	0.058	0.043	0.031	0.014	0.007
	${\hat{D}}_{B o o t}$	0.109	0.065	0.028	0.007	0.001	−0.001	−0.001	0.001
	${\hat{D}}_{B i n B i n}$	0.109	0.065	0.028	0.007	0.001	−0.001	−0.001	0.001
	${\hat{D}}_{F o l d e d}$	0.109	0.065	0.028	0.007	0.001	−0.001	−0.001	0.001

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 4.

Mean Bias, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, $\hat{D}_{BinBin}$, and $\hat{D}_{Folded}$ for Varying $E(n_j)$ and D, With k = 50 and p = .3 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.350	0.298	0.243	0.177	0.146	0.112	0.052	0.022
	${\hat{D}}_{B o o t}$	0.219	0.169	0.119	0.065	0.045	0.026	0.004	−0.002
	${\hat{D}}_{B i n B i n}$	0.219	0.169	0.119	0.065	0.045	0.026	0.004	−0.002
	${\hat{D}}_{F o l d e d}$	0.213	0.163	0.114	0.058	0.040	0.021	0.000	−0.007
10	$\hat{D}$	0.270	0.220	0.168	0.116	0.091	0.071	0.030	0.016
	${\hat{D}}_{B o o t}$	0.163	0.114	0.068	0.031	0.016	0.009	−0.002	0.000
	${\hat{D}}_{B i n B i n}$	0.163	0.114	0.068	0.031	0.016	0.009	−0.002	0.001
	${\hat{D}}_{F o l d e d}$	0.161	0.112	0.066	0.029	0.014	0.008	−0.003	0.000
20	$\hat{D}$	0.192	0.143	0.098	0.061	0.047	0.032	0.013	0.006
	${\hat{D}}_{B o o t}$	0.115	0.068	0.030	0.007	0.003	−0.003	−0.003	−0.001
	${\hat{D}}_{B i n B i n}$	0.115	0.068	0.030	0.007	0.003	−0.003	−0.003	−0.001
	${\hat{D}}_{F o l d e d}$	0.114	0.067	0.030	0.006	0.002	−0.003	−0.003	−0.001
30	$\hat{D}$	0.157	0.109	0.069	0.041	0.031	0.023	0.010	0.004
	${\hat{D}}_{B o o t}$	0.093	0.047	0.016	0.003	0.000	0.000	0.000	−0.001
	${\hat{D}}_{B i n B i n}$	0.093	0.047	0.016	0.003	0.000	0.000	0.000	−0.001
	${\hat{D}}_{F o l d e d}$	0.093	0.047	0.015	0.002	0.000	0.000	0.000	0.000
40	$\hat{D}$	0.136	0.090	0.055	0.031	0.023	0.016	0.006	0.003
	${\hat{D}}_{B o o t}$	0.080	0.037	0.011	0.001	−0.001	−0.001	−0.002	0.000
	${\hat{D}}_{B i n B i n}$	0.080	0.037	0.011	0.001	−0.001	−0.001	−0.002	0.000
	${\hat{D}}_{F o l d e d}$	0.080	0.037	0.011	0.001	0.000	−0.001	−0.002	0.000
50	$\hat{D}$	0.121	0.077	0.045	0.025	0.018	0.013	0.006	0.002
	${\hat{D}}_{B o o t}$	0.071	0.030	0.007	0.001	−0.001	0.000	0.000	0.000
	${\hat{D}}_{B i n B i n}$	0.071	0.030	0.007	0.001	−0.001	0.000	0.000	0.000
	${\hat{D}}_{F o l d e d}$	0.071	0.030	0.007	0.001	−0.001	0.000	0.000	0.000

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 7.

Mean Bias, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .05 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.739	0.684	0.613	0.522	0.461	0.386	0.203	0.096
	${\hat{D}}_{B o o t}$	0.654	0.600	0.529	0.442	0.385	0.318	0.163	0.077
	${\tilde{D}}_{B o o t}$	0.539	0.487	0.426	0.341	0.296	0.240	0.132	0.064
10	$\hat{D}$	0.604	0.547	0.482	0.399	0.350	0.287	0.148	0.071
	${\hat{D}}_{B o o t}$	0.482	0.425	0.363	0.288	0.247	0.197	0.098	0.047
	${\tilde{D}}_{B o o t}$	0.350	0.283	0.230	0.177	0.151	0.122	0.074	0.036
20	$\hat{D}$	0.396	0.346	0.285	0.220	0.180	0.141	0.064	0.029
	${\hat{D}}_{B o o t}$	0.241	0.194	0.136	0.085	0.056	0.034	0.002	−0.001
	${\tilde{D}}_{B o o t}$	0.126	0.070	0.022	−0.010	−0.014	−0.009	−0.005	−0.004
30	$\hat{D}$	0.334	0.283	0.226	0.161	0.134	0.099	0.045	0.018
	${\hat{D}}_{B o o t}$	0.209	0.159	0.107	0.052	0.036	0.015	−0.002	−0.005
	${\tilde{D}}_{B o o t}$	0.103	0.057	0.018	−0.005	−0.008	−0.005	−0.005	−0.005
40	$\hat{D}$	0.286	0.234	0.180	0.126	0.101	0.074	0.031	0.014
	${\hat{D}}_{B o o t}$	0.172	0.121	0.073	0.032	0.019	0.005	−0.007	−0.004
	${\tilde{D}}_{B o o t}$	0.083	0.036	−0.003	−0.010	−0.009	−0.010	−0.008	−0.003
50	$\hat{D}$	0.258	0.205	0.155	0.105	0.080	0.058	0.026	0.012
	${\hat{D}}_{B o o t}$	0.156	0.105	0.061	0.024	0.009	0.001	−0.003	−0.001
	${\tilde{D}}_{B o o t}$	0.076	0.031	−0.004	−0.010	−0.008	−0.006	−0.005	−0.003
100	$\hat{D}$	0.181	0.132	0.088	0.053	0.040	0.029	0.012	0.005
	${\hat{D}}_{B o o t}$	0.107	0.060	0.023	0.003	0.000	−0.002	−0.002	−0.001
	${\tilde{D}}_{B o o t}$	0.050	0.007	−0.013	−0.006	−0.005	−0.005	−0.002	−0.001
200	$\hat{D}$	0.128	0.082	0.049	0.028	0.020	0.015	0.006	0.003
	${\hat{D}}_{B o o t}$	0.076	0.032	0.009	0.001	−0.001	0.000	−0.001	0.000
	${\tilde{D}}_{B o o t}$	0.035	−0.004	−0.006	−0.003	−0.003	−0.001	−0.001	−0.001

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 8.

Mean Bias, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .1 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.550	0.494	0.431	0.352	0.304	0.248	0.127	0.059
	${\hat{D}}_{B o o t}$	0.412	0.356	0.297	0.228	0.189	0.147	0.072	0.033
	${\tilde{D}}_{B o o t}$	0.260	0.220	0.166	0.121	0.095	0.075	0.055	0.028
10	$\hat{D}$	0.414	0.359	0.298	0.228	0.194	0.151	0.074	0.033
	${\hat{D}}_{B o o t}$	0.263	0.208	0.151	0.093	0.072	0.045	0.015	0.004
	${\tilde{D}}_{B o o t}$	0.140	0.085	0.034	0.002	0.000	0.009	0.002	0.000
20	$\hat{D}$	0.294	0.241	0.188	0.132	0.105	0.081	0.032	0.016
	${\hat{D}}_{B o o t}$	0.178	0.127	0.080	0.036	0.020	0.010	−0.007	−0.002
	${\tilde{D}}_{B o o t}$	0.085	0.041	0.002	−0.009	−0.010	−0.009	−0.007	−0.004
30	$\hat{D}$	0.240	0.190	0.140	0.091	0.071	0.055	0.022	0.010
	${\hat{D}}_{B o o t}$	0.144	0.095	0.052	0.016	0.007	0.004	−0.004	−0.002
	${\tilde{D}}_{B o o t}$	0.069	0.020	−0.007	−0.007	−0.007	−0.009	−0.006	−0.002
40	$\hat{D}$	0.207	0.159	0.110	0.072	0.055	0.040	0.016	0.008
	${\hat{D}}_{B o o t}$	0.123	0.077	0.035	0.011	0.004	0.001	−0.003	−0.001
	${\tilde{D}}_{B o o t}$	0.057	0.015	−0.010	−0.008	−0.008	−0.006	−0.002	−0.001
50	$\hat{D}$	0.185	0.138	0.094	0.058	0.043	0.031	0.014	0.007
	${\hat{D}}_{B o o t}$	0.109	0.065	0.028	0.007	0.001	−0.001	−0.001	0.001
	${\tilde{D}}_{B o o t}$	0.053	0.008	−0.012	−0.008	−0.005	−0.002	−0.001	−0.002
100	$\hat{D}$	0.131	0.085	0.051	0.029	0.022	0.017	0.007	0.003
	${\hat{D}}_{B o o t}$	0.077	0.034	0.009	0.001	0.000	0.001	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.036	−0.003	−0.006	−0.003	−0.002	−0.001	0.000	−0.001
200	$\hat{D}$	0.094	0.051	0.026	0.014	0.011	0.008	0.003	0.002
	${\hat{D}}_{B o o t}$	0.056	0.016	0.002	0.000	0.000	0.000	0.000	0.001
	${\tilde{D}}_{B o o t}$	0.024	−0.007	−0.003	−0.001	0.000	−0.001	−0.002	−0.001

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 9.

Mean Bias, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .3 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.350	0.298	0.243	0.177	0.146	0.112	0.052	0.022
	${\hat{D}}_{B o o t}$	0.219	0.169	0.119	0.065	0.045	0.026	0.004	−0.002
	${\tilde{D}}_{B o o t}$	0.110	0.063	0.020	−0.002	0.003	0.004	0.000	0.000
10	$\hat{D}$	0.270	0.220	0.168	0.116	0.091	0.071	0.030	0.016
	${\hat{D}}_{B o o t}$	0.163	0.114	0.068	0.031	0.016	0.009	−0.002	0.000
	${\tilde{D}}_{B o o t}$	0.079	0.032	−0.004	−0.007	−0.007	−0.004	−0.002	−0.004
20	$\hat{D}$	0.192	0.143	0.098	0.061	0.047	0.032	0.013	0.006
	${\hat{D}}_{B o o t}$	0.115	0.068	0.030	0.007	0.003	−0.003	−0.003	−0.001
	${\tilde{D}}_{B o o t}$	0.054	0.011	−0.009	−0.007	−0.004	−0.004	−0.004	−0.001
30	$\hat{D}$	0.157	0.109	0.069	0.041	0.031	0.023	0.010	0.004
	${\hat{D}}_{B o o t}$	0.093	0.047	0.016	0.003	0.000	0.000	0.000	−0.001
	${\tilde{D}}_{B o o t}$	0.043	0.001	−0.009	−0.004	−0.004	−0.004	0.000	−0.001
40	$\hat{D}$	0.136	0.090	0.055	0.031	0.023	0.016	0.006	0.003
	${\hat{D}}_{B o o t}$	0.080	0.037	0.011	0.001	−0.001	−0.001	−0.002	0.000
	${\tilde{D}}_{B o o t}$	0.037	−0.004	−0.007	−0.005	−0.002	0.000	−0.001	−0.001
50	$\hat{D}$	0.121	0.077	0.045	0.025	0.018	0.013	0.006	0.002
	${\hat{D}}_{B o o t}$	0.071	0.030	0.007	0.001	−0.001	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.033	−0.005	−0.005	−0.003	−0.002	0.000	−0.001	−0.001
100	$\hat{D}$	0.087	0.045	0.023	0.012	0.009	0.007	0.003	0.001
	${\hat{D}}_{B o o t}$	0.051	0.014	0.001	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.023	−0.006	−0.004	−0.002	0.000	0.000	−0.001	−0.001
200	$\hat{D}$	0.061	0.024	0.011	0.006	0.005	0.004	0.001	0.001
	${\hat{D}}_{B o o t}$	0.036	0.004	0.000	0.000	0.000	0.000	−0.001	0.001
	${\tilde{D}}_{B o o t}$	0.016	−0.004	−0.001	0.000	0.000	−0.001	−0.001	−0.001

Note: For ${\hat{D}}_{B o o t}$ , B = 100 bootstrap replications are considered.

Table 12.

MSE, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .05 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.547	0.468	0.377	0.275	0.216	0.150	0.043	0.010
	${\hat{D}}_{B o o t}$	0.430	0.361	0.283	0.200	0.153	0.104	0.029	0.007
	${\tilde{D}}_{B o o t}$	0.298	0.242	0.181	0.126	0.095	0.065	0.022	0.006
10	$\hat{D}$	0.365	0.300	0.235	0.163	0.123	0.083	0.024	0.006
	${\hat{D}}_{B o o t}$	0.235	0.184	0.137	0.089	0.064	0.041	0.013	0.004
	${\tilde{D}}_{B o o t}$	0.129	0.094	0.069	0.045	0.033	0.024	0.010	0.004
20	$\hat{D}$	0.162	0.121	0.084	0.049	0.034	0.023	0.006	0.002
	${\hat{D}}_{B o o t}$	0.064	0.041	0.024	0.011	0.007	0.005	0.003	0.002
	${\tilde{D}}_{B o o t}$	0.023	0.012	0.010	0.011	0.010	0.006	0.003	0.002
30	$\hat{D}$	0.113	0.079	0.051	0.028	0.018	0.012	0.003	0.001
	${\hat{D}}_{B o o t}$	0.046	0.026	0.013	0.006	0.004	0.003	0.002	0.001
	${\tilde{D}}_{B o o t}$	0.017	0.007	0.006	0.006	0.005	0.003	0.002	0.001
40	$\hat{D}$	0.082	0.055	0.034	0.017	0.011	0.007	0.002	0.001
	${\hat{D}}_{B o o t}$	0.031	0.016	0.008	0.003	0.002	0.002	0.002	0.001
	${\tilde{D}}_{B o o t}$	0.010	0.004	0.005	0.004	0.003	0.003	0.002	0.001
50	$\hat{D}$	0.066	0.043	0.024	0.012	0.008	0.005	0.002	0.001
	${\hat{D}}_{B o o t}$	0.025	0.012	0.005	0.002	0.002	0.002	0.001	0.001
	${\tilde{D}}_{B o o t}$	0.008	0.004	0.004	0.003	0.003	0.002	0.001	0.001
100	$\hat{D}$	0.033	0.018	0.008	0.004	0.002	0.001	0.001	0.000
	${\hat{D}}_{B o o t}$	0.012	0.005	0.001	0.001	0.001	0.001	0.001	0.000
	${\tilde{D}}_{B o o t}$	0.004	0.002	0.002	0.001	0.001	0.001	0.001	0.000
200	$\hat{D}$	0.017	0.007	0.003	0.001	0.001	0.001	0.000	0.000
	${\hat{D}}_{B o o t}$	0.006	0.001	0.001	0.001	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.002	0.001	0.001	0.001	0.001	0.000	0.000	0.000

Note: MSE = mean square error. For $\hat{D}_{Boot}$, B = 100 bootstrap replications are considered.

Table 13.

MSE, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .1 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.305	0.252	0.191	0.126	0.094	0.062	0.019	0.005
	${\hat{D}}_{B o o t}$	0.175	0.137	0.096	0.057	0.040	0.025	0.009	0.003
	${\tilde{D}}_{B o o t}$	0.086	0.066	0.045	0.028	0.022	0.016	0.008	0.003
10	$\hat{D}$	0.174	0.130	0.092	0.054	0.038	0.025	0.007	0.002
	${\hat{D}}_{B o o t}$	0.074	0.046	0.027	0.013	0.009	0.006	0.003	0.002
	${\tilde{D}}_{B o o t}$	0.028	0.015	0.010	0.010	0.009	0.007	0.004	0.002
20	$\hat{D}$	0.089	0.059	0.037	0.020	0.013	0.008	0.002	0.001
	${\hat{D}}_{B o o t}$	0.035	0.018	0.008	0.004	0.003	0.002	0.002	0.001
	${\tilde{D}}_{B o o t}$	0.012	0.005	0.005	0.005	0.004	0.003	0.002	0.001
30	$\hat{D}$	0.058	0.037	0.020	0.010	0.006	0.004	0.001	0.001
	${\hat{D}}_{B o o t}$	0.022	0.010	0.004	0.002	0.002	0.002	0.001	0.001
	${\tilde{D}}_{B o o t}$	0.007	0.003	0.004	0.003	0.002	0.002	0.001	0.001
40	$\hat{D}$	0.044	0.026	0.013	0.006	0.004	0.002	0.001	0.000
	${\hat{D}}_{B o o t}$	0.016	0.007	0.002	0.001	0.001	0.001	0.001	0.000
	${\tilde{D}}_{B o o t}$	0.005	0.002	0.003	0.002	0.002	0.001	0.001	0.000
50	$\hat{D}$	0.035	0.019	0.009	0.004	0.003	0.002	0.001	0.000
	${\hat{D}}_{B o o t}$	0.013	0.005	0.002	0.001	0.001	0.001	0.001	0.000
	${\tilde{D}}_{B o o t}$	0.004	0.002	0.002	0.001	0.001	0.001	0.001	0.000
100	$\hat{D}$	0.017	0.007	0.003	0.001	0.001	0.001	0.000	0.000
	${\hat{D}}_{B o o t}$	0.006	0.002	0.001	0.001	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.002	0.001	0.001	0.001	0.001	0.000	0.000	0.000
200	$\hat{D}$	0.009	0.003	0.001	0.000	0.000	0.000	0.000	0.000
	${\hat{D}}_{B o o t}$	0.003	0.000	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.001	0.001	0.000	0.000	0.000	0.000	0.000	0.000

Note: MSE = mean square error. For $\hat{D}_{Boot}$, B = 100 bootstrap replications are considered.

Table 14.

MSE, Over 1,000 Replications, of $\hat{D}$, $\hat{D}_{Boot}$, and $\tilde{D}_{Boot}$ for Varying $E(n_j)$ and D, With k = 50 and p = .3 Fixed.

		D
E(n_j )	Estimator	0.000	0.056	0.127	0.225	0.292	0.382	0.634	0.817
6	$\hat{D}$	0.125	0.091	0.059	0.034	0.023	0.014	0.004	0.001
	${\hat{D}}_{B o o t}$	0.052	0.032	0.016	0.007	0.005	0.004	0.003	0.001
	${\tilde{D}}_{B o o t}$	0.019	0.010	0.007	0.007	0.006	0.004	0.003	0.001
10	$\hat{D}$	0.075	0.049	0.029	0.014	0.010	0.006	0.002	0.001
	${\hat{D}}_{B o o t}$	0.029	0.015	0.006	0.003	0.002	0.002	0.002	0.001
	${\tilde{D}}_{B o o t}$	0.009	0.004	0.004	0.004	0.003	0.002	0.002	0.001
20	$\hat{D}$	0.037	0.021	0.010	0.004	0.003	0.002	0.001	0.000
	${\hat{D}}_{B o o t}$	0.014	0.005	0.002	0.001	0.001	0.001	0.001	0.000
	${\tilde{D}}_{B o o t}$	0.004	0.002	0.002	0.001	0.001	0.001	0.001	0.000
30	$\hat{D}$	0.025	0.012	0.005	0.002	0.002	0.001	0.000	0.000
	${\hat{D}}_{B o o t}$	0.009	0.003	0.001	0.001	0.001	0.001	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.003	0.001	0.001	0.001	0.001	0.001	0.000	0.000
40	$\hat{D}$	0.019	0.008	0.003	0.001	0.001	0.001	0.000	0.000
	${\hat{D}}_{B o o t}$	0.007	0.002	0.001	0.001	0.001	0.001	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.002	0.001	0.001	0.001	0.001	0.001	0.000	0.000
50	$\hat{D}$	0.015	0.006	0.002	0.001	0.001	0.001	0.000	0.000
	${\hat{D}}_{B o o t}$	0.005	0.001	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.002	0.001	0.001	0.001	0.001	0.000	0.000	0.000
100	$\hat{D}$	0.007	0.002	0.001	0.000	0.000	0.000	0.000	0.000
	${\hat{D}}_{B o o t}$	0.003	0.000	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.001	0.000	0.000	0.000	0.000	0.000	0.000	0.000
200	$\hat{D}$	0.004	0.001	0.000	0.000	0.000	0.000	0.000	0.000
	${\hat{D}}_{B o o t}$	0.001	0.000	0.000	0.000	0.000	0.000	0.000	0.000
	${\tilde{D}}_{B o o t}$	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000

Note: MSE = mean square error. For $\hat{D}_{Boot}$, B = 100 bootstrap replications are considered.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Allen, Burgess, and Windmeijer. 2009. “More Reliable Inference for Segregation Indices.” Technical Report 216, The Centre for Market and Public Organisation, University of Bristol, Bristol, UK.

Brent, R. P. 1973. Algorithms for Minimization without Derivatives. Englewood Cliffs, NJ: Prentice Hall.

Carrington, W. J., and K. R. Troske. 1997. “On Measuring Segregation in Samples with Small Units.” Journal of Business & Economic Statistics 15:402–9.

Cortese, C. F., R. F. Falk, and J. K. Cohen. 1976. “Further Considerations on the Methodological Analysis of Segregation Indices.” American Sociological Review 41:630–7.

Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Application, volume 1 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge, UK: Cambridge University Press.

Deutsch and J. G. Silber. 2005. “Comparing Segregation by Gender in the Labor Force across Ten European Countries in the 1990s: An Analysis based on the Use of Normative Segregation Indices.” International Journal of Manpower 26:237–64.

Duncan, O. D., and Duncan. 1955. “A Methodological Analysis of Segregation Indexes.” American Sociological Review 20:210–17.

Efron. 1982. The Jackknife, the Bootstrap, and Other Resampling Plans, volume 38 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics.

Farley. 1975. “Residential Segregation and its Implications for School Integration.” Law and Contemporary Problems 39:164–93.

Farley and Johnson. 1985. “On the Statistical Significance of the Index of Dissimilarity.” Pp. 415–20 in Proceedings of the Social Statistics Section. Washington, DC: American Statistical Association.

Flückiger and J. G. Silber. 1999. The Measurement of Segregation in the Labor Force. Heidelberg, Germany: Physica-Verlag.

Hutchens, R. M. 1991. “Segregation Curves, Lorenz Curves, and Inequality in the Distribution of People across Occupations.” Mathematical Social Sciences 21:31–51.

James, D. R., and K. E. Taeuber. 1985. “Measures of Segregation.” Pp. 1–32 in Sociological Methodology, edited by N. B. Tuma. San Francisco, CA: Jossey-Bass.

Jerby, Semyonov, and Lewin-Epstein. 2005. “Capturing Gender-based Microsegregation: A Modified Ratio Index for Comparative Analyses.” Sociological Methods & Research 34:122–36.

Johnson, N. L., A. W. Kemp, and Kotz. 2005. Univariate Discrete Distributions. 3rd ed. Wiley Series in Probability and Statistics. Hoboken, NJ: John Wiley.

Johnson, N. L., Kotz, and Balakrishnan. 1997. Discrete Multivariate Distributions. Wiley Series in Probability and Statistics: Applied Probability and Statistics. New York: John Wiley.

Kakwani, N. C. 1994. “Segregation by Sex: Measurement and Hypothesis Testing.” Pp. 1–26 in Research in Economic Inequality, Inequality in Labor Markets: The Economics of Labor Market Segregation and Discrimination, edited by Neuman and J. G. Silber. Greenwich, UK: JAI Press.

Karmel and Maclachlan. 1988. “Occupational Sex Segregation - Increasing or Decreasing?” Economic Record 64:187–95.

Leone, F. C., L. S. Nelson, and R. B. Nottingham. 1961. “The Folded Normal Distribution.” Technometrics 3:543–50.

Massey, D. S., and N. A. Denton. 1987. “Trends in the Residential Segregation of Blacks, Hispanics, and Asians: 1970-1980.” American Sociological Review 52:802–25.

Massey, D. S., and N. A. Denton. 1988. “The Dimensions of Residential Segregation.” Social Forces 67:281–315.

Massey, D. S., M. J. White, and V.-C. Phua. 1996. “The Dimensions of Segregation Revisited.” Sociological Methods & Research 25:172–206.

Ransom, M. R. 2000. “Sampling Distributions of Segregation Indexes.” Sociological Methods & Research 28:454–75.

R Core Team. 2013. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Silber, J. G. 1989. “On the Measurement of Employment Segregation.” Economics Letters 30:237–43.

Silber, J. G., Flückiger, and S. F. Reardon. 2009. Occupational and Residential Segregation. 17 vols. Bingley, UK: Emerald Group Publishing.

Taeuber, K. E., and A. F. Taeuber. 1965. Negroes in Cities: Residential Segregation and Neighborhood Change. Chicago, IL: Aldine.

White, M. J. 1986. “Segregation and Diversity Measures in Population Distribution.” Population Index 52:198–221.
