Identifying Contributors of DNA Mixtures by Means of Quantitative Information of STR Typing

Abstract

Estimating the weight of evidence in forensic genetics is often done in terms of a likelihood ratio, LR. The LR evaluates the probability of the observed evidence under competing hypotheses. Most often, probabilities used in the LR only consider the evidence from the genomic variation identified using polymorphic genetic markers. However, modern typing techniques supply additional quantitative data, which contain very important information about the observed evidence. This is particularly true for cases of DNA mixtures, where more than one individual has contributed to the observed biological stain. This article presents a method for including the quantitative information of short tandem repeat (STR) DNA mixtures in the LR. Also, an efficient algorithmic method for finding the best matching combination of DNA mixture profiles is derived and implemented in an on-line tool for two- and three-person DNA mixtures. Finally, we demonstrate for two-person mixtures how this best matching pair of profiles can be used in estimating the likelihood ratio using importance sampling. Importance sampling is used because the number of profile combinations needed to evaluate the weight of evidence is often vast. The online tool is available at http://people.math.aau.dk/~tvede/dna/.

1. Introduction

When a crime has been committed, biological traces are often found at the scene of the crime. In many cases, more than one individual has contributed to the stain, which is then considered a DNA mixture. The evaluation of DNA mixtures is often complex and laborious, requiring considerable time and effort from experienced case workers.

Most modern DNA typing techniques are based upon polymerase chain reaction (PCR) producing millions of copies of the DNA string. The amount of DNA in the PCR vessel pre-PCR is reflected in the concentration of target molecules post-PCR. The targets used in forensic genetics are selected such that they are highly polymorphic (large number of possible alleles), which gives a high power of discrimination. Furthermore, the genetic markers used for forensic purposes are non-coding and should ideally be neutral with respect to selection.

The prevalent technology used in forensic genetics to perform genetic identification uses short tandem repeat (STR) polymorphisms. This method relies on variability in the length of certain repeat motifs in the genome. The STR DNA profile is observed via a so-called electropherogram (EPG), where the alleles are identified as signal peaks above a signal-to-noise threshold (see Fig. 2 below). For a single-person DNA profile, one can observe either one or two peaks, corresponding to the profile being either homozygous (identical alleles on both chromosomes) or heterozygous (different alleles on each chromosome). The commercial kits used for identification purposes typically contain between 10 and 15 genetic markers (also called loci, the plural of locus). Within each locus, the number of alleles varies from 5 to 20. For the kit (SGM Plus Kit, Applied Biosystems) depicted in Figure 2 below, the labels “D3”, “vWA,” … “FGA” refer to locus names, and the integer values above the locus name correspond to the observed allele types for that particular locus.

Only the cumulative peaks can be observed in the EPG. That is, the peak heights are expected to be twice as high for homozygous loci as for heterozygous loci, since the two identical alleles double the amount of pre-PCR product for the homozygous peaks. This is also true for DNA mixtures, where alleles shared by two or more contributors reflect the contribution from several donors as higher peaks. Hence, for DNA mixtures with two contributors, the number of observable peaks ranges from one to four depending on the particular profiles in the mixture.

The kit used for STR typing comprises a set of loci, $\mathcal{S}$, used for discrimination. For an arbitrary two-person mixture, the number of possible combinations is given by $1^{S_1} 7^{S_2} 12^{S_3} 6^{S_4}$, where $S_i$ is the number of loci with $i$ observations and $S = \sum_{i=1}^{4} S_i$ is the total number of loci used for discrimination, i.e., $S$ is the size of $\mathcal{S}$. The numbers 1, 7, 12, and 6 come from the number of possible combinations (Table 1) when observing 1, 2, 3, and 4 alleles, respectively.

Table 1.

Possible Combinations in a Two-Person Mixture with One to Four Alleles

Alleles	Possible combinations
a	(aa, aa)
a, b	(aa, ab)	(aa, bb)	(ab, aa)	(ab, ab)	(ab, bb)	(bb, ab)	(bb, aa)
a, b, c	(aa, bc)	(ab, ac)	(ab, bc)	(ab, cc)	(ac, ab)	(ac, bb)
	(ac, bc)	(bb, ac)	(bc, ab)	(bc, ac)	(bc, aa)	(cc, ab)
a, b, c, d	(ab, cd)	(ac, bd)	(ad, bc)	(bc, ad)	(bd, ac)	(cd, ab)
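
The counts in Table 1 can be reproduced by brute-force enumeration. The following Python sketch is an illustration only: the function names and the ten-locus example are assumptions made here, not part of the original analysis. It lists every ordered pair of genotypes whose alleles together are exactly the observed alleles, and then evaluates the product $1^{S_1} 7^{S_2} 12^{S_3} 6^{S_4}$.

```python
from itertools import combinations_with_replacement, product


def two_person_combinations(alleles):
    """Return all ordered genotype pairs whose alleles together are exactly `alleles`."""
    genotypes = list(combinations_with_replacement(alleles, 2))
    return [
        (g1, g2)
        for g1, g2 in product(genotypes, repeat=2)
        if set(g1) | set(g2) == set(alleles)
    ]


def total_combinations(loci_per_peak_count):
    """Evaluate 1^{S_1} 7^{S_2} 12^{S_3} 6^{S_4} for S_i given as {i: S_i}."""
    per_locus = {n: len(two_person_combinations("abcd"[:n])) for n in range(1, 5)}
    total = 1
    for n_peaks, n_loci in loci_per_peak_count.items():
        total *= per_locus[n_peaks] ** n_loci
    return total


# Number of combinations for loci with 1, 2, 3 and 4 observed alleles.
counts = [len(two_person_combinations("abcd"[:n])) for n in range(1, 5)]
print(counts)  # [1, 7, 12, 6]

# Hypothetical ten-locus mixture: 1 locus with two peaks, 5 with three, 4 with four.
print(total_combinations({2: 1, 3: 5, 4: 4}))
```

For the hypothetical ten-locus mixture, the product is $7 \cdot 12^5 \cdot 6^4$, i.e., more than two billion combinations.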

In most cases, this leads to an intractable number of combinations. However, using the quantitative STR data (peak heights and peak areas), the number of plausible combinations often decreases substantially. In this article, we develop a statistical model for STR DNA mixtures. The statistical model is intended to measure the agreement between the expected peak intensities for a proposed combination of DNA profiles and the actual observed peak intensities. Hence, we use an objective criterion to discriminate among the possible combinations in Table 1.

In order to incorporate the peak intensities in the likelihood ratio (LR), we first demonstrate how to find a best matching pair of profiles for a given two-person mixture using an efficient algorithmic approach. This algorithm iteratively builds up a best matching combination of profiles using the statistical model for the peak intensities. The algorithm has been implemented in a free online tool available at the first author's website (http://people.math.aau.dk/~tvede/dna/). The statistical model and algorithmic construction are different from previously proposed methods for DNA mixture separation (Perlin and Szabady, 2001; Wang et al., 2006).

The quantitative information is included in the LR by assigning a weight to each combination of DNA profiles consistent with the observed STR types. The weight reflects the probability of the observed peak intensities given a specific combination of profiles. The denominator in the LR will in most cases be a sum over an intractable number of combinations. By sampling “close” to the best matching combination returned by the algorithm, we show how importance sampling may be used to estimate the LR.

2. Data

The model is based on exploration of controlled experiments of two-person mixtures conducted at The Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, Denmark. From the data exploration, it is evident that the mean and covariance structure of the peak areas must satisfy proportionality of:

• peak areas and peak heights,

• peak area and amount of DNA in the mixture,

• the mean and variance of the peak areas.

These assumptions are supported elsewhere (Figs. 1 and 2 in Tvedebrink et al., 2010). The experiments consisted of pairwise two-person mixtures in various mixture ratios of the four profiles in Table 2. The data were prepared as described in Tvedebrink et al. (2009).

FIG. 1.

Greedy algorithm for finding a pair of profiles (locally) maximizing the likelihood of (1).

FIG. 2.

Plot produced by the on-line implementation of the algorithm (http://people.math.aau.dk/~tvede/dna/, sample data file “Paper case”). The observed peaks, ▴, are based on data from Table 4, and the expected peaks, ▵, are computed assuming a mixture of the best matching pair of STR profiles (Table 5). The observed and expected peaks coincide for nearly all peaks.

Table 2.

The Four STR Profiles Used in the Controlled Pairwise Two-Person Mixture Experiments

	D3	vWA	D16	D2	D8	D21	D18	D19	TH0	FGA
A	14,18	17,19	12,14	20,24	10,13	30.2,32.2	13,13	12,13	8,9	20,22
B	15,16	14,16	10,12	17,25	13,16	30,30	13,13	14,15	6,9	19,23
C	15,16	15,17	11,11	19,25	8,12	29,31	15,17	13,13	6,8	23,24
D	16,19	15,17	10,12	23,25	13,13	28,30	12,16	13,15	6,7	20,23

3. Modeling Peak Areas of a Two-Person Mixture

For the search of a pair of best matching profiles to be feasible, we assume the peak areas of the various loci to be conditionally independent given the locus area sums, $\boldsymbol{A}_+$. Performing the inference conditioned on $\boldsymbol{A}_+$ satisfies the reasoning of Cox (1958), as $\boldsymbol{A}_+$ is an ancillary statistic for the mixture ratio, i.e., $\boldsymbol{A}_+$ is fixed for all values of the mixture ratio. Furthermore, we assume that the peak areas are multivariate normally distributed with conditional mean vector, $\mathbb{E}(\boldsymbol{A}_s \mid A_{s,+})$, and covariance matrix, $\mathbb{C}\mathrm{ov}(\boldsymbol{A}_s \mid A_{s,+})$, defined as

\begin{align*}
\mathbb{E}(\boldsymbol{A}_s \mid A_{s,+}) &= \left[\alpha \boldsymbol{P}_{s,1} + (1 - \alpha) \boldsymbol{P}_{s,2}\right] \frac{A_{s,+}}{2} \quad \text{and} \\
\mathbb{C}\mathrm{ov}(\boldsymbol{A}_s \mid A_{s,+}) &= \tau^2 C_s \,\mathrm{diag}(\boldsymbol{h}_s)\, C_s^{\top}, \tag{1}
\end{align*}

where $\alpha$ denotes the proportion with which person 1 contributes to the mixture, and $C_s = I_{n_s} - n_s^{-1} \boldsymbol{1}_{n_s} \boldsymbol{1}_{n_s}^{\top}$ with $n_s$, $1 \le n_s \le 4$, being the number of observed peaks at locus $s$. Note that $\alpha$ is assumed to be common to all loci. The definition of the covariance matrix is close to the ordinary covariance when conditioning on the vector sum. However, as the variance of the peak area is assumed proportional to the mean, we use the diagonal matrix $\mathrm{diag}(\boldsymbol{h}_s)$, where $\boldsymbol{h}_s$ contains the associated peak heights at locus $s$, to obtain weighted observations that stabilise the variance. Furthermore, $\tau^2$ is a common variance parameter for all loci, $s \in \mathcal{S}$.

The P _s,k-vector is a vector of indicators taking values 0, 1 or 2 referring to the number of copies that person k has of each allele in the mixture on locus s. For example, if the two individuals contributing to the mixture have genotypes (10,12) and (14,14), respectively, we will have P _s,1 = (1, 1, 0)^⊤ and P _s,2 = (0, 0, 2)^⊤. Assuming no chromosomal anomalies, each individual carries two alleles at each locus which implies the sum of P _s,k to be 2 for all k.
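
As a small illustration of the indicator vectors and of the conditional mean in (1), the following Python sketch builds the two vectors for the example genotypes above and evaluates the expected peak areas. The function names, the mixture proportion α = 0.25 and the locus area sum of 4000 are assumptions chosen for the example, not values from the experiments.

```python
def indicator_vector(genotype, observed_alleles):
    """Count how many copies of each observed allele the genotype carries (0, 1 or 2)."""
    return [genotype.count(allele) for allele in observed_alleles]


def expected_peak_areas(p1, p2, alpha, area_sum):
    """Conditional mean from (1): [alpha*P1 + (1 - alpha)*P2] * A_plus / 2."""
    return [(alpha * a + (1 - alpha) * b) * area_sum / 2 for a, b in zip(p1, p2)]


observed = [10, 12, 14]                      # alleles seen at the locus
p1 = indicator_vector((10, 12), observed)    # person 1, heterozygous
p2 = indicator_vector((14, 14), observed)    # person 2, homozygous
print(p1, p2)  # [1, 1, 0] [0, 0, 2]

# Illustrative numbers only: alpha = 0.25 and a locus area sum of 4000.
print(expected_peak_areas(p1, p2, alpha=0.25, area_sum=4000))  # [500.0, 500.0, 3000.0]
```

The expected areas sum to the locus area sum, as they must when each indicator vector sums to 2.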

The model presented here is different from, for example, the ones of Cowell et al. (2007a,b) and Curran (2008), who both take a Bayesian approach. The model of Cowell et al. (2007a) assumes the peak heights to be gamma-distributed to ensure proportionality of the mean and variance, whereas Cowell et al. (2007b) assume normality of the peak heights with parameters chosen to ensure proportionality of mean and variance. As mentioned in Curran (2008), the model of Cowell et al. (2007a) makes a crude adjustment for a repeat number effect, which is no longer relevant. In Curran (2008), the peak heights are assumed multivariate normal, but no attempt is made to ensure proportionality of the mean and variance. Furthermore, by conditioning on the peak area sums within each locus we acknowledge the strong inter-locus correlation. In addition to the methods based on statistical models, there are several methods that rely on heuristics and guidelines (Gill et al., 1998, 2006; Clayton et al., 1998; Perlin and Szabady, 2001; Bill et al., 2005; Wang et al., 2006). Cowell et al. (2007b) give a review of most of these methods in their introductory section.

4. Finding Best Matching Pair of Profiles

In order to find the most likely pair of profiles matching the observed mixture under the assumptions made by the model, one can decrease the number of possibilities using the following arguments. Let the observed peak areas within each locus, $s$, be sorted such that $A_{s,(1)} < \cdots < A_{s,(n_s)}$, and assume that $\mathrm{DNA}_1 < \mathrm{DNA}_2$, where $\mathrm{DNA}_k$ is the amount of DNA contributed by person $k$.

Then, for a locus with four observed peaks ($n_s = 4$), the only likely pair of profiles given the model relates the alleles with peak areas $(A_{s,(1)}, A_{s,(2)})$ and $(A_{s,(3)}, A_{s,(4)})$ to person 1 and person 2, respectively. For loci with one observation ($n_s = 1$), both individuals must be homozygous for the observed allele, while for two ($n_s = 2$) or three observations ($n_s = 3$), the possible profiles are listed in Table 3 (the notation $\mathcal{J}_2$ and $\mathcal{J}_3$ is used in Section 4.1).

Table 3.

Possible Profiles for Loci with Two and Three Observations

$\mathcal{J}_2$:	P_s,1	P_s,2	P_s,1	P_s,2	P_s,1	P_s,2	P_s,1	P_s,2	$\mathcal{J}_3$:	P_s,1	P_s,2	P_s,1	P_s,2	P_s,1	P_s,2	P_s,1	P_s,2
A_s, ₍₁₎	1	1	2	0	1	0	0	1	A_s, ₍₁₎	2	0	1	0	1	0	0	1
A_s, ₍₂₎	1	1	0	2	1	2	2	1	A_s, ₍₂₎	0	1	1	0	0	1	0	1
									A_s, ₍₃₎	0	1	0	2	1	1	2	0

In Table 3, P _s,1 and P _s,2 refer to the profiles of person 1 and person 2 on the particular locus s, respectively, and the cell values give the number of copies of each allele in the profiles. The reason for not considering the three and eight other combinations for loci with two and three observations (Table 1), respectively, is that, for any of these combinations, one of the four combinations listed in Table 3 will be more likely under the model assumptions, i.e., have a better fit to the observed data. For example, P _s,1 = (0, 2)^⊤ and P _s,2 = (2, 0)^⊤ would be unlikely, as we assumed person 1 to have the lowest contribution and the second area to be the larger.

Discarding the information from peak areas and only using combinatorics, the numbers of possible pairs of profiles for loci with two, three, and four observations are 7, 12, and 6, respectively. Thus, using the assumptions of the model, we decrease the number of profiles that need to be examined in order to find the most likely profiles forming the observed mixture.

We assume the peak areas to be normally distributed with conditional means and covariances as specified in (1). Due to the conditional independence of the loci, the overall estimates of $\alpha$ and $\tau^2$ are found as sums over the loci. Let $W_s = C_s \,\mathrm{diag}(\boldsymbol{h}_s)\, C_s^{\top}$; then we can write the conditional distribution as $\boldsymbol{A}_s \mid A_{s,+} \sim \mathcal{N}_{n_s}(\alpha \boldsymbol{x}_0^s + \boldsymbol{x}_1^s, \tau^2 W_s)$, where $\boldsymbol{x}_0^s = (\boldsymbol{P}_{s,1} - \boldsymbol{P}_{s,2}) A_{s,+}/2$ and $\boldsymbol{x}_1^s = \boldsymbol{P}_{s,2} A_{s,+}/2$ are the terms of the mean in (1) that are linear and constant in $\alpha$, respectively. Solving the likelihood equations with respect to $\alpha$ and $\tau^2$ yields the unbiased estimators

\begin{align*}
\hat{\alpha} &= \frac{\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} (\boldsymbol{A}_s - \boldsymbol{x}_1^s)}{\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} \boldsymbol{x}_0^s} \quad \text{and} \\
\hat{\tau}^2 &= N^{-1} \sum_{s \in \mathcal{S}} (\boldsymbol{A}_s - \hat{\alpha} \boldsymbol{x}_0^s - \boldsymbol{x}_1^s)^{\top} W_s^{-} (\boldsymbol{A}_s - \hat{\alpha} \boldsymbol{x}_0^s - \boldsymbol{x}_1^s), \tag{2}
\end{align*}

where $N = n_+ - S - 1 = \sum_{s \in \mathcal{S}} (n_s - 1) - 1$ and $W_s^{-}$ is the generalized inverse of $W_s$. We have to use the generalized inverse of $W_s$ because $W_s$ has rank $n_s - 1$. An approximation to this model assumes that the precision matrix, $\tau^{-2} W_s^{-1}$, is given by $\tau^{-2} C_s \,\mathrm{diag}(\boldsymbol{h}_s)^{-1} C_s^{\top}$. Hence, we have a closed-form expression for the inverse covariance matrix, yielding simple expressions for the estimators of $\alpha$ and $\tau^2$,

\begin{align*}
\tilde{\alpha} &= \frac{\sum_{s \in \mathcal{S}} \sum_{i=1}^{n_s} x_{0,i}^s (A_{s,i} - x_{1,i}^s) h_{s,i}^{-1}}{\sum_{s \in \mathcal{S}} \sum_{i=1}^{n_s} (x_{0,i}^s)^2 h_{s,i}^{-1}} \quad \text{and} \\
\tilde{\tau}^2 &= N^{-1} \sum_{s \in \mathcal{S}} \sum_{i=1}^{n_s} (A_{s,i} - \tilde{\alpha} x_{0,i}^s - x_{1,i}^s)^2 h_{s,i}^{-1},
\end{align*}

where $A_{s,i}$, $h_{s,i}$, $x_{0,i}^s$, and $x_{1,i}^s$ are the $i$th components of the respective boldfaced vectors. We denote the unbiased maximum likelihood estimates for the two models as $(\hat{\alpha}, \hat{\tau})$ and $(\tilde{\alpha}, \tilde{\tau})$, respectively. The latter version is what is implemented in the on-line tool, as discussed in Section 4.2.
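
The closed-form estimators of the approximate model are weighted sums over all peaks, which the following Python sketch illustrates for one proposed pair of profiles. The tuple layout `(areas, heights, p1, p2)` and the function name are assumptions made for this example and do not describe the on-line tool; the data are synthetic and noise-free with α = 0.3, so the estimate should recover 0.3 up to rounding and the variance estimate should be essentially zero.

```python
def approximate_estimates(loci):
    """Closed-form estimates (alpha~, tau~^2) for a proposed pair of profiles.

    `loci` is a list of tuples (areas, heights, p1, p2), one per locus,
    where p1 and p2 are the indicator vectors of person 1 and person 2.
    """
    num = den = 0.0
    terms = []  # (area, x0, x1, height) for every peak
    for areas, heights, p1, p2 in loci:
        half_sum = sum(areas) / 2                    # A_{s,+} / 2
        for a, h, c1, c2 in zip(areas, heights, p1, p2):
            x0 = (c1 - c2) * half_sum                # part of the mean linear in alpha
            x1 = c2 * half_sum                       # part of the mean constant in alpha
            num += x0 * (a - x1) / h
            den += x0 * x0 / h
            terms.append((a, x0, x1, h))
    alpha = num / den
    n_dof = len(terms) - len(loci) - 1               # N = n_+ - S - 1
    rss = sum((a - alpha * x0 - x1) ** 2 / h for a, x0, x1, h in terms)
    return alpha, rss / n_dof


# Synthetic, noise-free mixture with alpha = 0.3 (peak heights are made up).
example = [
    ([300.0, 300.0, 700.0, 700.0], [30.0, 30.0, 70.0, 70.0], (1, 1, 0, 0), (0, 0, 1, 1)),
    ([300.0, 350.0, 350.0], [30.0, 35.0, 35.0], (2, 0, 0), (0, 1, 1)),
]
alpha_tilde, tau2_tilde = approximate_estimates(example)
print(alpha_tilde, tau2_tilde)
```

With seven peaks at two loci, the divisor is N = 7 − 2 − 1 = 4.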

In addition to the estimate of $\alpha$, we are also interested in determining a confidence interval for $\alpha$. The conditional variance of $\hat{\alpha}$ given $\boldsymbol{A}_+$ is found by applying the covariance operator to both sides of the expression for $\hat{\alpha}$ in (2),

\begin{align*}
\mathbb{V}\mathrm{ar}(\hat{\alpha} \mid \boldsymbol{A}_+) &= \frac{\mathbb{C}\mathrm{ov}\left(\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} (\boldsymbol{A}_s - \boldsymbol{x}_1^s) \,\big|\, \boldsymbol{A}_+\right)}{\left(\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} \boldsymbol{x}_0^s\right)^2} \\
&= \frac{\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} \,\mathbb{C}\mathrm{ov}(\boldsymbol{A}_s \mid \boldsymbol{A}_+)\, W_s^{-} \boldsymbol{x}_0^s}{\left(\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} \boldsymbol{x}_0^s\right)^2} \\
&= \tau^2 \left(\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} \boldsymbol{x}_0^s\right)^{-1}, \tag{3}
\end{align*}

where, from the first to the second equality, we used the conditional independence of $\boldsymbol{A}_s$ and $\boldsymbol{A}_t$ given $\boldsymbol{A}_+$, and, from the second to the third, properties of the covariance together with the expression for $\mathbb{C}\mathrm{ov}(\boldsymbol{A}_s \mid \boldsymbol{A}_+)$ in (1). The confidence interval of $\alpha$ given $\boldsymbol{A}_+$ is then given by

\begin{align*}
\mathbf{CI}_{\beta}(\alpha) = \hat{\alpha} \pm t_{1-\beta/2,N} \frac{\hat{\tau}}{\sqrt{\sum_{s \in \mathcal{S}} {\boldsymbol{x}_0^s}^{\top} W_s^{-} \boldsymbol{x}_0^s}},
\end{align*}

where $t_{1-\beta/2,N}$ is the critical value at significance level $\beta$ for a $t$-distribution with $N = n_+ - S - 1$ degrees of freedom. A similar confidence interval using the $(\tilde{\alpha}, \tilde{\tau})$-estimates is obtained by inserting the $(\tilde{\alpha}, \tilde{\tau})$-estimates instead of $(\hat{\alpha}, \hat{\tau})$ and replacing $W_s^{-}$ with $W_s^{-1}$. From the expression of $\mathbf{CI}_{\beta}(\alpha)$, it is clear that a small $\tau$-estimate decreases the width of the confidence interval and thus increases the trust in the estimated mixture proportion.
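
A minimal Python sketch of the interval for the approximate model follows. It assumes that, after replacing the generalized inverse by the approximate inverse, the quadratic form in the denominator reduces to the sum of $(x_{0,i}^s)^2 / h_{s,i}$ over all peaks, which holds because each $\boldsymbol{x}_0^s$ sums to zero. The function name and the numbers are illustrative assumptions, and the $t$ quantile is passed in by the caller (for example, from a statistical table) rather than computed.

```python
from math import sqrt


def alpha_confidence_interval(alpha_est, tau_est, x0_terms, t_crit):
    """Return (lower, upper) = alpha_est -/+ t_crit * tau_est / sqrt(sum x0^2 / h).

    `x0_terms` is an iterable of (x0, height) pairs over all peaks and loci;
    `t_crit` is the t quantile t_{1 - beta/2, N}, supplied by the caller.
    """
    information = sum(x0 * x0 / h for x0, h in x0_terms)
    half_width = t_crit * tau_est / sqrt(information)
    return alpha_est - half_width, alpha_est + half_width


# Illustrative values: one four-peak locus with A_{s,+} = 2000, so x0 = +/-1000.
x0_terms = [(1000.0, 30.0), (1000.0, 30.0), (-1000.0, 70.0), (-1000.0, 70.0)]
# 2.776 is the tabulated 97.5% quantile of a t-distribution with 4 degrees of freedom.
lower, upper = alpha_confidence_interval(0.3, tau_est=2.0, x0_terms=x0_terms, t_crit=2.776)
print(lower, upper)
```

Halving the τ-estimate halves the width of the interval, in line with the remark above.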

4.1. Greedy algorithm

This model was used in an algorithm for finding the most likely pair of profiles contributing to an observed mixture where the STR profiles of both individuals were assumed unknown. First, define the set \documentclass{aastex} \usepackage{amsbsy} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{bm} \usepackage{mathrsfs} \usepackage{pifont} \usepackage{stmaryrd} \usepackage{textcomp} \usepackage{portland, xspace} \usepackage{amsmath, amsxtra} \pagestyle{empty} \DeclareMathSizes {10} {9} {7} {6} \begin{document} $${ \cal J} = \{ { \cal J}_1 , \ldots , { \cal J}_4 \} $$ \end{document} , where \documentclass{aastex} \usepackage{amsbsy} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{bm} \usepackage{mathrsfs} \usepackage{pifont} \usepackage{stmaryrd} \usepackage{textcomp} \usepackage{portland, xspace} \usepackage{amsmath, amsxtra} \pagestyle{empty} \DeclareMathSizes {10} {9} {7} {6} \begin{document} $${ \cal J}_i$$ \end{document} is the set of plausible profiles for loci with n_s = i. These sets were defined in Section 4 (Table 3). The pseudo code for a greedy algorithm finding a pair of profiles (locally) maximising the likelihood of the model specified by (1) is given in Figure 1. A greedy algorithm is any algorithm that solves a problem by making the locally optimum choice at each stage with the hope of finding the global optimum. A graphical representation of the algorithm is given in Figure 7 below for a general number of contributors, m. 
The algorithm works with either $(\hat{\alpha}, \hat{\tau})$ or $(\tilde{\alpha}, \tilde{\tau})$ as estimates of (α, τ).

The greedy algorithm initiates by estimating α based on a locus s with four present alleles. The loci of $\mathcal{S}_4$ contain full information on the mixture ratio, α, and are thus used for assessing this quantity. In succession, the loci with three and two observations ($\mathcal{S}_3$ and $\mathcal{S}_2$, respectively) are analyzed and the combination with the smallest contribution to τ and best concordance to the previously determined mixture proportion is chosen. The set $\mathcal{T}$ contains a list of the optimal combinations on previously visited loci and is updated after each iteration. 
On termination, the greedy algorithm returns the best matching pair of profiles together with the estimates of α and τ. The algorithm is designed to perform calculations and decisions similar to those of a forensic geneticist when analyzing a two-person mixture.

The optimization problem is complicated since the inputs of the function that we are interested in minimizing depend on each other, $f(\alpha, (\boldsymbol{P}_{s,1}, \boldsymbol{P}_{s,2})_{s \in \mathcal{S}}) = \sum_{s \in \mathcal{S}} D_s$, where $D_s = (\boldsymbol{A}_s - \alpha \boldsymbol{x}_0^s - \boldsymbol{x}_1^s)^{\top} W_s^{-} (\boldsymbol{A}_s - \alpha \boldsymbol{x}_0^s - \boldsymbol{x}_1^s)$. 
Here, f denotes the objective function and $(\boldsymbol{P}_{s,1}, \boldsymbol{P}_{s,2})_{s \in \mathcal{S}}$ the set of possible combinations for all loci, $s \in \mathcal{S}$. It is easy to see that, for a fixed α, we can minimize D_s for each locus s by choosing the combination yielding the smallest square distance. Similarly, fixing the combinations for all loci, α is estimated using (2). However, from the construction of the greedy algorithm, the algorithm chooses the combination that minimizes τ² for locus s given α and the configurations on the previously visited loci, $t \in \{\mathcal{T} \setminus s\}$. This ensures locally optimal solutions, and for most practical purposes, the algorithm returns a global maximum of the likelihood. 
One should note that, when the algorithm recovers the best matching pair of profiles, we still need to consider all profiles close to these profiles consistent with the evidence for likelihood ratio evaluation (for further details, see Section 5).
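
The interplay between α and the per-locus combinations can be illustrated with a minimal Python sketch. It is not the algorithm of Figure 1: it assumes unweighted least squares (W_s replaced by the identity), takes the expected area of each peak to be half the locus total times (α c₁ + (1 − α) c₂), where c₁ and c₂ count the allele in the two profiles, and re-estimates α by ordinary least squares rather than by (2). The function names `candidates`, `design`, `fit`, and `greedy` and the toy data are hypothetical.

```python
from itertools import combinations_with_replacement, product


def candidates(alleles):
    """All ordered genotype pairs (g1, g2) whose alleles are exactly `alleles`."""
    genotypes = list(combinations_with_replacement(sorted(alleles), 2))
    return [(g1, g2) for g1, g2 in product(genotypes, repeat=2)
            if set(g1) | set(g2) == set(alleles)]


def design(areas, g1, g2):
    """Rows (y, x) such that the model for each peak reads y = alpha * x."""
    half = sum(areas.values()) / 2.0
    return [(obs - half * g2.count(a), half * (g1.count(a) - g2.count(a)))
            for a, obs in areas.items()]


def fit(rows):
    """Least-squares alpha (clamped to [0, 1]) and residual sum of squares."""
    sxx = sum(x * x for _, x in rows)
    alpha = sum(x * y for y, x in rows) / sxx if sxx > 0 else 0.5
    alpha = min(1.0, max(0.0, alpha))
    return alpha, sum((y - alpha * x) ** 2 for y, x in rows)


def greedy(mixture):
    """mixture: {locus: {allele: peak area}}. Loci with most peaks are visited first."""
    order = sorted(mixture, key=lambda s: -len(mixture[s]))
    chosen, visited_rows, alpha, rss = {}, [], 0.5, 0.0
    for s in order:
        best = None
        for g1, g2 in candidates(mixture[s]):
            # Assumption: alpha is refitted on all visited loci plus the candidate.
            rows = visited_rows + design(mixture[s], g1, g2)
            a, r = fit(rows)
            if best is None or r < best[0]:
                best = (r, a, (g1, g2), rows)
        rss, alpha, chosen[s], visited_rows = best
    return chosen, alpha, rss


# Noise-free toy data generated with alpha = 0.3 (hypothetical numbers).
toy = {"X": {"a": 300, "b": 300, "c": 700, "d": 700},
       "Y": {"a": 600, "b": 700, "c": 700}}
profiles, alpha, rss = greedy(toy)
```

Because the labelling of the two contributors is arbitrary, a sketch of this kind may return either α or 1 − α, with the two profiles swapped accordingly.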

4.2. On-line implementation

The greedy algorithm of Figure 1 together with the methods for evaluating the goodness of fit for a given pair of profiles are implemented in an online application. The on-line implementation applies the $(\tilde{\alpha}, \tilde{\tau})$-estimates when finding the best matching pair of profiles. The two-person (and three-person) mixture separator is available online at the first author's website (http://people.math.aau.dk/∼tvede/dna/). The script can plot the expected and observed peak areas for visual inspection of the fit (Fig. 2).

The script allows for user uploads of csv-files containing information about loci, alleles, peak heights, and peak areas. The loci implemented are those contained in the SGM Plus and Identifiler kits (Applied Biosystems) excluding amelogenin.

Apart from finding the best matching pair of unknown profiles, the user can specify a suspect profile, and the script finds the best matching unknown profile for two-person mixtures.

4.2.1. Example of a two-person mixture separation

We demonstrate the algorithm and implementation on data from a controlled experiment conducted at the Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, Denmark. The data are presented in Table 4 together with information on the true profiles of the mixture (denoted by ∘ and •).

Table 4.

Data Used in Demonstrating the Algorithm

Locus	Allele		Height	Area	Locus	Allele		Height	Area
D3	15	∘ •	1802	15410	D21	29	•	1073	9454
D3	16	∘ •	1939	16282	D21	30	∘	1469	12828
vWA	14	∘	712	6128	D21	31	•	798	6992
vWA	15	•	725	6620	D18	13	∘	1247	12302
vWA	16	∘	626	5637	D18	15	•	899	9104
vWA	17	•	830	7362	D18	17	•	726	7549
D16	10	∘	824	7910	D19	13	•	1332	10534
D16	11	•	1772	17231	D19	14	∘	416	3478
D16	12	∘	586	6101	D19	15	∘	504	3968
D2	17	∘	434	4558	TH0	6	∘ •	820	6739
D2	19	•	612	6563	TH0	8	•	668	5573
D2	25	∘ •	843	9257	TH0	9	∘	486	4004
D8	8	•	1284	10782	FGA	19	∘	490	4415
D8	12	•	1232	10359	FGA	23	∘•	865	7968
D8	13	∘	903	7891	FGA	24	•	527	5036
D8	16	∘	638	5291

The symbols ∘ and • represent profiles 1 and 2, respectively.

The algorithm found that the two profiles of Table 5 are the best matching pair of profiles. The profiles are consistent with the true profiles of the mixture except for loci TH0 and FGA. In Figure 2, we have plotted the data from Table 4 (solid cones, ▴) together with the best matching pair of profiles as listed in Table 5.

Table 5.

Best Matching Pair of Profiles for the Data in Table 4

Locus	D3	vWA	D16	D2	D8	D21	D18	D19	TH0	FGA
Minor	15,16	14,16	10,12	17,25	13,16	30,30	13,13	14,15	6,6	23,23
Major	15,16	15,17	11,11	19,25	8,12	29,31	15,17	13,13	8,9	19,24

This pair of profiles is pictured in Figure 2 as the expected peaks.

In Figure 3, the traces of the parameter estimates of α (dashed) and τ² (solid) are plotted for each successive iteration, with the final parameter estimates being $\tilde{\alpha} = 0.43$ (95%-CI: [0.40; 0.45]) and $\tilde{\tau}^2 = 1134.04$. Evaluating the mixture of the true profiles (marked by ∘ and • in Table 4), the α estimate is almost unchanged ($\tilde{\alpha} = 0.42$), but with an increase in $\tilde{\tau}^2$ to 1266.34 indicating a slightly worse fit.

FIG. 3.

Trace of the parameter estimates of α (dashed/right ordinate labels) and τ² (solid/left ordinate labels). The plot is produced by the on-line tool available at http://people.math.aau.dk/∼tvede/dna/.

The fact that a combination different from the true one has a better fit indicates that there are multiple explanations of the trace since it is a 1:1-mixture (α close to 0.5). However, the difference in τ²-estimates for the two combinations will only have a minor influence in the evaluation of the evidence.

4.3. Dropping non-fitting loci

In some cases, the stain may be contaminated, and it may be subject to drop-in or drop-out. Drop-ins are allelic peaks present in the DNA profile not belonging to the true profiles. Drop-ins may occur at random (contamination) or by more systematic mechanisms such as stuttering or pull-up effects. Stutters are caused by artefacts in the polymerase chain reaction resulting in an increase of peak intensities typically in the allelic position before the true peaks. Pull-up effects are manifested as an increase of true peaks caused by overlap of the spectra of the light emitted from the various fluorochromes, which are detected by a CCD camera in the data generating process (Butler, 2005). Drop-outs are allelic peaks of the true profiles that are absent in the DNA profile due to, for example, low amount of DNA or degradation of the DNA. In such cases, the observed peak heights and peak areas no longer originate solely from a two-person mixture. Hence, the proportionalities of Section 1 need no longer be satisfied, and the mean structure of (1) may not explain the observed peak heights and peak areas in all loci.

We use an F-test approach to evaluate whether any of the included loci $s \in \mathcal{S}$ has significant unexpected balances due to, for example, stutters, degradation, or contamination. The purpose is to return a list of loci in which the hypothesis of a two-person mixture can be supported.

For each locus, the contribution to τ² is computed by D_s, which we assume to follow a $\chi_{n_s - 1}^2$-distribution. Hence, to test whether any locus contributes significantly to the overall variance, τ², we evaluate for each locus $s \in \mathcal{S}$ the ratio
$$\frac{(n_s - 1)^{-1} D_s}{(n_+ - S - n_s - 1)^{-1} \sum_{t \in \{\mathcal{S} \setminus s\}} D_t} \sim F_{(n_s - 1), (n_+ - S - n_s - 1)},$$
where $F_{\nu_1, \nu_2}$ is an F-distribution with ν₁ numerator and ν₂ denominator degrees of freedom. Since we perform this test for all loci, we make a Bonferroni-correction to compensate for multiple testing. We apply this procedure successively and drop the most significant locus (if any) until no locus has a significant test-value. This facility is also available in the on-line implementation.
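
As an illustration, the per-locus ratio and its degrees of freedom can be computed as in the following Python sketch. The function names `locus_f_ratios` and `bonferroni_level`, the input layout, and the numerical values are hypothetical. The sketch stops at the test statistic and assumes that the p-value is obtained from a separate F-distribution routine.

```python
def locus_f_ratios(D, n):
    """F-ratio and degrees of freedom per locus.

    D[s]: contribution D_s of locus s to tau^2; n[s]: number of peaks n_s.
    Loci for which a degrees-of-freedom term would be below 1 are skipped.
    """
    n_plus, S = sum(n.values()), len(n)
    ratios = {}
    for s in D:
        df1 = n[s] - 1
        df2 = n_plus - S - n[s] - 1  # denominator degrees of freedom as in the ratio above
        if df1 < 1 or df2 < 1:
            continue
        rest = sum(D[t] for t in D if t != s)
        ratios[s] = ((D[s] / df1) / (rest / df2), df1, df2)
    return ratios


def bonferroni_level(level, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return level / n_tests


# Hypothetical D_s values for three loci with 4, 3 and 3 peaks.
D = {"A": 3.0, "B": 1.0, "C": 2.0}
n = {"A": 4, "B": 3, "C": 3}
ratios = locus_f_ratios(D, n)
per_test = bonferroni_level(0.05, len(ratios))
```

Each ratio would then be compared with the upper tail of the corresponding F-distribution at level `per_test`, for instance via `scipy.stats.f.sf(F, df1, df2)` where SciPy is available.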

If the variance contribution from multiple loci is large, the test-value will not indicate any significant locus, either because the overall noise of the sample is large or because the sample may be a mixture of more than two individuals. This will result in large values for the overall τ².

5. Likelihood Ratio

If a case includes a victim with profile G_V, the set $\mathcal{C}_p = \{(G_S, G_V) \equiv \mathcal{G}\}$ contains only one element, (G_S, G_V). Hence, the likelihood ratio simplifies further to
$$LR = \frac{L(\boldsymbol{A} \mid G_S, G_V)}{\sum_{G_U \in \mathcal{C}_d} L(\boldsymbol{A} \mid G_V, G_U) P(G_U)},$$
where for this simpler case $\mathcal{C}_d = \{G_U : (G_V, G_U) \equiv \mathcal{G}\}$.

In some cases, the value of $L(\boldsymbol{A} \mid G_S, G_V)$ may be much lower than the likelihood value for the pair of best matching profiles. This indicates that it is inappropriate to assume that the evidence is a mixture of G_S and G_V, even though the profiles (G_S, G_V) are consistent with $\mathcal{G}$.

The sums involved in the evaluation of the likelihood ratio will often involve an intractable number of terms depending on the number of loci and number of observed peaks in each locus. As the inclusion of all possible combinations is infeasible, we need at least to include combinations with a numerical impact on the likelihood ratio for the approximation of the true likelihood ratio to be satisfactory for forensic use.

The best matching pair of profiles will provide an estimate of the mixture proportion α. The expected peak areas in Table 6 (expressed in terms of α) indicate that alternative combinations need to have an α-estimate close to the estimate of the best matching pair in order to have a reasonable fit. We exploit this result when defining our proposal distribution in the section on importance sampling.

Table 6.

Expected Peak Areas for a Two-Person Mixture (Expressed in Terms of α)

n_s	Observed alleles	Combinations	Expected peak areas in terms of α
1	a⁴	(aa, aa)	(2) × A_{s,+}/2
2	a³b	(aa, ab)	(1 + α, 1 − α) × A_{s,+}/2
		(ab, aa)	(2 − α, α) × A_{s,+}/2
	a²b²	(aa, bb)	(2α, 2(1 − α)) × A_{s,+}/2
		(ab, ab)	(1, 1) × A_{s,+}/2
3	a²bc	(aa, bc)	(2α, 1 − α, 1 − α) × A_{s,+}/2
		(ab, ac)	(1, α, 1 − α) × A_{s,+}/2
		(bc, aa)	(2(1 − α), α, α) × A_{s,+}/2
4	abcd	(ab, cd)	(α, α, 1 − α, 1 − α) × A_{s,+}/2
		(ac, bd)	(α, 1 − α, α, 1 − α) × A_{s,+}/2

The list is minimal such that equivalent combinations up to numeration of alleles are avoided. The expected peak areas are ordered by lexicographic order of the allele designation.
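
The entries of Table 6 follow one rule: each allele's expected area is half the total locus area times α times its count in the first profile plus (1 − α) times its count in the second. A minimal Python sketch of this rule is given below. The function name `expected_areas` and the genotype encoding as tuples are hypothetical.

```python
def expected_areas(g1, g2, alpha, total_area):
    """Expected peak area per allele for a two-person mixture.

    g1, g2: genotypes as tuples of two alleles; g1 contributes the
    proportion alpha and g2 the proportion 1 - alpha.
    total_area: sum of the peak areas observed at the locus.
    """
    alleles = sorted(set(g1) | set(g2))  # lexicographic order, as in Table 6
    half = total_area / 2.0
    return {a: half * (alpha * g1.count(a) + (1 - alpha) * g2.count(a))
            for a in alleles}


# Row (ab, ac) of Table 6 with alpha = 0.25 and a total area of 2.
row_ab_ac = expected_areas(("a", "b"), ("a", "c"), alpha=0.25, total_area=2.0)
# Row (aa, bc) of Table 6 with the same alpha and total area.
row_aa_bc = expected_areas(("a", "a"), ("b", "c"), alpha=0.25, total_area=2.0)
```

With a total area of 2 the factor A_{s,+}/2 equals one, so the returned values can be compared directly with the coefficients in the table.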

6. Importance Sampling of the Likelihood Ratio

An exact assessment of the weight of evidence comprises evaluation of every term of the numerator and denominator of (4). However, this is infeasible and other methods of evaluating the evidence need to be considered. In this section, we show how importance sampling can be used for estimation of the weight of evidence by assigning weights to the individual combinations.

Let $q(\boldsymbol{G}) = \prod_{s \in \mathcal{S}} q_s(\boldsymbol{G}_s)$, where $\boldsymbol{G}_s = (G_s^{\prime}, G_s^{\prime\prime})$ are the profiles on locus s and
$$q_s(\boldsymbol{G}_s) = \frac{L(\boldsymbol{A} \mid \boldsymbol{G}_s, \hat{\boldsymbol{G}}_{-s}) P(\boldsymbol{G}_s)}{\sum_{i=1}^{N_s} L(\boldsymbol{A} \mid \boldsymbol{G}_{s,i}, \hat{\boldsymbol{G}}_{-s}) P(\boldsymbol{G}_{s,i})}, \tag{6}$$
where N_s is the number of combinations for the observed number of alleles, $(\boldsymbol{G}_s, \hat{\boldsymbol{G}}_{-s})$ is the particular combination on locus s merged with the best matching combination, $\hat{\boldsymbol{G}}$, in the remaining loci, $t \in \{\mathcal{S} \setminus s\}$, and the sum in the denominator is over all N_s possible combinations in locus s, each merged with the best matching combination in the remaining loci. Hence, $L(\boldsymbol{A} \mid \boldsymbol{G}_s, \hat{\boldsymbol{G}}_{-s})$ is called the “marginal” likelihood as it gives the likelihood for the particular combination on locus s with the combinations on the remaining loci identical to the best matching pair of profiles. Furthermore, the denominator of (6) is a constant, B_s, for each locus. 
Using this proposal distribution, $P(\mathcal{E} \mid H_d)$ may be expressed as an expectation with respect to q,
$$P(\mathcal{E} \mid H_d) = \sum_{\boldsymbol{G} \in \mathcal{C}_d} L(\boldsymbol{A} \mid \boldsymbol{G}) \frac{P(\boldsymbol{G})}{q(\boldsymbol{G})} q(\boldsymbol{G}) = \mathbb{E}(h(\mathcal{E}) W(\mathcal{E}); q),$$
where $W(\mathcal{E}) = P(\boldsymbol{G}) / q(\boldsymbol{G})$ is the importance weight. 
Since $P(\boldsymbol{G}) = \prod_{s \in \mathcal{S}} P(\boldsymbol{G}_s)$ and $B = \prod_{s \in \mathcal{S}} B_s$, the ratio $L(\boldsymbol{A} \mid \boldsymbol{G}) P(\boldsymbol{G}) / q(\boldsymbol{G})$ is nearly constant:
$$\frac{L(\boldsymbol{A} \mid \boldsymbol{G}) P(\boldsymbol{G})}{\prod_{s \in \mathcal{S}} \{ L(\boldsymbol{A} \mid \boldsymbol{G}_s, \hat{\boldsymbol{G}}_{-s}) P(\boldsymbol{G}_s) \} \big/ \prod_{s \in \mathcal{S}} B_s} = \frac{L(\boldsymbol{A} \mid \boldsymbol{G}) B}{\prod_{s \in \mathcal{S}} L(\boldsymbol{A} \mid \boldsymbol{G}_s, \hat{\boldsymbol{G}}_{-s})},$$
where the product in the denominator in many cases is a good approximation to $L(\boldsymbol{A} \mid \boldsymbol{G})$. 
This near constancy of $h(\mathcal{E}) W(\mathcal{E})$ improves the performance of importance sampling and reduces the number of samples needed for results with low variance (Robert and Casella, 2004).
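
The construction of the proposal distribution in (6) amounts to a per-locus normalisation, which may be sketched in Python as follows. The function names `locus_proposal` and `proposal`, the dictionary layout, and the numerical values are hypothetical. The marginal likelihoods are taken as given inputs rather than computed from the model.

```python
def locus_proposal(marginal_lik, prior):
    """Normalise L(A | G_s, G_hat_-s) * P(G_s) over the combinations of one locus.

    marginal_lik, prior: dicts keyed by combination label.
    Returns the proposal q_s and the normalising constant B_s.
    """
    weights = {g: marginal_lik[g] * prior[g] for g in marginal_lik}
    B_s = sum(weights.values())
    return {g: w / B_s for g, w in weights.items()}, B_s


def proposal(per_locus_q, G):
    """q(G) as the product of q_s(G_s); G maps each locus to a combination label."""
    q = 1.0
    for s, g in G.items():
        q *= per_locus_q[s][g]
    return q


# Hypothetical marginal likelihoods and priors for one four-allele locus.
lik_s = {"ab,cd": 0.8, "ac,bd": 0.1, "ad,bc": 0.1}
prior_s = {"ab,cd": 0.5, "ac,bd": 0.25, "ad,bc": 0.25}
q_s, B_s = locus_proposal(lik_s, prior_s)
q_G = proposal({"D8": q_s}, {"D8": "ab,cd"})
```

In this toy example most of the proposal mass falls on the combination with the largest marginal likelihood, which is the intended behaviour of sampling close to the best matching pair.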

In order to estimate $P(\mathcal{E} \mid H_d)$, we draw combinations $\boldsymbol{G}_i, i = 1, \ldots, M$, from $q(\boldsymbol{G})$ and compute the Monte Carlo estimate,
$$\hat{P}(\mathcal{E} \mid H_d) = \frac{1}{M} \sum_{i=1}^{M} L(\boldsymbol{A} \mid \boldsymbol{G}_i) W(\boldsymbol{G}_i), \quad \boldsymbol{G}_i \sim q(\boldsymbol{G}),$$
where $W(\boldsymbol{G}_i) = P(\boldsymbol{G}_i) / q(\boldsymbol{G}_i)$ are the importance weights.

The estimate is unbiased as the terms are independently simulated from $q(\boldsymbol{G})$ and all have expectation $\mathbb{E}(h(\mathcal{E})W(\mathcal{E}); q) = P(\mathcal{E} \mid H_d)$. For the variance of $\hat{P}(\mathcal{E} \mid H_d)$, we compute
\begin{align*}
\mathbb{V}\mathrm{ar}(\hat{P}(\mathcal{E} \mid H_d)) = \frac{1}{M-1} \sum_{i=1}^{M} \left\{ [L(\boldsymbol{A} \mid \boldsymbol{G}_i) w(\boldsymbol{G}_i)]^2 - \hat{P}(\mathcal{E} \mid H_d)^2 \right\}.
\end{align*}
The numerator of the $LR$, $P(\mathcal{E} \mid H_p)$, can be handled similarly, taking into consideration that we are summing over a restricted set of combinations, $\mathcal{C}_p$, all including the suspect's profile, $\mathcal{C}_p = \{G_U : (G_S, G_U) \equiv \mathcal{G}\}$. The greedy algorithm of Figure 1 is also applicable when specifying a suspect. We only need another ordering of the observations and a different set of $\mathcal{J}$-matrices using the extra information of the suspect's profile.
This implies that there exists a best matching combination, $\hat{\boldsymbol{G}}^{(S)}$, in $\mathcal{C}_p$ having the same properties as $\hat{\boldsymbol{G}}$ for the unrestricted set, $\mathcal{C}_d$.
Hence, importance sampling may also be used in estimating $P(\mathcal{E} \mid H_p)$ with formulae similar to those for estimating $P(\mathcal{E} \mid H_d)$.
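
The estimator and the variance formula can be illustrated with a minimal Python sketch. It is not the implementation behind the online tool: the four-element lists stand in for a toy set of combinations, the weight is assumed to have the form $w(\boldsymbol{G}) = P(\boldsymbol{G})/q(\boldsymbol{G})$, and the name `importance_estimate` and all numerical values are hypothetical.

```python
import random

def importance_estimate(likelihood, prior, proposal, M, seed=0):
    """Estimate sum_k likelihood[k] * prior[k] by sampling indices from proposal."""
    rng = random.Random(seed)
    idx = rng.choices(range(len(proposal)), weights=proposal, k=M)
    # Weighted terms L(A | G_i) * w(G_i), with w = prior / proposal (assumed form)
    terms = [likelihood[i] * prior[i] / proposal[i] for i in idx]
    p_hat = sum(terms) / M
    # Spread of the weighted terms, computed as in the displayed formula
    var_hat = sum(t * t - p_hat * p_hat for t in terms) / (M - 1)
    return p_hat, var_hat

likelihood = [0.50, 0.20, 0.05, 0.01]   # toy values of L(A | G)
prior = [0.10, 0.20, 0.30, 0.40]        # toy values of P(G)
exact = sum(l * p for l, p in zip(likelihood, prior))

# A proposal proportional to L * P makes every weighted term equal to the exact sum
ideal = [l * p / exact for l, p in zip(likelihood, prior)]
p_ideal, v_ideal = importance_estimate(likelihood, prior, ideal, M=1000)

# A uniform proposal is still unbiased, but the weighted terms vary
uniform = [0.25] * 4
p_unif, v_unif = importance_estimate(likelihood, prior, uniform, M=1000)
```

The contrast between the two proposals suggests why a proposal centred on the best matching combination is attractive: the closer $q$ follows the terms of the sum, the smaller the spread of the weighted terms.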

6.1. Example of estimating LR using importance sampling

The best matching pair of profiles for the data in Table 4 is given in Table 5, and was used for estimating $q(\boldsymbol{G})$ and the constant B. In the computations, we assumed uniform distributions of the allele probabilities. Table 7 lists the profile of a fictitious suspect, $G_S$, together with the unknown profile maximizing the likelihood with $G_S$ fixed. This pair plays the role of $\hat{\boldsymbol{G}}^{(S)}$ in this example. In Figure 4, the observed, ▴, and expected peak heights, ▵, assuming a mixture of these profiles are plotted.

FIG. 4.

Plot of the observed peaks, ▴, and the expected peaks, ▵, assuming a mixture of the suspect and best matching unknown (STR profiles of Table 7).

Table 7.

Suspect's STR Profile Together with Best Matching STR Profile of an Unknown Person

Locus	D3	vWA	D16	D2	D8	D21	D18	D19	TH0	FGA
Suspect	16,16	15,17	11,11	25,25	13,16	30,30	15,17	15,15	8,9	24,24
Unknown	15,15	14,16	10,12	17,19	8,12	29,31	13,13	13,14	6,6	19,23

In order to verify the validity of our methodology and implementation of the importance sampler, we limited our data to include only loci on the blue fluorescent dye band (D3, vWA, D16, and D2). The total number of possible combinations for the blue loci is $7^1 12^2 6^1 = 6{,}048$, and it is therefore possible to compute the correct value of $P(\mathcal{E} \mid H_d) = 0.481335 \times 10^{-10}$. For the suspect's profile specified in Table 7, locus D3 is the only blue locus for which it is possible to alter the unknown profile and still have consistency with $\mathcal{G}$. Hence, there are only two terms in the sum for $P(\mathcal{E} \mid H_p)$ when restricting the analysis to the blue dye band.
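
The counts in this paragraph can be checked by brute-force enumeration. The sketch below assumes that, with no drop-out or stutter, the observed alleles at each blue locus are exactly the union of the two profiles in Table 7; the function names are hypothetical. It counts ordered pairs of genotypes explaining the observed alleles (all combinations, both contributors unknown) and the unknown genotypes compatible with a fixed suspect.

```python
from itertools import combinations_with_replacement, product
from math import prod

def genotypes(alleles):
    """All unordered genotypes (pairs with repetition) over the given alleles."""
    return list(combinations_with_replacement(sorted(alleles), 2))

def count_pairs(observed):
    """Ordered pairs of genotypes whose alleles together equal the observed set."""
    obs = set(observed)
    gts = genotypes(obs)
    return sum(1 for g1, g2 in product(gts, repeat=2) if set(g1) | set(g2) == obs)

def count_unknown(observed, suspect):
    """Unknown genotypes that, together with the suspect, explain the observed set."""
    obs = set(observed)
    return sum(1 for g in genotypes(obs) if set(g) | set(suspect) == obs)

# Observed alleles and suspect genotype at the blue loci (assumed from Table 7)
blue = {
    "D3": ([15, 16], (16, 16)),
    "vWA": ([14, 15, 16, 17], (15, 17)),
    "D16": ([10, 11, 12], (11, 11)),
    "D2": ([17, 19, 25], (25, 25)),
}
n_all = prod(count_pairs(obs) for obs, _ in blue.values())
n_suspect = prod(count_unknown(obs, sus) for obs, sus in blue.values())
print(n_all, n_suspect)  # 6048 2
```

Loci with two, three, and four observed alleles contribute factors 7, 12, and 6, respectively, which reproduces $7^1 12^2 6^1$; with the suspect fixed, only D3 admits more than one unknown genotype.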

In order to evaluate the performance of the importance sampler, we computed 1,000 estimates of $P(\mathcal{E} \mid H_d)$, each based on 10,000 samples. The estimates are plotted together with the correct value in the histogram of Figure 5. The distribution of the estimates is skewed for this particular example, but most of the estimates are close to the true value of $P(\mathcal{E} \mid H_d)$.

FIG. 5.

Histogram of 1,000 estimates of $P(\mathcal{E} \mid H_d)$, each based on 10,000 samples.

7. Results

The algorithm was tested on data from 71 controlled two-person mixtures with known profiles. Hence, it was possible to validate the suggested profiles returned by the separation algorithm. Table 8 summarizes the comparisons with the best matching pair of profiles and the true mixture profiles.

Table 8.

Detailed Summary Table with the Number of Correctly Separated Loci, x, Stratified by Mixture Ratio

	Cases with both profiles correct in x of 10 loci								Cases with major profile correct in x of 10 loci								Cases with minor profile correct in x of 10 loci
Ratio	3	4	5	6	7	8	9	10	3	4	5	6	7	8	9	10	3	4	5	6	7	8	9	10
1:1	1	3	0	2	2	2	1	0	1	2	1	2	2	2	1	0	1	3	0	2	2	2	1	0
1:2	0	0	0	0	2	5	7	8	0	0	0	0	0	2	8	12	0	0	0	0	2	4	8	8
1:4	0	0	0	1	3	2	6	7	0	0	0	0	0	0	0	19	0	0	0	1	3	2	6	7
1:8	0	0	0	2	5	4	0	4	0	0	0	0	0	0	0	15	0	0	0	2	5	4	0	4
1:16	0	0	1	0	0	0	0	3	0	0	0	0	0	0	0	4	0	0	1	0	0	0	0	3
Total	1	3	1	5	12	13	14	22	1	2	1	2	2	4	9	50	1	3	1	5	12	12	15	22

From the bottom row of Table 8, we see that the separation algorithm returned the true mixture profiles as the best matching combination 22 times. The numbers of cases where one (14 cases), two (13 cases), or three (12 cases) loci were wrongly separated were almost the same. In five cases, half or fewer of the loci were correctly separated.

In 50 cases, the true major profile was correctly identified, and in another 13, the major profile of the best matching pair differed from the true major profile in at most two loci. Furthermore, Table 8 shows that the eight remaining cases with incorrect identification of the major profile had mixture ratio 1:1. Hence, in these cases, there was no obvious major profile, as the amounts of DNA contributed were (almost) equal. Moreover, for 1:1 mixtures, there are many pairs of profiles yielding similar goodness of fit to the observed peak intensities, as exemplified in Section 4.2.1. The algorithm is less successful in identifying the minor profile. However, in most cases (61 of 71), the minor profile was separated correctly in seven or more loci.
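
The tallies quoted from Table 8 follow directly from its bottom row, as the following short Python check illustrates. The variable and function names are hypothetical; the numbers are copied from the "Total" row.

```python
# Bottom ("Total") row of Table 8: cases with x = 3, ..., 10 loci correct
loci = list(range(3, 11))
both = [1, 3, 1, 5, 12, 13, 14, 22]
major = [1, 2, 1, 2, 2, 4, 9, 50]
minor = [1, 3, 1, 5, 12, 12, 15, 22]

def cases_with_at_least(counts, x_min):
    """Number of cases with at least x_min correctly separated loci."""
    return sum(c for x, c in zip(loci, counts) if x >= x_min)

n_cases = sum(both)                                        # all mixtures
major_all = cases_with_at_least(major, 10)                 # major fully correct
major_near = cases_with_at_least(major, 8) - major_all     # wrong in one or two loci
major_rest = n_cases - major_all - major_near              # remaining cases
minor_7plus = cases_with_at_least(minor, 7)                # minor correct in >= 7 loci
print(n_cases, major_all, major_near, major_rest, minor_7plus)  # 71 50 13 8 61
```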

In addition to the 71 DNA mixtures from controlled experiments, the separation algorithm was used to separate 64 two-person DNA mixtures from real crime cases. For each of the 64 crime cases, the laboratory had two reference samples that were consistent with the observed stain. Three experienced forensic geneticists tried to identify both the major and minor profiles of each mixture without knowing the true profiles (blinded experiment). In Table 9, the results of the separation algorithm are compared to those of the forensic geneticists.

Table 9.

Comparison of the Performance of the Separation Algorithm and Forensic Geneticists

	Geneticists		Algorithm
Correct loci	Minor	Major	Minor	Major
10	8	31	16	36
9	16	8	16	9
8	13	8	14	5
7	13	4	10	6
6	6	7	1	2
5	3	2	3	2
4	2	2	2	2
3	2	1	2	2
2	0	0	0	0
1	1	1	0	0

Each count is the number of mixtures in which the minor or major profile was correctly identified in the given number of loci.

The total number of correctly separated mixtures was 16 for the separation algorithm and 8 for the forensic geneticists. The samples where the minor contributor was correctly identified in all loci also had the major component correct (Table 9). As for the controlled experiments, the success rate depended on the mixture ratio, with the number of correctly separated loci decreasing as α increased towards 0.5.

Furthermore, it should be noted that the forensic geneticists were forced to call a pair of profiles even for mixtures that would otherwise have resulted in inconclusive statements. That is, the forensic geneticists were forced to deduce major and minor profiles in cases where the regular protocol of the laboratory would not support the separation of profiles.

8. Discussion

Using the quantitative information from STR DNA analysis in terms of peak intensities is presently the only way to separate STR mixture results. Based on a statistical model, we developed a simple greedy algorithm for finding the best matching pair of profiles.

Our model is based on a few assumptions that are widely accepted among forensic geneticists. The statistical model made it possible to make objective comparisons of various combinations by evaluating the likelihood values. From the normal distribution assumption, this value is given by $\tau^{-N}$, which implies that the lower the estimate of τ, the better the concordance between observed and expected peaks.

Importance sampling was used to estimate the likelihood ratio, since exact computation becomes difficult when $7^{S_2}12^{S_3}6^{S_4}$ terms need to be evaluated in the denominator of the LR with $H_d : (G_{U_1}, G_{U_2})$. The method proved to be efficient, and future work will consist of implementing sampling schemes that explore more of the sample space.

9. Conclusion

By using the greedy algorithm of Section 4.1, we demonstrated that it is possible to automate the separation of DNA mixtures. However, due to the assumption of no occurrence of drop-out or stutters, the model may be too simple for more complicated cases. Hence, this methodology is applicable to cases where the analysis today is standard but time-consuming. This allows the forensic geneticists to focus on more complex crime cases.

Future work comprises the development of a methodology for handling drop-outs and stutters. Since stutters are profile independent (stutters from parental peaks are constant for all alleged combinations of profiles), it is possible to remove stutters from the data prior to separation and interpretation. Allowing for drop-outs while finding a best matching pair of profiles is also possible. Using the methodology of Tvedebrink et al. (2009), the probability of drop-out, $P(D \mid \hat{H})$, is assessed conditioned on a given profile.

10. Appendix

A. The general case with m contributors

In the general case with m contributors to the mixed stain, our method can be generalized by assuming the mixture proportions $\alpha_1, \ldots, \alpha_m$ to be strictly increasing,
\begin{align*}
\alpha_1 < \cdots < \alpha_{m-1} < \alpha_m = 1 - \alpha_{+}, \quad \alpha_{+} = \sum_{i=1}^{m-1} \alpha_i. \tag{7}
\end{align*}

The conditional covariance structure is the same as specified in (1), where the conditional mean is:
\begin{align*}
\mathbb{E}(\boldsymbol{A}_s \mid A_{s,+}) &= \frac{A_{s,+}}{2} \left[ \sum_{i=1}^{m-1} \alpha_i \boldsymbol{P}_{s,i} + \boldsymbol{P}_{s,m} \left( 1 - \sum_{i=1}^{m-1} \alpha_i \right) \right] \\
&= X_{s,m} + \sum_{i=1}^{m-1} \alpha_i X_{s,i}, \tag{8}
\end{align*}
where $X_{s,i} = (\boldsymbol{P}_{s,i} - \boldsymbol{P}_{s,m}) A_{s,+}/2$ for $i = 1, \ldots, m-1$ and $X_{s,m} = \boldsymbol{P}_{s,m} A_{s,+}/2$.
In order to find the MLE of $\alpha = (\alpha_i)_{i=1}^{m-1}$, we solve the likelihood equations for $\ell(\alpha, \tau^2; (\boldsymbol{A}_s)_{s \in \mathcal{S}})$ with respect to α. This implies that the MLE of α is:
\begin{align*}
\hat{\alpha} = \left( \sum_{s \in \mathcal{S}} X_s^{\top} W_s^{-} X_s \right)^{-1} \left( \sum_{s \in \mathcal{S}} X_s^{\top} W_s^{-} (\boldsymbol{A}_s - X_{s,m}) \right).
\end{align*}

Furthermore, the estimate of τ² in the general setting is
\begin{align*}
\hat{\tau}^2 = N^{-1} \sum_{s \in \mathcal{S}} (\boldsymbol{A}_s - X_s \hat{\alpha} - X_{s,m})^{\top} W_s^{-} (\boldsymbol{A}_s - X_s \hat{\alpha} - X_{s,m}),
\end{align*}
where $N = n_{+} - S - m + 1 = \sum\nolimits_{s \in \mathcal{S}} (n_s - 1) - (m - 1)$.
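
The two estimators can be sketched numerically in Python. The sketch rests on loudly stated assumptions: $W_s^{-}$ is replaced by the identity matrix (the covariance structure of (1) is not reproduced here), $X_s$ is taken to be the matrix with columns $X_{s,1}, \ldots, X_{s,m-1}$, $A_{s,+}$ is taken to be the sum of the areas at locus $s$, and the peak areas are invented. The helpers `solve` and `fit` are hypothetical names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(loci, m_contrib):
    """loci: list of (areas, P), P[i] being the allele-count vector of contributor i.
    Returns (alpha_hat, tau2_hat) with W_s^- assumed to be the identity."""
    k = m_contrib - 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    design = []
    for areas, P in loci:
        half = sum(areas) / 2.0                      # A_{s,+} / 2
        x_m = [half * p for p in P[-1]]              # X_{s,m}
        cols = [[half * (pi - pm) for pi, pm in zip(P[i], P[-1])]
                for i in range(k)]                   # X_{s,1}, ..., X_{s,m-1}
        y = [a - xm for a, xm in zip(areas, x_m)]    # A_s - X_{s,m}
        design.append((cols, y))
        for i in range(k):
            xty[i] += sum(ci * yi for ci, yi in zip(cols[i], y))
            for j in range(k):
                xtx[i][j] += sum(ci * cj for ci, cj in zip(cols[i], cols[j]))
    alpha = solve(xtx, xty)
    q = 0.0                                          # residual sum of squares
    for cols, y in design:
        for r in range(len(y)):
            resid = y[r] - sum(alpha[i] * cols[i][r] for i in range(k))
            q += resid * resid
    n_dof = sum(len(areas) - 1 for areas, _ in loci) - k   # N in the text
    return alpha, q / n_dof

# One invented locus with 2m = 6 peaks for m = 3 contributors
P = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
areas = [110, 90, 310, 290, 590, 610]
alpha_hat, tau2_hat = fit([(areas, P)], m_contrib=3)
print(alpha_hat, tau2_hat)
```

For these invented areas the estimates come out at approximately $\hat{\alpha} = (0.1, 0.3)$ and $\hat{\tau}^2 = 200$, with $N = (6 - 1) - (3 - 1) = 3$.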

A.1. Greedy algorithm

The greedy algorithm of Figure 1 needs only a few modifications to be applicable to the general case. Most important is the specification of the number of contributors, m. This needs to be decided before running the algorithm. For the algorithm to be successful, there should preferably be at least one locus with 2m peaks as this increases the confidence in the estimate $\hat{\alpha}$.

Furthermore, it is necessary to check whether the $\hat{\alpha}$-estimate satisfies the inequalities of (7) for each combination. In Table 10, we list fictive data together with two combinations, both implying a perfect fit. Both matrices are valid as the orders in the α-sum columns satisfy the condition (7). However, for Combination 1, the estimate $\hat{\alpha}_1 = (0.2, 0.45)$ does not satisfy (7), since $1 - \hat{\alpha}_{+} = 0.35 < 0.45$, while the estimate for Combination 2, $\hat{\alpha}_2 = (0.2, 0.35)$, does. Hence, Combination 2 is chosen over Combination 1.

Table 10.

Fictive Data Showing the Importance of Ensuring (7) is Satisfied

	Combination 1				Combination 2
Area	P₁	P₂	P₃	α-sum	P₁	P₂	P₃	α-sum
200	1	0	0	α ₁	1	0	0	α ₁
450	0	1	0	α ₂	0	0	1	1 − α₊
550	1	0	1	1 − α₂	1	1	0	α₁ + α₂
800	0	1	1	1 − α₁	0	1	1	1 − α₁
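
The check on Table 10 can be reproduced with a few lines of Python. This is a sketch with hypothetical function names: it takes the two estimates quoted in the text, computes the expected areas $(A_{+}/2)\sum_i \alpha_i \boldsymbol{P}_i$ with $\alpha_3 = 1 - \alpha_{+}$, and tests the ordering condition (7).

```python
def expected_areas(alpha, P, total):
    """Expected areas (A_+ / 2) * sum_i alpha_i * P_i, with alpha_m = 1 - alpha_+."""
    full = list(alpha) + [1.0 - sum(alpha)]
    half = total / 2.0
    n_peaks = len(P[0])
    return [half * sum(a * p[r] for a, p in zip(full, P)) for r in range(n_peaks)]

def satisfies_ordering(alpha):
    """Condition (7): alpha_1 < ... < alpha_{m-1} < 1 - alpha_+."""
    full = list(alpha) + [1.0 - sum(alpha)]
    return all(x < y for x, y in zip(full, full[1:]))

areas = [200, 450, 550, 800]
# Columns P1, P2, P3 of Table 10, one list per contributor
comb1 = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
comb2 = [[1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]

results = {}
for name, P, alpha in [("Combination 1", comb1, (0.2, 0.45)),
                       ("Combination 2", comb2, (0.2, 0.35))]:
    fitted = expected_areas(alpha, P, sum(areas))
    perfect = all(abs(f - a) < 1e-9 for f, a in zip(fitted, areas))
    results[name] = (perfect, satisfies_ordering(alpha))

print(results)
```

Both combinations reproduce the four areas exactly, but only Combination 2 yields proportions in strictly increasing order, (0.2, 0.35, 0.45).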

In Figure 7, the greedy algorithm of Figure 6 is described by a diagram emphasizing the various steps in the procedure of finding the best matching combination.

FIG. 6.

Greedy algorithm for finding a set of profiles (locally) maximizing the likelihood of (8).

FIG. 7.

Diagram describing the greedy algorithm for resolving DNA mixtures. The shaded boxes show the loci previously visited by the algorithm. The bold lined box shows the current locus under investigation.

In step A, the parameters α and τ are estimated using only the loci with 2m observed peaks. Step B determines the profile combination (see step D) on the current locus that minimizes τ given the combinations on the already visited loci. The algorithm visits the blocks of loci with equal numbers of observed alleles in decreasing order: $2m - 1, \ldots, 2$. If any of the blocks is empty, the algorithm skips forward to the next nonempty block. The order within each block of loci with 2m − i observed peaks is arbitrary. When reaching the last locus, the combination and estimates of α and τ are saved.

In step C, the algorithm visits each locus searching for a combination that might decrease τ with the combinations on all remaining loci fixed. If τ is unchanged, the algorithm stops. Otherwise, step C is repeated until τ no longer changes. On termination, the algorithm returns the combination and estimates of α and τ.

Step D illustrates that, for each locus with fewer than 2m peaks, there are several combinations of profiles that need to be investigated. In the figure, each ∘ depicts a combination and • symbolizes the current optimal configuration. The black arrow shows which combination is currently tested. When all the combinations have been tested, the one with the smallest τ is returned.
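
The control flow of steps B to D can be sketched as a coordinate-wise search in Python. This is only an illustration of the loop structure, not the algorithm of Figure 6 itself: step A is omitted, the function `tau` is a hypothetical stand-in for re-estimating τ for a given (possibly partial) configuration, and the score table is invented.

```python
def greedy_search(candidates, order, tau):
    """Coordinate-wise search. candidates maps locus -> list of combinations;
    tau maps a (possibly partial) configuration dict to a score to be minimized."""
    config = {}
    # Step B: visit the loci in order, fixing the best combination on each
    # (step D: every candidate on the current locus is tried)
    for locus in order:
        config[locus] = min(candidates[locus],
                            key=lambda c: tau({**config, locus: c}))
    best = tau(config)
    # Step C: revisit each locus with all others fixed until tau stops decreasing
    while True:
        for locus in order:
            config[locus] = min(candidates[locus],
                                key=lambda c: tau({**config, locus: c}))
        current = tau(config)
        if current >= best:
            return config, best
        best = current

# Invented scores standing in for the tau estimate of each configuration
scores = {("a", None): 1.0, ("b", None): 2.0,
          ("a", "a"): 3.0, ("a", "b"): 2.0,
          ("b", "a"): 4.0, ("b", "b"): 1.0}
order = ["L1", "L2"]
candidates = {"L1": ["a", "b"], "L2": ["a", "b"]}

def tau(cfg):
    return scores[tuple(cfg.get(locus) for locus in order)]

config, tau_min = greedy_search(candidates, order, tau)
print(config, tau_min)  # {'L1': 'b', 'L2': 'b'} 1.0
```

In this toy example the first pass settles on ("a", "b") with score 2.0, and the refinement sweep of step C improves it to ("b", "b") with score 1.0, after which a further sweep leaves the score unchanged and the search stops.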

Acknowledgments

We thank Dr. Jakob Larsen and Dr. Frederik Torp Petersen (both from the Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, Denmark) for their assistance in manually analyzing the DNA mixtures from real crime cases.

Disclosure Statement

No competing financial interests exist.

References

Bill, Gill P.D., Curran, et al. 2005. PENDULUM—a guideline-based approach to the interpretation of STR mixtures. Forensic Sci. Int., 148:181–189.

Buckleton J.S., Triggs C.M., Walsh S.J. 2005. Mixtures, 217–274. Forensic DNA Evidence Interpretation. CRC Press: Boca Raton, FL.

Butler J.M. 2005. Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers, 2nd ed. Elsevier Academic Press: Burlington, MA.

Clayton T.M., Whitaker J.P., Sparkes, et al. 1998. Analysis and interpretation of mixed forensic stains using DNA STR profiling. Forensic Sci. Int., 91:55–70.

Cowell R.G., Lauritzen S.L., Mortera. 2007a. A gamma model for DNA mixture analyses. Bayesian Anal., 2:333–348.

Cowell R.G., Lauritzen S.L., Mortera. 2007b. Identification and separation of DNA mixtures using peak area information. Forensic Sci. Int., 166:28–34.

Cox D.R. 1958. Some problems connected with statistical inference. Ann. Math. Stat., 29:357–372.

Curran. 2008. A MCMC method for resolving two person mixtures. Sci. Justice, 48:168–177.

Evett, Weir. 1998. Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists. Sinauer Associates: Sunderland, MA.

Evett I.W., Gill P.D., Lambert J.A. 1998. Taking account of peak areas when interpreting mixed DNA profiles. J. Forensic Sci., 43:62–69.

Gill P.D., Brenner C.H., Buckleton J.S., et al. 2006. DNA commission of the international society of forensic genetics: recommendations on the interpretation of mixtures. Forensic Sci. Int., 160:90–101.

Gill P.D., Sparkes, Pinchin, et al. 1998. Interpreting simple STR mixtures using allele peak areas. Forensic Sci. Int., 91:41–53.

Nichols R.A., Balding D.J. 1991. Effects of population structure on DNA fingerprint analysis in forensic science. Heredity, 66:297–302.

Perlin M.W., Szabady. 2001. Linear mixture analysis: a mathematical approach to resolving mixed DNA samples. J. Forensic Sci., 46:1372–1378.

Robert C.P., Casella. 2004. Monte Carlo Statistical Methods, 2nd ed. Springer: New York.

Tvedebrink, Eriksen P.S., Mogensen H.S., et al. 2009. Estimating the probability of allelic drop-out of STR alleles in forensic genetics. Forensic Sci. Int., 3:222–226.

Tvedebrink, Eriksen P.S., Mogensen H.S., et al. 2010. Evaluating the weight of evidence by using quantitative short tandem repeat data in DNA mixtures. J. R. Stat. Soc. Ser. C. Appl. Stat., 59:855–874.

Wang, Xue, Birdwell J.D. 2006. Least-square deconvolution: a framework for interpreting short tandem repeat mixtures. J. Forensic Sci., 51:1284–1297.