Least-Square Deconvolution: A Framework for Interpreting Short Tandem Repeat Mixtures
Interpreting short tandem repeat (STR) DNA mixture data is often laborious: the analyst tries different genotype combinations mixed at assumed DNA mass proportions and assesses whether each resulting profile is well supported by the relative peak heights of the mixture sample. If a clear pattern of major–minor alleles is apparent, it is feasible to identify the major alleles at each locus and form a composite genotype profile for the major contributor. When alleles are shared between the two contributors, and/or heterozygous peak imbalance is present, deducing the profile of the minor contributor becomes complex and difficult. The manual trial-and-error procedures an analyst performs when attempting to resolve mixture samples have been formalized in the least-square deconvolution (LSD) framework reported here for two-person mixtures, with the allele peak height (or area) information as its only input. LSD operates on the peak data of each locus separately, independent of all other loci; for each possible genotype combination, it finds the best-fit DNA mass proportions and calculates the error residual. The LSD mathematical results for all loci are then reviewed by a DNA analyst, who applies a set of heuristic interpretation guidelines in an attempt to form a composite DNA profile for each of the two contributors. Both simulated and forensic peak-height data were used to support this approach. The heuristic rules involve checking the consistency of the best-fit mass proportion ratios for the top-ranked genotype combination among all four- and three-allele loci, and assessing the degree of fit of the top-ranked case relative to that of the second-ranked case.
A different set of guidelines is used in reviewing and analyzing the LSD mathematical results for two-allele loci, which are resolved with less confidence than four- and three-allele loci. This paper gives a detailed description of the theory of the LSD methodology, discusses its limitations, and presents the heuristic guidelines for analyzing the LSD mathematical results. A 13-locus sample case study is included to illustrate the use of the interpretation guidelines in forming composite profiles for each of the two contributors. Application of LSD in this case produced correct resolutions at all loci. Information on obtaining access to the LSD software is also given in the paper.
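The per-locus step described in the abstract (a best-fit mass proportion and an error residual for each candidate genotype combination) can be illustrated with a minimal Python sketch. It is not the authors' software. The function names, the encoding of a genotype as a vector of allele counts, the non-negativity constraint on the two contributor masses, the skipping of pairs in which both contributors have the same genotype, and the example peak heights are all assumptions made here for illustration.

```python
from itertools import combinations, combinations_with_replacement
from typing import NamedTuple


class Fit(NamedTuple):
    genotype_1: tuple    # allele counts for contributor 1 (hypothetical encoding)
    genotype_2: tuple    # allele counts for contributor 2
    proportion_1: float  # best-fit share of total mass assigned to contributor 1
    residual: float      # sum of squared differences from the observed peaks


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _residual(g1, g2, x1, x2, heights):
    return sum((x1 * a + x2 * b - h) ** 2 for a, b, h in zip(g1, g2, heights))


def fit_pair(g1, g2, heights):
    """Least-squares fit of heights ~ x1*g1 + x2*g2 with x1, x2 >= 0."""
    a11, a12, a22 = _dot(g1, g1), _dot(g1, g2), _dot(g2, g2)
    b1, b2 = _dot(g1, heights), _dot(g2, heights)
    det = a11 * a22 - a12 * a12  # positive whenever g1 and g2 differ
    # Boundary candidates: all of the mass assigned to a single contributor.
    candidates = [(max(b1 / a11, 0.0), 0.0), (0.0, max(b2 / a22, 0.0))]
    # Interior candidate: solution of the 2x2 normal equations, if non-negative.
    x1 = (a22 * b1 - a12 * b2) / det
    x2 = (a11 * b2 - a12 * b1) / det
    if x1 >= 0 and x2 >= 0:
        candidates.append((x1, x2))
    x1, x2 = min(candidates, key=lambda x: _residual(g1, g2, x[0], x[1], heights))
    total = x1 + x2
    proportion = x1 / total if total > 0 else float("nan")
    return Fit(g1, g2, proportion, _residual(g1, g2, x1, x2, heights))


def rank_genotype_pairs(heights):
    """Rank two-person genotype pairs at one locus by increasing residual."""
    n = len(heights)
    # Every genotype is two alleles drawn from the n observed alleles.
    genotypes = [tuple(pair.count(i) for i in range(n))
                 for pair in combinations_with_replacement(range(n), 2)]
    fits = [fit_pair(g1, g2, heights)
            for g1, g2 in combinations(genotypes, 2)  # distinct genotypes only
            if all(a + b > 0 for a, b in zip(g1, g2))]  # every peak is explained
    return sorted(fits, key=lambda f: f.residual)


# Made-up peak heights for a four-allele locus.
for fit in rank_genotype_pairs([1000, 950, 300, 320]):
    print(fit)
```

The sorted list corresponds to the ranking that the interpretation guidelines refer to: an analyst would compare the residual of the top-ranked pair with that of the second-ranked pair, and check whether the top-ranked mass proportions agree across loci. Pairs with identical genotypes are skipped in this sketch because the mass proportion cannot be determined from such a locus alone.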
Document Type: Research Article
Affiliations: 1: Department of Chemical Engineering and Laboratory for Information Technologies, The University of Tennessee, Knoxville, TN 37996-2200. 2: Department of Electrical and Computer Engineering & Laboratory for Information Technologies, The University of Tennessee, Knoxville, TN 37996-2100.
Publication date: 2006-11-01