The Shadow of Truth: The Invisible Risks of Probability in the Courtroom

Nov 15, 2024 | 20 min | anthropology

Probability statistics casting an invisible shadow over a courtroom verdict

On the probability that what a court accepts as certainty is not certainty, why the gap between 99.9% and 100% has already imprisoned innocent people, and what it means for a forensic expert to say the uncomfortable thing out loud in a room that was built for comfortable conclusions

The video lasted 43 seconds. Projected onto the courtroom screen, it showed not so much a face as the idea of one: a compression artifact shaped like a human head, smeared into something approximating a physiognomy by a surveillance camera installed sometime around the turn of the millennium that had accumulated enough optical degradation in the years since to transform every distinctive feature into an impressionistic suggestion of itself. The room held the particular quality of silence that forms in courtrooms when an expert is about to speak, a silence containing expectation and, underneath the expectation, something closer to anxiety, because the people in the room understood that what I said in the next several minutes might determine what another person’s life would look like for the decade that followed. To my right sat 2 court-appointed defense attorneys in suits that had seen better decades, watching me with the patience of professionals who have learned not to trust the morning. On my lap rested a case file I had deliberately not opened before entering the courtroom, not out of theater but out of methodology, and the methodology is the real subject of this article in a way that the 43-second video is not.

I want to be explicit from the beginning about what follows. This is not an argument that forensic identification produces unreliable results. It is an argument that the way courts receive, interpret, and act upon probabilistic forensic conclusions is so systematically misconceived that the errors it generates are not anomalies but logical consequences, predictable outcomes of a misunderstanding about the nature of probability that has been described in scientific literature since 1987, reproduced in study after study across multiple continents and multiple forensic disciplines, and absorbed by courts to approximately the degree that water is absorbed by polished marble.

What a Number Means When a Human Being Is Attached to It

I have never told a court that the person in a piece of footage is identified with 99.9% certainty. I have also never used phrases like “essentially certain” or “virtually certain” or “highly probable” without specifying exactly what classification system those words derive from and exactly what evidentiary conditions they presuppose, because these phrases, while they sound like calibrated scientific statements, function in a courtroom as something closer to verdicts delivered in the subjunctive mood as a formal concession to the requirements of due process, while the judge has already moved on internally to the question of what happens next. The precision implied by the language is real. The understanding of what that precision actually means is, in most courtrooms I have worked in, almost entirely absent.

The distinction that governs this entire field is the one identified by Thompson and Schumann (1987, “Interpretation of Statistical Evidence in Criminal Trials: The Prosecutor’s Fallacy and the Defense Attorney’s Fallacy,” Law and Human Behavior, 11[3], 167–187) in what became one of the most consistently cited and most thoroughly ignored papers in the history of forensic jurisprudence. Their observation, stated in its simplest form, is that courts systematically confuse the probability of evidence given innocence with the probability of innocence given the evidence, and that these 2 quantities are not interchangeable except under a narrow range of conditions that almost never describe a real criminal case. The error has a name. The prosecutor’s fallacy is what happens when the probability that a piece of forensic evidence would match the perpetrator’s trace if the defendant were innocent is presented in court as though it were equivalent to the probability that the defendant is innocent. These 2 quantities can differ by several orders of magnitude depending on the prior probability of guilt, and the prior probability of guilt is almost never addressed in the expert testimony that courts receive.
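The gap between the 2 quantities can be made concrete with a minimal Python sketch of Bayes’ theorem. The random-match probability of 1 in 1,000 and the 3 prior probabilities of guilt are illustrative assumptions, not figures from any case, and the sketch further assumes that the evidence is certain to match if the defendant is the true source.

```python
def posterior_innocence(p_match_if_innocent: float,
                        prior_guilt: float,
                        p_match_if_guilty: float = 1.0) -> float:
    """Return P(innocent | match) using Bayes' theorem."""
    prior_innocent = 1.0 - prior_guilt
    joint_innocent = p_match_if_innocent * prior_innocent
    joint_guilty = p_match_if_guilty * prior_guilt
    return joint_innocent / (joint_innocent + joint_guilty)


# Hypothetical value: probability that an innocent person would match the trace.
P_MATCH_IF_INNOCENT = 0.001

# Hypothetical priors: strong other evidence, weak other evidence,
# and a defendant picked from a pool of 10,000 possible sources.
for prior_guilt in (0.5, 0.01, 0.0001):
    p = posterior_innocence(P_MATCH_IF_INNOCENT, prior_guilt)
    print(f"prior P(guilt) = {prior_guilt:<7} P(innocent | match) = {p:.4f}")
```

Under these assumptions the same 1-in-1,000 match probability corresponds to a probability of innocence of about 0.1% when the prior probability of guilt is 50%, and about 91% when the prior is 1 in 10,000. The prosecutor’s fallacy consists of reporting the first number as though it held regardless of the prior.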

The experimental evidence documenting this confusion is not thin. Thompson and Schumann tested subjects with written descriptions of forensic evidence and found that the way probability was framed, as a conditional probability versus as a population frequency, produced substantially different and systematically biased conclusions in exactly the direction the prosecutor’s fallacy predicts. Subsequent research replicated this finding with legal professionals, with simulated jurors, and with sitting judges, across DNA evidence, fingerprint evidence, and eyewitness identification, across multiple national legal systems. The error is not a failure of intelligence. It is a failure of the human cognitive system to correctly perform conditional probability reasoning without explicit training, which is to say it is a failure that describes nearly everyone who has not specifically studied Bayesian inference, which is to say nearly everyone in nearly every courtroom.

The Arithmetic of Scale and the Calculation Courts Never Perform

Frankfurt Airport processed approximately 61.6 million passengers in 2023 (Fraport AG, 2024, Annual Report 2023, Fraport AG), which translates to roughly 169,000 passengers per day. Historical data on serious passenger-related incidents at major international airports suggests a rate of approximately 1 incident per 100,000 passenger movements. Apply this rate to Frankfurt’s daily volume and the mathematics produces an expected incident frequency of about 1.7 per day, statistically more than 1 serious incident every single day, not because Frankfurt Airport is unsafe by any reasonable measure but because any nonzero probability, multiplied by a sufficiently large number of trials, produces events. The airport is not broken. The mathematics is working exactly as it should. The question is whether the people responsible for operational decisions at the airport have understood what those mathematics imply about what they should be prepared for.

Carry that logic into a courtroom and apply it to a forensic identification system that achieves 99.9% accuracy. This is an extraordinary level of accuracy, and I say this without irony, because achieving it under real-world conditions of variable image quality, partial occlusion, and camera angle distortion requires training and methodological discipline of a kind that most of the world’s forensic practitioners do not possess. And yet: 99.9% accuracy is a 0.1% error rate, which is 1 wrong identification per 1,000 identifications. A large urban forensic unit that processes several thousand identifications per year is, by the mathematics, producing several wrongful identifications annually, not through negligence but through the operation of the very probability that made the system seem so reliable in the first place.
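The calculation is short enough to write down. The sketch below takes the 0.1% error rate from the paragraph above; the annual volumes are hypothetical, and it assumes that errors are independent from one identification to the next, which real casework may not satisfy.

```python
def expected_errors(n_identifications: int, error_rate: float) -> float:
    """Expected number of wrong identifications among n identifications."""
    return n_identifications * error_rate


def prob_at_least_one_error(n_identifications: int, error_rate: float) -> float:
    """Probability that at least one of n independent identifications is wrong."""
    return 1.0 - (1.0 - error_rate) ** n_identifications


ERROR_RATE = 0.001  # 99.9% accuracy, as in the text

# Hypothetical annual caseloads for a forensic unit.
for volume in (1_000, 3_000, 5_000):
    expected = expected_errors(volume, ERROR_RATE)
    at_least_one = prob_at_least_one_error(volume, ERROR_RATE)
    print(f"{volume:>5} identifications: expected errors = {expected:.1f}, "
          f"P(at least one error) = {at_least_one:.3f}")
```

At a hypothetical 3,000 identifications per year, the expected number of wrong identifications is 3, and the probability that the year passes without a single one is about 5%.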

The courts that receive those identifications do not perform this calculation. They receive a percentage and understand it as near-certainty, and they may be correct in any individual case, which is precisely the problem, because the individual case is where decisions are made and the aggregate is where errors reveal themselves, slowly, through the accumulation of cases that eventually attract the attention of organizations like the Innocence Project. That organization documented 375 DNA exonerations in the United States as of 2020 (Innocence Project, 2023, DNA Exonerations in the United States, innocenceproject.org), and across those cases misapplication of forensic science contributed to approximately 45% of the wrongful convictions, while eyewitness misidentification, which is a form of probabilistic evidence with extremely poor calibration, contributed to approximately 71%. These numbers do not describe the system at its worst. They describe what the system generates routinely, surfacing only when errors eventually become visible through the availability of DNA evidence or the persistence of lawyers willing to work on cases that the system considers closed.

Reading the File Before the Images Is Already a Verdict Before the Trial

The rule I established many years ago is simple: I analyze the footage and comparison material before I read the case file, every time, regardless of how urgently the request is presented or how much advance context I am told I need. I form whatever impressions the visual evidence produces on its own terms, without the weight of a prior conclusion sitting between my eyes and the material, and only after that independent initial analysis do I open the file and read what the investigators believe.

This is not procedural theater. It is the direct practical consequence of a body of research that establishes, without ambiguity, that prior contextual information systematically distorts the conclusions of expert forensic analysts even when those analysts are experienced, motivated to be accurate, and operating in complete good faith. Dror, Charlton, and Péron (2006, “Contextual information renders experts vulnerable to making erroneous identifications,” Forensic Science International, 156[1], 74–78) designed an experiment in which 5 experienced fingerprint examiners were asked to re-examine fingerprint pairs they had themselves previously analyzed and concluded matched. In the re-examination, they were provided with contextual information suggesting that the suspect had either confessed or produced a verified alibi, manipulating the implied direction of the expected conclusion in opposite directions. Across the experiment, 17% of the earlier match conclusions were reversed. The examiners did not know they were re-examining their own prior work. They did not know they were being studied. They were doing what expert analysts do, which is examining evidence within a case context, and the case context was sufficient to change 17% of their conclusions.

Kassin, Dror, and Kukucka (2013, “The forensic confirmation bias: Problems, perspectives, and proposed solutions,” Journal of Applied Research in Memory and Cognition, 2[1], 42–52) documented this mechanism across forensic disciplines, noting that the confirmation bias in forensic contexts operates through the same cognitive pathways as in other domains of expert judgment, specifically the tendency of the human perceptual system to resolve ambiguous information in the direction of whichever conclusion the surrounding narrative suggests. Dror (2020, “Cognitive and Human Factors in Expert Decision Making: Six Fallacies and the Eight Sources of Bias,” Analytical Chemistry, 92[12], 7998–8004) identified 8 distinct sources of bias in forensic expert decision-making, among them anchoring effects from early case information, role expectations generated by knowing which party commissioned the analysis, and organizational pressures within forensic laboratories that are structurally oriented toward producing results that support prosecution. The expert who reads the file before examining the evidence has, in the language of experimental psychology, been anchored, and the first piece of information received establishes a reference frame from which subsequent analysis departs less far than the analyst believes.

I analyze images before reading files because I have understood this, and because I have built a procedural structure to compensate for a cognitive vulnerability I share with every other person in this field.

The Case That Taught Me That Being Correct Is Not Enough

I was retained by the defense attorneys in a case involving footage of comparable quality to the video I described in the opening of this article. After a systematic analysis of the available material, comparing the physiognomic features visible in the footage against the comparison images of the defendant using the classification framework I apply in my practice, I concluded that a positive identification was not scientifically supportable. I said so clearly, in writing, with the reasoning documented in sufficient detail to allow another expert to follow every step of the analysis independently.

He was convicted.

The technical description of what followed is that the weight of other evidence already present in the file, organized around a narrative of guilt before I arrived, proved sufficient for the court in combination with an opinion from another expert who reached a different conclusion using methodology I found questionable at best. The more accurate description is that a single dissenting forensic voice, however methodologically rigorous, faces a structural disadvantage in a trial environment where the institutional momentum of the investigation has already organized itself around a narrative and where the adversarial structure of the proceeding creates maximum pressure toward confident conclusions at exactly the moment when intellectual honesty requires expressing uncertainty. The President’s Council of Advisors on Science and Technology noted in its 2016 report (PCAST, 2016, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, Executive Office of the President) that courts routinely accept expert testimony in forensic feature-comparison disciplines without requiring the validation studies that would allow claimed error rates to be independently evaluated, meaning there is no structural mechanism by which a methodologically weak forensic opinion can be distinguished from a rigorous one at the point in the proceeding where the distinction most matters.

I think about that case. I think about it because it taught me that accuracy and persuasiveness are different qualities, and that the adversarial courtroom was designed to optimize for the second one.

The Czech Republic: What Persistence Looks Like When Used as a Method

The second case I want to describe took several years, and I describe it precisely because its resolution looks like a success and the success should not obscure what it required.

A young man in the Czech Republic had been convicted on the basis of an investigation conducted, to use charitable language, with more institutional confidence than methodological rigor. The image comparison work in the original expert opinion had not accounted for the known geometric distortions introduced by the specific camera angle and focal length of the equipment used, distortions that, when correctly compensated for, substantially reduced the strength of the match that had been claimed. The feature classification in the original analysis had applied criteria inconsistently across the comparison material. The probability language used in the original expert report employed expressions that no established classification system would authorize for footage of that quality.

Getting any of this acknowledged required years. The inertia of a system that has already produced a conviction and considers the question settled is not a minor obstacle but a structural feature of a legal institution designed to prioritize finality over correction, and it does not yield quickly to evidence that inconveniences it. What eventually made the difference was not a single dramatic reversal but the slow, documented, methodologically precise accumulation of specific objections to specific failures in the original analysis, each one individually insufficient to force reconsideration, collectively impossible to dismiss without addressing.

The lesson I drew was not about persistence, though the years required it. The lesson was that the errors requiring years to correct were not exotic pathological failures of an otherwise functioning system. They were ordinary consequences of a forensic methodology applied without the discipline that the production of probability statements requires.

Sally Clark and the Case That Should Have Changed Everything

I want to address a case that did not involve me, because it is the most fully documented instance I know of in which a named, published cognitive error was committed in a courtroom 12 years after that error had been formally identified in scientific literature, with the result that an innocent person spent more than 3 years in prison.

Sally Clark was a British solicitor convicted in November 1999 of murdering her 2 infant sons, both of whom had died in circumstances the defense attributed to sudden infant death syndrome. The prosecution’s expert witness, the paediatrician Roy Meadow, presented the court with what he described as the probability of 2 SIDS deaths occurring in the same family. He arrived at the figure of 1 in 73 million by squaring his estimate of the per-birth probability of a single SIDS death in a family of similar socioeconomic profile, approximately 1 in 8,543, and treating the 2 events as statistically independent (Nobles & Schiff, 2005, “Misleading statistics within criminal trials: The Sally Clark case,” Significance, 2[1], 6–10). The jury convicted. Sally Clark was imprisoned.

The errors in Meadow’s reasoning were not subtle. The 2 deaths could not be treated as statistically independent, because the genetic and environmental factors contributing to 1 SIDS death in a family substantially elevate the probability of another, a fact that the epidemiological literature on SIDS had established before the trial began. But the more fundamental error, the one the initial Court of Appeal failed to identify, was the one Thompson and Schumann had named in 1987. The probability of 2 SIDS deaths occurring in a family is not the probability that a mother is innocent of murder when 2 of her children have died. These are entirely different quantities, related through Bayes’ theorem in a way that the court never computed, and conflating them requires a logical error in conditional probability reasoning that the Royal Statistical Society, in its formal statement after the conviction, described as a misuse of statistics. The Court of Appeal, when first presented with the statistical objection, dismissed it as “incapable of affecting the safety of the convictions.” Sally Clark was acquitted at her second appeal in January 2003, having served more than 3 years of a prison sentence for murders that had not occurred. She died in March 2007, at 42 years of age, 4 years after her release.
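Both errors can be traced in a few lines of Python. The per-birth figure of 1 in 8,543 is the one quoted above; the dependence factor and the probability of a double murder are purely hypothetical placeholders chosen to show the structure of the calculation, not estimates drawn from the epidemiological literature or from the case.

```python
P_SINGLE_SIDS = 1 / 8543  # per-birth estimate used at trial, as quoted in the text


def double_sids_probability(p_single: float, dependence_factor: float = 1.0) -> float:
    """P(two SIDS deaths in one family).

    dependence_factor is how many times more likely a second SIDS death is
    once a first has occurred; 1.0 reproduces the independence assumption.
    """
    p_second_given_first = min(1.0, p_single * dependence_factor)
    return p_single * p_second_given_first


def posterior_double_sids(p_double_sids: float, p_double_murder: float) -> float:
    """P(SIDS | two deaths), treating SIDS and murder as the only explanations."""
    return p_double_sids / (p_double_sids + p_double_murder)


# Error 1: squaring the single-death figure assumes independence.
independent = double_sids_probability(P_SINGLE_SIDS)
print(f"independence assumed: 1 in {1 / independent:,.0f}")

HYPOTHETICAL_DEPENDENCE = 10.0  # assumption for illustration only
dependent = double_sids_probability(P_SINGLE_SIDS, HYPOTHETICAL_DEPENDENCE)
print(f"with dependence:      1 in {1 / dependent:,.0f}")

# Error 2: a rare event must be weighed against the rival explanation,
# which is also rare. The value below is hypothetical.
HYPOTHETICAL_P_DOUBLE_MURDER = 1 / 100_000_000
for label, p_sids in (("independent", independent), ("dependent", dependent)):
    post = posterior_double_sids(p_sids, HYPOTHETICAL_P_DOUBLE_MURDER)
    print(f"P(SIDS | two deaths), {label}: {post:.2f}")
```

The first line reproduces the figure of roughly 1 in 73 million. The last two lines show why that figure says nothing on its own about guilt: under the hypothetical double-murder probability used here, SIDS remains the more probable explanation even when Meadow’s own number is accepted without correction.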

I include this case not because it is unique but because, of the cases I know, it is the one in which the chain of causation from a named cognitive error to a specific human outcome is most fully documented, most precisely attributable, and hardest to explain as anything other than what it was.

The Fingerprint Paradox and the Mythology of the Gold Standard

Fingerprints occupy a position in the popular forensic imagination that is essentially mythological, understood as the endpoint of uncertainty, the point at which probability cedes to fact. This understanding has been cultivated by more than a century of expert testimony and several decades of entertainment media, and it has established an interpretive template in which a fingerprint match functions as a certainty rather than a probability.

The President’s Council of Advisors on Science and Technology examined the empirical basis for this in 2016 and found it insufficient. The PCAST report concluded that latent fingerprint analysis is a foundationally valid subjective methodology, but with a false positive rate described as “substantial and likely to be higher than expected by many jurors based on longstanding claims about the infallibility of fingerprint analysis” (PCAST, 2016, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, Executive Office of the President, p. 101). The National Research Council had raised the same concern 7 years earlier, noting that for most forensic comparison disciplines, the error rate studies required to quantify the actual reliability of expert conclusions had either not been conducted or had not been conducted in ways that allowed independent verification (NRC, 2009, Strengthening Forensic Science in the United States: A Path Forward, National Academies Press, p. 122).

In my practice I have worked on cases combining facial and ear analysis, and in specific conditions the combination produces identification probabilities that approach the reliability attributed to fingerprints in their idealized form. But “approach” is the operative word, and the distance remaining inside it is not a technical inconvenience. It is the space in which real people’s freedom exists. The mythologizing of fingerprints matters for image forensics precisely because it establishes the interpretive frame through which judges and prosecutors receive all probabilistic forensic testimony, treating any high-probability statement as functionally equivalent to certainty and treating the residual probability of error as a philosopher’s abstraction rather than a quantity with real-world consequences that compound across the volume of a jurisdiction’s annual caseload.

What Closed Files Do Not Tell You

The case of the masked perpetrator deserves a specific description, because it illustrates the mechanism I have been describing from the expert’s side of the transaction.

An identification had been offered in this case that described it as “highly probable” that the perpetrator was the defendant. The footage showed a person whose face was substantially occluded. The image quality was poor enough that the features visible were limited in number and, more importantly, were features occurring with high frequency in the relevant population, meaning they provided relatively weak discriminating power. The probability language in the existing expert opinion had no methodological basis in any classification system I am aware of, and for the level of image quality available and the number of discriminating features visible, a “highly probable” conclusion of identity was not a scientific statement. It was a conclusion dressed in scientific language, and the distinction matters because courts cannot evaluate what they cannot see.
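Why common features discriminate so weakly can be shown with a simplified likelihood-ratio sketch. This is not the classification framework applied in the case; it is an illustration under stated assumptions: the feature frequencies and the prior are hypothetical, the features are treated as statistically independent, and each feature is assumed to be observed without error.

```python
from math import prod


def combined_likelihood_ratio(population_frequencies: list[float]) -> float:
    """Likelihood ratio for a set of matching features.

    Each feature contributes 1 / frequency: the rarer the feature in the
    relevant population, the more a match favors identity over coincidence.
    """
    return prod(1.0 / f for f in population_frequencies)


def posterior_probability(prior_probability: float, likelihood_ratio: float) -> float:
    """Update a prior probability of identity with a likelihood ratio."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical frequencies of the visible features in the relevant population.
COMMON_FEATURES = [0.6, 0.5, 0.4]    # few features, each widely shared
RARE_FEATURES = [0.05, 0.02, 0.01]   # same count, each uncommon

HYPOTHETICAL_PRIOR = 0.01  # assumption for illustration only

for label, features in (("common", COMMON_FEATURES), ("rare", RARE_FEATURES)):
    lr = combined_likelihood_ratio(features)
    post = posterior_probability(HYPOTHETICAL_PRIOR, lr)
    print(f"{label:>6} features: LR = {lr:,.1f}, P(identity | match) = {post:.3f}")
```

With the hypothetical common features, a match on all 3 raises the probability of identity only from 1% to roughly 8%. Nothing in a result of that size corresponds to the phrase “highly probable.”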

I said so. I described the conclusion as methodologically unsupported, which is the accurate description. The institutional discomfort this produced was approximately what I expected. I am unlikely to receive future commissions from that particular judge, and I have calculated this cost and accepted it, because the function of an expert witness is not to support the preferred conclusion of any party but to present the court with an accurate analysis of the evidence, and when the evidence does not support the conclusion built around it, the expert’s obligation is to say so, with precision and without diplomatic softening.

I am sometimes described as someone who swims against the current. This is accurate. The current flows toward conviction, and an expert who questions the methodological basis of a forensic opinion supporting the prosecution is an inconvenience to a system that has already organized itself around a narrative. The system responds to inconveniences by not inviting them back. I have made my peace with this arrangement.

The Innocence Project’s database contains more than 3,300 exonerations in the United States since 1989, and in approximately 45% of those cases misapplied forensic science was a contributing factor (Innocence Project, 2023). These are the cases in which the error surfaced. There is no registry of the cases in which it did not, because the files closed and stayed closed and the defendants served their sentences, and no one with the resources and the inclination to keep looking ever looked.

Probability Is What Science Offers; Justice Is What the Court Owes

I began with a 43-second video, and I want to end there, because that video represents the irreducible situation of forensic expertise: something happened, an image was recorded, and the recording is incomplete, partial, distorted, and ambiguous in ways that no amount of processing can fully resolve. Between that recording and a verdict sits the expert, whose function is to translate what is visible into a statement the court can use, and whose obligation is to ensure that the statement is accurate rather than merely convenient.

What probability does not offer is certainty. What probability does not permit is the substitution of “very likely” for “certain,” because the gap between those 2 phrases, however mathematically narrow it sometimes becomes, is morally infinite when the thing inside the gap is a person’s freedom and the years that freedom represents. The system has built itself, over generations, around a preference for confident conclusions over honest uncertainty, and this preference is not arbitrary: it is the preference of an institution designed to resolve disputes, and uncertainty does not resolve disputes. But certainty that is not actually certain resolves them incorrectly, and the distance between those outcomes is measured in years spent inside a room the defendant did not deserve to be in.

Sally Clark spent more than 3 years in prison for murders that did not happen, on the basis of a statistical argument that was wrong in a specific, identifiable, previously documented way, in a courtroom where the expert presenting the argument was neither a statistician nor aware that he was committing one of the best-described errors in forensic probability, and in a jurisdiction whose initial appeals mechanism concluded that this circumstance did not affect the safety of the conviction. Roy Meadow’s name was struck from the medical register in 2005, although he was reinstated on appeal the following year. The verdict was eventually overturned. Neither outcome returned those 3 years to Sally Clark, and neither outcome changed the probability that the same error will be committed in a courtroom somewhere tomorrow, by an expert who has not read Thompson and Schumann, in a case whose verdict will not be revisited because there is no DNA evidence available and no organization with the resources to keep looking.

The judge who converts a probability into a certainty because uncertainty is procedurally uncomfortable is not exercising judicial wisdom. He is performing an arithmetic error and dressing it in the authority of a verdict. Until courts develop either the statistical literacy to avoid that error or the procedural architecture to compensate for its absence, the shadow that probability casts over the truth of every forensic case will continue to fall on the people sitting at the table who cannot see it from where they are.

References

Dror, I. E., Charlton, D., & Péron, A. E. (2006). Contextual information renders experts vulnerable to making erroneous identifications. Forensic Science International, 156(1), 74–78. https://doi.org/10.1016/j.forsciint.2005.10.017
Dror, I. E. (2020). Cognitive and human factors in expert decision making: Six fallacies and the eight sources of bias. Analytical Chemistry, 92(12), 7998–8004. https://doi.org/10.1021/acs.analchem.0c00704
Fraport AG. (2024). Annual report 2023. Fraport AG.
Innocence Project. (2023). DNA exonerations in the United States. https://innocenceproject.org/dna-exonerations-in-the-united-states/
Kassin, S. M., Dror, I. E., & Kukucka, J. (2013). The forensic confirmation bias: Problems, perspectives, and proposed solutions. Journal of Applied Research in Memory and Cognition, 2(1), 42–52. https://doi.org/10.1016/j.jarmac.2013.01.001
National Research Council. (2009). Strengthening forensic science in the United States: A path forward. National Academies Press.
Nobles, R., & Schiff, D. (2005). Misleading statistics within criminal trials: The Sally Clark case. Significance, 2(1), 6–10. https://doi.org/10.1111/j.1740-9713.2005.00078.x
President’s Council of Advisors on Science and Technology (PCAST). (2016). Forensic science in criminal courts: Ensuring scientific validity of feature-comparison methods. Executive Office of the President.
Thompson, W. C., & Schumann, E. L. (1987). Interpretation of statistical evidence in criminal trials: The prosecutor’s fallacy and the defense attorney’s fallacy. Law and Human Behavior, 11(3), 167–187. https://doi.org/10.1007/BF01499132