The Illusion of Objectivity: How Bias Creeps into Forensic Science Before the Analysis Even Begins
When the expert is not neutral, the evidence is not either. A forensic practitioner’s account of the mechanisms, the cases, and the structural failures no one in the justice system wants to examine too closely.
I have seen bias more times than I can count, and the worst part is not that it exists. The worst part is that it is invisible to the people who carry it. I have watched it move through courtrooms, through laboratories, through sworn testimony delivered with a certainty that would have embarrassed any scientist who understood the difference between confidence and knowledge. I have watched it dismantle the lives of people who were guilty of nothing more than being in the wrong file at the wrong moment. And I have written reports that named what I saw, factually and without apology, and collected for it the kind of institutional enmity that follows you from one courthouse to the next, not because my analysis was wrong, but because it was uncomfortable.
Let me say clearly what the system rarely admits in open court. A forensic investigator from a state criminal investigation office, in Germany the Landeskriminalamt, in the United States the Federal Bureau of Investigation or any number of its state-level counterparts, who appears as a neutral expert witness for the prosecution is not neutral. She is part of the prosecution’s institutional ecosystem. She was trained by it, credentialed by it, and will return to it the following Monday morning. The idea that this structural embeddedness leaves no trace on her analysis, that she evaluates evidence with the same detached equanimity she would bring to a pure laboratory exercise, is not a hypothesis that the empirical evidence supports. It is a legal fiction that the system sustains because to abandon it would require rethinking the entire architecture of expert witness testimony.
I have watched experts who were genuinely neutral lose their court assignments, not because their analyses were flawed, but because they declined to bend their conclusions toward the outcome that the instructing authority expected. The mechanism is never explicit, nobody ever tells an expert to skew a result, but the feedback loop is precise and consistent. Experts who deliver uncomfortable conclusions stop receiving referrals. Experts who deliver comfortable ones keep appearing, case after case, at the request of the same prosecutors and investigating officers, their names cycling through a quiet institutional white list that nobody acknowledges in writing and everybody understands in practice.
The Experiment That Proved What Everyone Denied
In 2006, cognitive neuroscientist Itiel Dror and his colleague Dave Charlton published what has since become one of the most cited and most inconvenient papers in the history of forensic science. They took fingerprints that experienced latent print examiners had already analyzed and formally identified as matching a suspect, cases that had been closed, the calls made, the files archived, and they re-presented those same prints to the same examiners, this time wrapped in contextual information suggesting the opposite conclusion: the examiners were led to believe the pair was the one from the FBI's erroneous Madrid bombing identification, a known non-match. Four of the five examiners changed their verdicts. Not because the prints had changed. Because the information surrounding them had (Dror, Charlton, and Péron, 2006, Contextual information renders experts vulnerable to making erroneous identifications, Forensic Science International, 156, 74-78).
This is a result that the fingerprint community greeted, initially, with what can only be described as organized denial. The resistance was understandable, because what the study demonstrated was that a discipline built on decades of courtroom claims of near-infallibility was, in measurable experimental conditions, vulnerable to exactly the kind of top-down cognitive contamination that any undergraduate psychology student learns about in the first semester. The experts in Dror’s study were not incompetent, they were not corrupt, they were not lazy. They were human, and humans process information in a way that makes prior expectations cascade downward onto perceptual data, shaping what the eye reports to the brain before the brain has finished deciding what it is looking at.
Dror subsequently documented the same effect across DNA mixture interpretation, bite mark analysis, bloodstain pattern analysis, toxicology, firearms examination, and handwriting comparison. In a 2011 study conducted with Hampikian, he described an actual gang rape case in which the original examiners, who knew that a cooperating witness had already implicated the defendant, concluded that he could not be excluded from the DNA mixture. When the same electropherogram data were given to seventeen independent experts without that context, only one reached the same conclusion; twelve excluded the defendant outright, and four found the data inconclusive (Dror and Hampikian, 2011, Subjectivity and bias in forensic DNA mixture interpretation, Science and Justice, 51, 204-208). The data on the page did not change. The context did. And the context controlled the conclusion.
By 2020, Dror had synthesized three decades of research into a taxonomy of six expert fallacies and eight sources of bias that recur across forensic disciplines, identifying a pyramidal model in which case-specific information, personal history, and institutional environment interact to contaminate what is nominally a purely technical judgment (Dror, 2020, Cognitive and human factors in expert decision making: Six fallacies and the eight sources of bias, Analytical Chemistry, 92, 7998-8004). The most destructive of the six fallacies is the last one, the expert's belief that their experience protects them from bias, that having examined thousands of prints or hundreds of bloodstain patterns has immunized them against the cognitive shortcuts that affect naive observers. The research is consistent on this point: expertise does not grant immunity. In several documented cases it amplifies the effect, because the expert's pattern-recognition system has been trained to find matches, and it finds them even when they are not unambiguously there.
Brandon Mayfield and the Hundred-Percent Match That Was Not One
The case that broke into public consciousness most visibly was that of Brandon Mayfield, a family lawyer from Portland, Oregon, who had nothing to do with the Madrid train bombings of March 11, 2004, which killed 191 people and injured approximately 1,500 more. A partial fingerprint had been recovered from a bag containing detonators near the scene. The Spanish National Police transmitted a digital image of the print to the FBI through Interpol. The FBI’s Automated Fingerprint Identification System produced a list of twenty candidates whose known prints shared features with the partial. Mayfield appeared on that list.
Several senior FBI latent print examiners then conducted what is formally called an ACE-V analysis, the acronym standing for Analysis, Comparison, Evaluation, and Verification, a protocol nominally designed to produce rigorous independent review at each stage. They concluded, collectively and with formal documentation, that Mayfield's print was a one hundred percent match. A court-appointed independent expert reached the same conclusion. The FBI arrested Mayfield on May 6, 2004, held him as a material witness, searched his home and office, and continued to maintain the match even after the Spanish National Police sent a letter on April 13 explicitly stating that their own examiners considered the print a non-match to Mayfield.
He was released two weeks later when the press broke the story that the Spanish authorities had matched the print to Ouhnane Daoud, an Algerian national, and the FBI was no longer able to sustain its position. The FBI issued an apology, an unusual event in its institutional history. Mayfield subsequently received close to two million dollars in settlement. A 2006 investigation by the Justice Department’s Inspector General identified the causes, and they make for illuminating reading. The examiners had misread distortions in the partial print as features corresponding to Mayfield’s, had failed to adequately account for the poor quality of the digital image, and had, critically, allowed background information about Mayfield, his Muslim faith, his prior representation of a convicted terrorist in an unrelated custody matter, his Egyptian-born wife, and his membership in a mosque, to embed itself in their perceptual processing without ever making this influence explicit (Office of the Inspector General, U.S. Department of Justice, 2006, A Review of the FBI’s Handling of the Brandon Mayfield Case).
The Inspector General’s report contains a sentence that deserves to be framed and hung in every forensic laboratory in the world. One of the examiners had admitted, during the review, that if the person identified had been someone without these characteristics, someone like the Maytag Repairman, the laboratory might have revisited the identification with more skepticism. A man’s life collapsed because the experts’ prior expectations about who was suspicious made a partial, ambiguous print look like certainty.
The Anatomy of How It Happens
What follows is not a list of abstract cognitive categories, it is a description of mechanisms I have observed operating in real cases, across disciplines, and in practitioners who in other contexts might be considered scrupulous professionals. They are not interchangeable, they are not equally common, and they do not all lead to the same kind of error, but they share a common feature: they are invisible to the person they are happening to, and they become dramatically more visible once the outcome of a case is known.
Confirmation bias is the most thoroughly documented and the most consequential. Once an investigator or examiner has formed a working hypothesis, which in most forensic contexts arrives before the analysis begins, the hypothesis functions as a filter that prioritizes evidence consistent with the expected conclusion and demands more justification from evidence that contradicts it. The result is not falsification of evidence, it is a systematic asymmetry in the threshold of persuasion applied to inculpatory and exculpatory findings, and this asymmetry is invisible from the inside because it feels like discernment.
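The asymmetry is easy to make concrete. The toy simulation below is an illustration of the mechanism only, not a model of any real examiner; the thresholds and counts are invented. It applies a lower persuasion threshold to inculpatory findings than to exculpatory ones and shows how a perfectly balanced evidence stream comes out of the filter looking lopsided:

```python
import random

def surviving_findings(findings, t_inculpatory=0.5, t_exculpatory=0.8):
    """Count findings that clear the (asymmetric) persuasion threshold.

    `findings` is a list of (direction, strength) pairs, strength in [0, 1].
    Nothing is falsified; exculpatory findings simply have to be stronger
    before they are believed.
    """
    kept = {"inculpatory": 0, "exculpatory": 0}
    for direction, strength in findings:
        threshold = t_inculpatory if direction == "inculpatory" else t_exculpatory
        if strength >= threshold:
            kept[direction] += 1
    return kept

random.seed(1)
# Identical strength distributions on both sides; only the thresholds differ.
findings = [(d, random.random()) for d in ["inculpatory", "exculpatory"] * 500]
print(surviving_findings(findings))
# Roughly {'inculpatory': 250, 'exculpatory': 100}: a balanced record,
# an unbalanced conclusion, and no single dishonest step anywhere.
```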
Contextual bias is confirmation bias operating on a specific input channel, namely background information about the case or the suspect that is irrelevant to the technical question but that arrives in the same informational package. In most criminal justice systems, forensic examiners receive the full case file, which contains not just the physical evidence they are asked to analyze but also the investigator’s narrative of the crime, the suspect’s prior record, the charging document, and often the expressed view of the investigating officer about what the analysis should show. The effect is that the analyst’s framing of the evidential question is set before she looks at the evidence, by people who have a stake in the answer.
Adversarial bias operates at the level of the professional relationship rather than the individual case. It develops gradually, often imperceptibly, in experts who work repeatedly for the same side. The expert does not wake up one morning and decide to become an advocate. What happens instead is a slow alignment of interpretive habits, a gradual calibration of the threshold for expressing doubt toward the level that the instructing party has implicitly demonstrated it finds useful.
Selection bias operates on what is included in the final report rather than on the analysis itself. An examiner who has noted five findings, two of which support the hypothesis and three of which complicate or contradict it, faces a structurally uncomfortable choice at the moment of writing. In many jurisdictions no procedural mechanism exists to force disclosure of contrary findings. What appears in court is therefore not a complete accounting of what the evidence contained, it is a curated selection shaped by the adversarial context in which forensic work is performed.
Anchoring bias describes the cognitive weight carried by the first piece of information received. In forensic contexts this is almost always information supplied by the investigator before the analyst has access to the evidence, which means her perceptual apparatus arrives at the evidence already calibrated in a particular direction. Later findings are evaluated relative to this anchor, and departures from it carry an additional cognitive burden that is not symmetrically applied to confirming findings.
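A worked toy example, following the classic anchoring-and-adjustment account (the numbers are invented and purely illustrative): if each new signal moves the estimate only part of the way, the order of arrival controls the result.

```python
def anchored_estimate(signals, adjustment=0.1):
    """Toy anchoring-and-adjustment: start at the first signal, then adjust
    only partially toward each later one. The anchor never fully washes out."""
    estimate = signals[0]
    for s in signals[1:]:
        estimate += adjustment * (s - estimate)  # insufficient adjustment
    return estimate

evidence = [0.9, 0.3, 0.2, 0.25]         # identical items, two arrival orders
print(anchored_estimate(evidence))        # ~0.72: the high anchor lingers
print(anchored_estimate(evidence[::-1]))  # ~0.32: same evidence, other order
# An order-blind average of the same four signals is 0.41 either way.
```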
Overconfidence bias is the one that produces the most catastrophic courtroom testimony, because the justice system rewards confidence and penalizes doubt in ways that are structurally incompatible with scientific epistemology. Courts have historically rewarded the expert who says, "I have twenty-five years of experience and in my professional judgment this is a match," over the one who says the interpretation is not unambiguous. The 2009 National Academy of Sciences report documented this dynamic explicitly, noting that forensic experts routinely expressed a degree of certainty that their methods could not, in scientific terms, support.
Blind spot bias is in some ways the meta-problem. It is the belief, held most strongly by the most experienced practitioners, that awareness of cognitive bias is itself protective against it. Dror's research is unambiguous on this point: knowing that bias exists does not prevent it from operating. The expert who has attended a seminar on cognitive bias and returned to the laboratory believing she is now immunized is, in Dror's taxonomy, the practitioner at greatest risk in the room.
Twenty Years of Erroneous Testimony: The Hair Analysis Catastrophe
The most systematic documentation of what institutional bias produces at scale is the FBI's microscopic hair comparison review, which began in 2013 and whose initial results, released in 2015, revealed that agency examiners had provided scientifically invalid testimony in at least 90 percent of the trial transcripts examined, testimony that had been used to inculpate defendants in trials conducted over a twenty-year period (FBI, 2015, FBI Testimony on Microscopic Hair Analysis Contained Errors in at Least 90 Percent of Cases in Ongoing Review).
The President’s Council of Advisors on Science and Technology reported in 2016 that across approximately 3,000 cases reviewed, FBI examiners had made scientifically invalid statements at trial in more than 95 percent of the cases where the testimony was used against a defendant (PCAST, 2016, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods). The examiners were not inventing evidence. They were performing an analysis whose foundations had never been validated by controlled study, expressing conclusions in language that the scientific literature had not authorized, and doing so for two decades under the implicit institutional expectation that this was what the courts required and what prosecutors found useful.
The Innocence Project has documented 74 people who were convicted in cases involving microscopic hair comparison and later exonerated by DNA testing. Among those freed after hair evidence collapsed is George Perrot, who served nearly thirty years in prison for a rape he did not commit. The pattern extends to neighboring disciplines: Steven Chaney was convicted of murder in Texas partly on the basis of bite mark comparison evidence and spent 28 years incarcerated before the Texas courts repudiated that evidence type and exonerated him.
The Structural Problem No Reform Has Touched
What the individual case studies and the experimental literature share is a diagnosis that points beyond the behavior of individual experts toward the architecture of the system in which they operate. Forensic science laboratories in the majority of jurisdictions are funded by, administratively housed within, and operationally supervised by law enforcement agencies. The analyst who examines the evidence in a criminal case is an employee of the same institutional complex that assembled the case, arrested the suspect, and has an interest in securing the conviction that the arrest anticipates. This structural embeddedness creates pressures that operate at the level of culture and expectation rather than at the level of explicit instruction, and it is precisely for this reason that practitioners subjected to them typically cannot feel them.
The structural remedy is not difficult to specify, and it is not new. The independence of forensic laboratories from law enforcement administration has been advocated by the National Academy of Sciences, the PCAST report, numerous professional associations, and every serious academic analysis of the problem published in the past two decades. The reform has not been implemented in any jurisdiction at scale, and the reason it has not been implemented has nothing to do with its technical or financial feasibility. It has not been implemented because the institutions that would have to relinquish control over forensic analysis are the same institutions that benefit from the current arrangement.
What I Know From My Own Cases
There was a case I will not identify in detail but that encapsulates everything I have described. A human biology professor was called as an expert witness and testified, in open court, that a suspect who appeared entirely masked in video footage could be identified with a probability of 99.72 percent. A masked figure, no visible facial features, no biometric anchor of any conventional kind. The basis for the identification was that similar clothing had been found to have been purchased through the same online platform. The professor knew this contextual fact before he analyzed the footage. The contextual fact shaped what he saw in the footage, and the conclusion arrived with a statistical precision that bore no relationship to what the methodology could actually support.
I wrote a method-critical report on that case. It was factual, thorough, and cited the relevant literature on the limits of visual identification from masked subjects. It was also deeply unwelcome. A report that calls a colleague’s identification probability scientifically indefensible is not received as a contribution to scientific accuracy, it is received as an attack on a professional alliance, and the institutional consequences flow from that interpretation rather than from the technical merits of the analysis. I am aware of which courts will not call me and why. I carry the knowledge of it without embarrassment, because the alternative, which is to write reports calibrated to the expectations of the instructing authority, is not a compromise I have any interest in reaching.
The question that deserves more direct attention than it typically receives is not whether individual experts are biased, because the evidence that they are is now abundant and unrebutted. The question is whether the justice system wants the problem solved. Judges who call the same experts year after year know, at some level, that they are calling them because those experts deliver reliable results, and reliable in this context means reliably aligned with the prosecution’s theory of the case.
What Actually Helps
The forensic science literature on bias mitigation is more developed than the implementation of its recommendations would suggest. Several interventions have demonstrated measurable effectiveness in controlled conditions, and the gap between what is known and what is practiced is itself a data point about the system’s appetite for reform.
Linear sequential unmasking is the most systematically validated approach. It requires the analyst to commit to each analytical step before receiving the next piece of contextual information, structuring the evidentiary evaluation in a direction that runs from the trace evidence outward toward the context rather than from the investigative hypothesis inward toward the evidence. The method was developed from Dror’s work and has been formally recommended by the National Institute of Standards and Technology’s Organization of Scientific Area Committees. It has been adopted in a small number of laboratories globally. It remains the exception rather than the standard.
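The workflow logic can be enforced mechanically. The sketch below is my own illustration of the commit-before-unmask idea, not NIST's or any laboratory's actual implementation, and all names and fields are invented: no new context is released until a judgment covering everything seen so far has been committed to the record.

```python
class SequentialUnmasking:
    """Commit-before-unmask: context items are ordered from least to most
    biasing, and each one is released only after the analyst has logged a
    judgment for the current stage."""

    def __init__(self, context_items):
        self._context = list(context_items)
        self._released = 0
        self.judgments = []  # append-only record, one entry per stage

    def commit(self, judgment):
        self.judgments.append({"stage": self._released, "judgment": judgment})

    def next_context(self):
        # Refuse to unmask anything the analyst has not yet committed past.
        if len(self.judgments) <= self._released:
            raise RuntimeError("commit a judgment before unmasking more context")
        item = self._context[self._released]
        self._released += 1
        return item

workflow = SequentialUnmasking(["reference print", "case narrative"])
workflow.commit("trace only: 11 minutiae marked, quality limits documented")
print(workflow.next_context())  # "reference print", released only after commit
```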
Blind verification, in which the analyst who verifies a comparison does not know the conclusion reached by the analyst who performed it first, has been demonstrated to disrupt the cascade effect through which the first examiner's conclusion shapes all subsequent evaluations. It requires additional staffing and coordination, and it removes from the verification step the institutional pressure to confirm rather than to challenge.
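A minimal sketch of the routing step, with hypothetical field names (real case-management systems vary): the verifier receives the evidence and the technical question, never the first examiner's call.

```python
def package_for_blind_verification(case_record):
    """Strip the first examiner's identity and conclusion before routing
    the case to an independent verifier, so verification cannot anchor
    on the original call. Field names here are invented."""
    withheld = {"examiner", "conclusion", "examiner_notes"}
    return {k: v for k, v in case_record.items() if k not in withheld}

case = {
    "evidence_id": "LP-2014-0387",
    "images": ["latent_01.png", "exemplar_01.png"],
    "question": "common source?",
    "examiner": "A. Meyer",          # withheld from the verifier
    "conclusion": "identification",  # withheld from the verifier
}
print(package_for_blind_verification(case))
```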
Removing case context from the initial analysis, which means providing the analyst with the physical evidence and the technical question without the investigative narrative, the suspect’s history, or any other information that is not required for the technical evaluation, is conceptually simple and technically undemanding. The institutional barrier to it is the same as the barrier to all the other reforms: the people who would have to change the way case files are assembled and transmitted are the same people who benefit from the current arrangement.
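The same idea at intake, again sketched with invented field names. One design point is worth making explicit: this should be an allowlist rather than a blocklist like the verification sketch above, because a blocklist lets every newly invented context channel, a prosecutor's cover note, a prior-record summary, leak through until someone remembers to name it.

```python
# Only what the technical question requires; everything else is
# withheld by default rather than removed one field at a time.
TECHNICAL_FIELDS = {"evidence_id", "sample_type", "instrument_data", "question"}

def intake_for_analysis(case_file):
    """Pass the analyst the physical-evidence fields and the technical
    question; the investigative narrative, the suspect's history, and the
    charging documents never enter the analysis channel."""
    return {k: v for k, v in case_file.items() if k in TECHNICAL_FIELDS}
```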
What I have done in my own practice is to write the code I use myself. I do not trust commercial tools I cannot inspect. I need to see the data, hear the signal, understand the output at the level of its technical generation, because the alternative is to trust an algorithm built by someone whose assumptions I cannot audit and whose biases I cannot see.
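For concreteness, this is the shape of what I mean, reduced to a sketch. It is illustrative, not my actual casework code, and the file name is a placeholder: the point is that every parameter of the analysis sits in plain sight, which is exactly what a proprietary black box denies you.

```python
import numpy as np
from scipy.io import wavfile
import matplotlib.pyplot as plt

rate, samples = wavfile.read("questioned_recording.wav")  # placeholder path
if samples.ndim > 1:
    samples = samples.mean(axis=1)  # collapse stereo to mono for inspection
samples = samples.astype(np.float64)

# Every analysis parameter is explicit and auditable: window length,
# overlap, sample rate. Nothing happens that the report cannot describe.
plt.specgram(samples, NFFT=2048, Fs=rate, noverlap=1024)
plt.xlabel("time (s)")
plt.ylabel("frequency (Hz)")
plt.savefig("questioned_recording_spectrogram.png", dpi=150)
```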
What the Justice System Owes the People It Processes
No justice system is neutral. All of them are operated by people who bring to their work the full apparatus of human cognition, which includes biases that neither training nor experience eliminates. This is not a counsel of despair, it is a description of the actual situation that any honest reform has to start from. The question is not how to achieve a perfect objectivity that the cognitive science literature has established is not achievable. The question is how to build procedural structures that constrain the influence of bias rather than amplifying it.
The constraints exist. They have been documented, tested, refined, and published in peer-reviewed journals over the course of three decades of work by Dror and his colleagues and predecessors. The National Academy of Sciences named them in 2009. The President’s Council of Advisors on Science and Technology restated them in 2016. They are not secret, not expensive, not technically inaccessible. They sit in the published literature, waiting for a system willing to be accountable to the evidence it claims to depend on.
Courts keep calling the same experts. The white lists are real, even where they are unwritten. The black lists are real, even where the word is never spoken. The expert who reports inconvenient findings collects institutional consequences that are delivered without explanation and felt without recourse. This is not a problem at the edges of the system, it is a feature of its center, and the people who pay for it are not the experts or the prosecutors or the judges. They are the ones in the dock who assumed that the person testifying about them had examined the evidence as a scientist and not as an ally.
Objectivity is not a professional quality. It is not something that comes with a credential, a title, years of service, or a laboratory coat. It is a practice, one that has to be actively maintained against the constant pressure of context, expectation, institutional loyalty, and the very human desire to be seen as certain rather than honest. When it stops being practiced, the damage does not always produce an exoneration that makes the news. Most of the time it simply produces a conviction, and the person serving it has no way of knowing that the confidence displayed in the testimony against them was a performance the evidence could not support.
The justice system can be better than this. The science that would make it better exists. The decision not to use that science is made, every day, by the same institutions that justify their authority by claiming the science is on their side.
References
- Dror, I. E., Charlton, D., and Péron, A. (2006). Contextual information renders experts vulnerable to making erroneous identifications. Forensic Science International, 156(1), 74-78.
- Dror, I. E., and Charlton, D. (2006). Why experts make errors. Journal of Forensic Identification, 56(4), 600-616.
- Dror, I. E., and Hampikian, G. (2011). Subjectivity and bias in forensic DNA mixture interpretation. Science and Justice, 51(4), 204-208.
- Dror, I. E. (2020). Cognitive and human factors in expert decision making: Six fallacies and the eight sources of bias. Analytical Chemistry, 92(12), 7998-8004.
- FBI (2015). FBI testimony on microscopic hair analysis contained errors in at least 90 percent of cases in ongoing review. Press release, April 20, 2015.
- Innocence Project (2023). Not a strand of evidence: Cases involving microscopic hair comparison analysis.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- National Research Council, National Academy of Sciences (2009). Strengthening Forensic Science in the United States: A Path Forward. National Academies Press.
- Office of the Inspector General, U.S. Department of Justice (2006). A Review of the FBI’s Handling of the Brandon Mayfield Case. Oversight and Review Division.
- President’s Council of Advisors on Science and Technology (2016). Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods. Executive Office of the President.
- Risinger, D. M., Saks, M. J., Thompson, W. C., and Rosenthal, R. (2002). The Daubert/Kumho implications of observer effects in forensic science: Hidden problems of expectation and suggestion. University of Chicago Law Review, 69(1), 1-56.