
What 99.72 Percent Really Means: A Forensic Practitioner on the Discipline of Identifying the Living from Images

Mar 12, 2026 | 30 min | anthropology

On the Schwarzfischer classification of identity probabilities that has shaped German forensic anthropology for more than 3 decades, why the 0.28 percent residual uncertainty in the highest probability predicate would translate into 472 emergency calls per day at Frankfurt Airport, what the ACE-V methodology and the AGIB standards actually require of a practitioner, and the precise reason an experienced examiner refuses to issue a numeric percentage when asked by a court for one

The speed camera photograph arrived in the usual way: scanned, compressed, and sent through a court administration system that seemed designed to extract maximum ambiguity from minimum resolution. The face occupied perhaps 40 by 40 pixels in the original capture, because the camera had been positioned to read license plates rather than to document faces. The driver had been reduced to an arrangement of light values that told me almost nothing and told the investigating officer almost everything. That asymmetry is a specific epistemological hazard, and I have spent more than 2 decades trying to correct it.

I did not tell the officer what he wanted to hear. I told him what the image could support, which was considerably less than his working hypothesis, and I documented exactly why. That outcome describes roughly 60 percent of the forensic facial comparison examinations I have conducted over my career: the evidence was insufficient for a conclusion, the requesting party was disappointed, and the report said so in writing. It is also the outcome practitioners most often avoid when institutional pressure pushes them to produce useful results rather than accurate ones.

The discipline this article describes is the morphological identification of living persons from photographic and video evidence. It is a forensic-anthropological field whose German tradition stretches back to Knußmann and Schwarzfischer and whose international practice has been codified by the Facial Identification Scientific Working Group (FISWG) and the European Network of Forensic Science Institutes (ENFSI). The 2 traditions overlap substantially in methodology and converge in their conclusions. The German tradition formalized probability classification decades earlier; the international standards have been more explicit in codifying cognitive-bias controls. A competent practitioner works inside both frameworks at once.

What Forensic Facial Comparison Is and What It Is Not

There is a persistent confusion in law enforcement practice, in prosecutorial strategy, and in media reporting between 2 substantially different activities that share a vocabulary but not a methodology.

The first is automated facial recognition, in which an algorithm compares a probe image against a database of reference images and returns a ranked list of candidate matches with associated confidence scores. The second is forensic facial comparison, in which a trained human examiner systematically analyzes specific morphological features of 2 or more images and renders a conclusion about whether they could or could not represent the same individual.

Automated facial recognition is an investigative tool. It generates leads. In no jurisdiction with adequately developed forensic standards is it the basis for an identification conclusion, for 2 reasons: its output is a ranked list of candidates that may or may not include the actual perpetrator, and its documented error rates for low-quality images, non-frontal poses, and individuals from demographic groups underrepresented in the training data range from statistically negligible to operationally catastrophic, depending on the system and the conditions (Bergold, A.N., & Kovera, M.B., 2025, “The contribution of facial recognition technology to wrongful arrests and trauma”, Psychological Trauma, 17(Suppl 1), S225-S233). Robert Williams in Michigan, Porcha Woodruff in Detroit, and Randal Quran Reid in Georgia are documented cases of wrongful arrest in the United States in which automated facial recognition systems produced incorrect matches that investigating officers then accepted through confirmation bias and insufficient scrutiny of contradicting evidence.

Forensic facial comparison is a different procedure entirely. Its current internationally accepted methodology is morphological analysis, the systematic feature-by-feature comparison of anatomical structures visible in 2 or more images, conducted according to the ACE-V workflow: Analysis, Comparison, Evaluation, and Verification (FISWG, 2021; ENFSI, 2018, Best Practice Manual for Facial Image Comparison, Version 01). The German variant of this methodology, developed in parallel and now well integrated with the international standards, operates under the framework codified by the Arbeitsgruppe für anthropologische Identifikation nach Bildern, abbreviated AGIB, and follows the probability classification developed by Schwarzfischer in 1992.

The Schwarzfischer Classification and What 99.72 Percent Actually Means

The probability classification that German courts have used in forensic anthropological identification reports for the past 3 decades originates in the chapter by Schwarzfischer published in the Kriminalistik handbook of 1992 (Schwarzfischer, F., 1992, “Identifizierung durch Vergleich von Körpermerkmalen, insbesondere anhand von Lichtbildern”, in: Kube, E., Störzer, O., & Timm, J., Eds., Kriminalistik. Handbuch für Praxis und Wissenschaft, Bd. I, pp. 735-761, Boorberg Verlag). The classification distinguishes 9 verbal predicate categories, arranged symmetrically around an indeterminate middle.

The categories progress from “Identität praktisch erwiesen” (identity practically proven) through “Identität höchst wahrscheinlich” (identity highly probable), “Identität sehr wahrscheinlich” (identity very probable), “Identität wahrscheinlich” (identity probable), and “Identität nicht entscheidbar” (identity not determinable), and continue symmetrically into the non-identity range with “Nichtidentität wahrscheinlich” (non-identity probable), “Nichtidentität sehr wahrscheinlich” (non-identity very probable), “Nichtidentität höchst wahrscheinlich” (non-identity highly probable), and “Nichtidentität praktisch erwiesen” (non-identity practically proven). The intermediate predicates exist to allow honest expression of uncertainty, and the classification explicitly rejects the false dichotomy between identification and exclusion that less mature comparative disciplines have sometimes imposed on themselves.
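
Because the scale is ordinal and symmetric around an indeterminate middle, one way to make its structure explicit is an integer-valued enumeration. The following is a minimal sketch under stated assumptions: the names `Predicate`, `label`, and `mirror` are hypothetical, and the integers are rank positions only, not probabilities. In keeping with the argument of this section, the sketch deliberately attaches no percentages to any predicate.

```python
from enum import IntEnum


class Predicate(IntEnum):
    """Hypothetical encoding of the 9 Schwarzfischer predicates as ordinal ranks.

    Negative values are the non-identity side, positive values the identity side,
    0 is the indeterminate middle. The integers are ranks, not probabilities.
    """
    NICHTIDENTITAET_PRAKTISCH_ERWIESEN = -4
    NICHTIDENTITAET_HOECHST_WAHRSCHEINLICH = -3
    NICHTIDENTITAET_SEHR_WAHRSCHEINLICH = -2
    NICHTIDENTITAET_WAHRSCHEINLICH = -1
    IDENTITAET_NICHT_ENTSCHEIDBAR = 0
    IDENTITAET_WAHRSCHEINLICH = 1
    IDENTITAET_SEHR_WAHRSCHEINLICH = 2
    IDENTITAET_HOECHST_WAHRSCHEINLICH = 3
    IDENTITAET_PRAKTISCH_ERWIESEN = 4


# German wording for each strength level, shared by both sides of the scale.
_STRENGTH = {1: "wahrscheinlich", 2: "sehr wahrscheinlich",
             3: "höchst wahrscheinlich", 4: "praktisch erwiesen"}


def label(p: Predicate) -> str:
    """Return the verbal predicate as it would appear in a report, with no percentage."""
    if p == 0:
        return "Identität nicht entscheidbar"
    side = "Identität" if p > 0 else "Nichtidentität"
    return f"{side} {_STRENGTH[abs(p)]}"


def mirror(p: Predicate) -> Predicate:
    """Return the predicate of equal strength on the opposite side of the scale."""
    return Predicate(-p)


for p in Predicate:
    print(int(p), label(p))
```

The symmetry is what `mirror` captures: every identity predicate has a non-identity counterpart of equal strength, and only the middle category maps to itself.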

The numerical interpretation of these verbal categories has been the subject of considerable methodological debate. Schwarzfischer assigned approximate probability ranges to the upper categories, and these ranges have circulated in the German forensic-anthropological community since the early 1990s: “Identität höchst wahrscheinlich” corresponds approximately to 99.00 to 99.72 percent, and “Identität praktisch erwiesen” lies above it, approaching but never reaching 100 percent certainty. These percentages, however, are reference values for the verbal scale, not numerical results to be reported to the court in place of the verbal predicate. The reason matters, and Knußmann articulated it in 1991 in his commentary on probability statements in morphological identity reports (Knußmann, R., 1991, “Zur Wahrscheinlichkeitsaussage im morphologischen Identitätsgutachten”, Neue Zeitschrift für Strafrecht, 11(4), 175-177): a numerical percentage suggests a mathematical precision that the underlying method does not possess, because genuine probabilistic quantification would require independent and equally weighted features, an assumption that morphological analysis does not satisfy.

The features assessed in a forensic facial comparison are not statistically independent. The shape of the nasal bridge correlates with the shape of the nasal tip; the orbital morphology correlates with the supraorbital ridge; the mandibular contour correlates with the gonial angle. Because these correlations compress the actual range of variation in human faces below what independent feature frequencies would predict, the population frequency of a given feature combination is not the product of the individual feature frequencies: for positively correlated features it is a larger number, sometimes substantially larger. Multiplying frequencies as if the features were independent therefore makes a combination look rarer, and a match look stronger, than it actually is. A practitioner who counts 127 features and treats them as independent inputs to a probability calculation produces a result that is mathematically incorrect even before the additional problems of variable image quality, ambiguous feature visibility, and population-frequency uncertainty are considered.
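
The size of the distortion is easy to see with 2 features. The following sketch uses invented counts for an imaginary reference population of 1,000 faces; they are not population data, and features A and B are hypothetical stand-ins for any 2 positively correlated traits (for instance a nasal bridge form and a nasal tip form). It compares the joint frequency that independence would predict with the joint frequency actually present in the table.

```python
from fractions import Fraction

# Invented counts for illustration only: two positively correlated binary
# features A and B in an imaginary reference population of 1,000 faces.
N = 1000
counts = {
    (True, True): 150,    # A and B both present
    (True, False): 50,    # A only
    (False, True): 50,    # B only
    (False, False): 750,  # neither
}


def marginal(feature_index: int) -> Fraction:
    """Frequency of one feature regardless of the other."""
    return Fraction(sum(c for key, c in counts.items() if key[feature_index]), N)


def joint(a: bool, b: bool) -> Fraction:
    """Frequency of a specific combination, read directly from the table."""
    return Fraction(counts[(a, b)], N)


p_a, p_b = marginal(0), marginal(1)
assumed = p_a * p_b            # what multiplying frequencies would claim
actual = joint(True, True)     # what the population actually contains

print(f"P(A) = {float(p_a):.2f}, P(B) = {float(p_b):.2f}")
print(f"joint frequency under independence: {float(assumed):.3f}")
print(f"joint frequency in the table:       {float(actual):.3f}")
print(f"rarity overstated by a factor of {float(actual / assumed):.2f}")
```

With these made-up numbers, independence predicts the combination in 4 percent of faces while the table contains it in 15 percent, so the multiplied figure overstates its rarity by a factor of 3.75. With dozens of correlated features, such distortions can compound.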

This is why the German forensic-anthropological tradition uses the verbal scale rather than a numerical percentage, and why a practitioner asked by a court for a numerical equivalent of a verbal predicate such as “Identität höchst wahrscheinlich” should, in my judgment, decline to provide one while explaining clearly why. The Bundesgerichtshof explicitly recognized this characteristic of anthropological identification in its decision of 15 February 2005 in case 1 StR 91/04 (LG Memmingen): unlike opinions on blood alcohol analysis or blood group determination, the court stated, anthropological identity opinions are not a standardized procedure, morphological features are not unambiguously determinable, and continuous transitions exist between feature categories. The court understood in 2005 what some prosecuting and defense lawyers still ask experts to overlook: treating forensic facial comparison as if it were standardized in the way blood alcohol analysis is produces false certainty rather than better evidence.

The Frankfurt Airport Calculation: Why 0.28 Percent Is Not a Small Number

I have used the following thought experiment in court testimony for many years, and the moment of recognition on the faces of judges and lawyers when the arithmetic completes is consistent enough that I have come to rely on it as a teaching device. The example takes the upper bound of the “Identität höchst wahrscheinlich” predicate, 99.72 percent, and asks what the remaining 0.28 percent means in a context that the listener can intuit.

Frankfurt Airport, Germany’s largest aviation hub, processed approximately 61.6 million passengers in 2024 (Fraport AG, 2025, Verkehrszahlen 2024). Distributed evenly across 365 days, this works out to approximately 168,767 passengers per day passing through the terminal complex. If 99.72 percent of those passengers got through the day without incident, the remaining 0.28 percent, the residual that remains even at the upper bound of “Identität höchst wahrscheinlich”, would correspond to approximately 472 passengers per day requiring emergency medical attention. Spread over 24 hours, that is roughly 1 ambulance call every 3 minutes, every day, year-round.
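
The arithmetic is short enough to check directly. This sketch uses only the figures given above; the helper `expected_errors` is a hypothetical illustration that assumes, for simplicity, that every report carries the full 0.28 percent residual, which a real caseload spread across the verbal scale would not.

```python
# Figures from the text: about 61.6 million Frankfurt passengers in 2024,
# spread evenly over 365 days, and the 0.28 percent left over from 99.72 percent.
ANNUAL_PASSENGERS = 61_600_000
DAYS = 365
RESIDUAL = 0.28 / 100  # equivalent to 1 - 0.9972

per_day = ANNUAL_PASSENGERS / DAYS
incidents_per_day = per_day * RESIDUAL
minutes_between = 24 * 60 / incidents_per_day  # spread over the full 24 hours


def expected_errors(cases: int, residual: float = RESIDUAL) -> float:
    """Expected wrong conclusions if every case carried the same residual (simplification)."""
    return cases * residual


print(f"{per_day:,.0f} passengers per day")
print(f"{incidents_per_day:.1f} incidents per day")
print(f"1 incident every {minutes_between:.2f} minutes")
for cases in (100, 1_000, 10_000):
    print(f"{cases:>6} reports -> {expected_errors(cases):.1f} expected errors")
```

The unrounded value is about 472.5 incidents per day, one roughly every 3.05 minutes. The same residual applied to a caseload of 10,000 reports corresponds to about 28 wrong conclusions.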

The court that hears this calculation invariably pauses, because the listener has just translated a number that sounded like certainty into a concrete operational reality that obviously does not match what they understand “highest probability” to mean in everyday language. A facility experiencing 472 medical emergencies per day is not safe. A diagnostic test producing 472 false positives per day in a flow of 168,000 patients is not reliable. And a forensic identification methodology operating at 99.72 percent on a sufficiently large case volume is not certain in the way that conviction in criminal court requires.

The point of the exercise is not to delegitimize the predicate. The point is to make explicit what the predicate actually contains: a residual probability of error that is small in absolute mathematical terms and substantial in operational terms, and that the practitioner is obliged to communicate honestly rather than allow it to be papered over by language the listener will interpret as practical certainty. This is the difference between a report that serves justice and a report that serves the prosecution or the defense. The expert who delivers “Identität höchst wahrscheinlich” without explaining what that predicate does and does not exclude has not lied; the expert has simply allowed the listener’s misreading of probability language to do work that the actual finding does not support.

The German Methodological Foundation: Knußmann, the AGIB Standards, and the Working Process

The methodology underlying the Schwarzfischer probability classification was developed primarily in the German-speaking academic tradition by Rainer Knußmann at the University of Hamburg and his collaborators, and codified in the Anthropologie handbook of 1988 (Knußmann, R., 1988, “Die morphologische Identitätsprüfung”, in: Knußmann, R., Ed., Anthropologie. Handbuch der vergleichenden Biologie des Menschen, Band I/1, pp. 389-407, Gustav Fischer Verlag). The Knußmann approach decomposes the face into a structured inventory of morphological features and assesses each feature separately before integrating the assessments into a conclusion. It anticipated the international codification of the ACE-V workflow for facial comparison and operates on the same epistemological logic.

The German standards were further developed and operationalized by the AGIB, an interdisciplinary working group whose current standards document, “Grundlagen, Kriterien und Verfahrensregeln für Gutachten”, in the version of 16 December 2011, is available at bildidentifikation.de and remains the operative reference framework for forensic-anthropological identification in German jurisdictions. The AGIB standards distinguish, importantly, between Wiedererkennen (recognition) and Identifikation (identification), 2 cognitive processes that look superficially similar but operate on fundamentally different bases.

Wiedererkennen is the intuitive recognition of a familiar face, a process that occurs rapidly and largely outside conscious awareness, drawing on the gestalt impression of facial structure that the brain stores for familiar individuals. It is the process by which an eyewitness identifies a suspect from a lineup, and it is subject to all of the well-documented failure modes of eyewitness identification, including confidence-accuracy dissociation, suggestibility, and the cross-race effect.

Identifikation, in the technical AGIB sense, is the systematic feature-by-feature comparison of 2 or more images according to a documented inventory of morphological characteristics, with explicit assessment of each feature’s visibility, consistency, and population frequency. The process is slower, more documented, and more reproducible than Wiedererkennen, and its conclusions are anchored in the verbal Schwarzfischer scale with the explicit acknowledgment that morphological features are not eindeutig bestimmbar (unambiguously determinable) and that gleitende Übergänge (continuous transitions) exist between feature categories.

The working process in a typical case follows a structured sequence. The probe image and the reference image are first assessed independently for quality, with particular attention to resolution, lighting, pose, focal length distortion, and any post-capture processing that may have introduced artifacts. The probe image is then aligned with the reference image using the parallele Linien method developed by Reche in 1965 (Reche, O., 1965, “Eine neue Methode zur Erleichterung der Beweisführung in Identifizierungsprozessen”, Homo, 16, 113-116), in which horizontal guide lines are placed at corresponding facial landmarks across both images to standardize the relative orientation and to expose pose-related distortions that would otherwise contaminate the comparison.
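
The logic of the guide lines can be illustrated numerically. The following is a simplified sketch, not Reche's published procedure: it assumes both images are already corrected for in-plane rotation, reduces each landmark to its vertical pixel position, and rescales each image so that 2 anchor landmarks fall on the same pair of lines. The landmark choice (`nasion`, `subnasale`, `stomion`, `gnathion`), the function names, and the 0.02 tolerance are all hypothetical. A nonzero deviation can reflect pose (for example head pitch) as well as morphology, which is exactly why the examiner, not the arithmetic, interprets it.

```python
from typing import Dict, List

# Vertical pixel positions of named landmarks in one image (y grows downwards).
Landmarks = Dict[str, float]


def normalise(lm: Landmarks, top: str, bottom: str) -> Landmarks:
    """Rescale y so the `top` anchor sits at 0.0 and the `bottom` anchor at 1.0."""
    span = lm[bottom] - lm[top]
    if span <= 0:
        raise ValueError("bottom anchor must lie below top anchor")
    return {name: (y - lm[top]) / span for name, y in lm.items()}


def guide_line_deviations(probe: Landmarks, reference: Landmarks,
                          top: str = "nasion", bottom: str = "gnathion") -> Dict[str, float]:
    """Offset of each shared landmark from the line drawn through it in the other image."""
    p = normalise(probe, top, bottom)
    r = normalise(reference, top, bottom)
    shared = sorted((p.keys() & r.keys()) - {top, bottom})
    return {name: p[name] - r[name] for name in shared}


def flag(deviations: Dict[str, float], tolerance: float = 0.02) -> List[str]:
    """Landmarks whose lines do not coincide within a (hypothetical) tolerance."""
    return [name for name, d in deviations.items() if abs(d) > tolerance]


# Hypothetical coordinates: the reference image is larger, but proportions mostly agree.
probe = {"nasion": 100, "subnasale": 160, "stomion": 190, "gnathion": 240}
reference = {"nasion": 50, "subnasale": 170, "stomion": 240, "gnathion": 330}
devs = guide_line_deviations(probe, reference)
print(devs, flag(devs))
```

Here the subnasale lines coincide once scale is removed, while the stomion line is displaced, which the examiner would then need to explain as pose, expression, or genuine morphological difference.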

The feature inventory is then worked through systematically, beginning at the cranial vertex and proceeding inferiorly through the forehead, the supraorbital region, the orbits, the nasal structures, the perioral region, the chin, and, where visible, the auricular morphology and the cervical region. Each feature is described in its observed form in each image, and only then are the 2 descriptions compared. This sequence (describe in each image first, compare second) is not arbitrary procedural pedantry; it is a structural defense against confirmation bias, the cognitive tendency to perceive features in the second image through the framework of expectations established by the first.
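
The ordering rule can be enforced mechanically in a worksheet. This is a minimal sketch with hypothetical names (`FeatureEntry`, `describe`, `compare`, `new_worksheet`): it lists the regions in the order given above, refuses a comparison until both images have been described, and locks the descriptions once a comparison is recorded, so an observation cannot be quietly revised to fit the outcome. The set of outcome labels, including “ambiguous”, is an assumption drawn from the later discussion of documenting ambiguity.

```python
from dataclasses import dataclass
from typing import List, Optional

# Regions in the order given in the text, from the cranial vertex downwards;
# the last 2 are assessed only where visible.
INVENTORY_ORDER = (
    "cranial vertex", "forehead", "supraorbital region", "orbits",
    "nasal structures", "perioral region", "chin",
    "auricular morphology", "cervical region",
)
OUTCOMES = {"consistent", "inconsistent", "ambiguous", "not assessable"}


@dataclass
class FeatureEntry:
    region: str
    probe_description: Optional[str] = None
    reference_description: Optional[str] = None
    outcome: Optional[str] = None

    def describe(self, image: str, text: str) -> None:
        """Record what is observed in one image; locked once a comparison exists."""
        if self.outcome is not None:
            raise RuntimeError(f"{self.region}: descriptions are locked after comparison")
        if image == "probe":
            self.probe_description = text
        elif image == "reference":
            self.reference_description = text
        else:
            raise ValueError(f"unknown image {image!r}")

    def compare(self, outcome: str) -> None:
        """Record the comparison, but only after both images have been described."""
        if self.probe_description is None or self.reference_description is None:
            raise RuntimeError(f"{self.region}: describe both images before comparing")
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome {outcome!r}")
        self.outcome = outcome


def new_worksheet() -> List[FeatureEntry]:
    """One empty entry per region, in inventory order."""
    return [FeatureEntry(region) for region in INVENTORY_ORDER]


sheet = new_worksheet()
orbits = sheet[3]
orbits.describe("probe", "upper eyelid fold partly visible, left side in shadow")
orbits.describe("reference", "upper eyelid fold clearly visible on both sides")
orbits.compare("ambiguous")  # lighting prevents a firm judgement, so it is recorded as such
```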

Why Confirmation Bias Poisons Identifications and How the German Tradition Addresses It

The forensic science literature contains ample documentation of the damage that confirmation bias causes in pattern comparison disciplines, and facial comparison is not exempt. Stewart and Kukucka (2025) demonstrated in a controlled study of simulated facial recognition tasks that both contextual information about suspects and automated confidence scores significantly biased participants’ face-matching decisions, with participants adjusting their similarity judgments toward the biased information even when that information was irrelevant to the actual visual comparison (Stewart, C.K., & Kukucka, J., 2025, “Cognitive bias affects perception and decision-making in simulated facial recognition searches”, Behavioral Sciences, 15(8), 1094).

The German tradition addresses confirmation bias through 3 specific structural mechanisms that have been part of competent forensic-anthropological practice since long before the cognitive psychology literature documented the underlying phenomena. The first is the requirement that the examiner work blind to case context, examining the image material and producing the morphological description before reviewing the case file, the police hypothesis about identity, or any prior expert opinion. The second is the requirement that each feature be characterized independently in each image before any comparison is made. The third is the requirement that ambiguous features be documented as ambiguous rather than resolved in the direction of the investigative hypothesis.

My practice on commissioned identification examinations begins with the receipt of the images alone, without the case summary, without the suspect’s biographical information, and without the investigating authority’s working hypothesis about identity. The morphological description is completed and dated before the case file is opened. This is not procedural fastidiousness; it is the minimum requirement for producing a conclusion that is not contaminated by the information environment of the investigation. When a court later asks me whether another expert reaching a different conclusion on the same images is doing something wrong, my answer depends entirely on whether they followed the same blind analysis protocol. If they reviewed the case summary and the suspect’s prior criminal history before examining the images, their conclusion is an opinion about a hypothesis, not the result of an independent analysis.
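
Because the protocol depends on the order of events, it can be audited from a dated case log. This is a hypothetical sketch: the event names and the function `blind_violations` are illustrative and do not come from any real case-management system. It reports every exposure to case context that occurred before the morphological description was dated. If the description was re-dated, the latest version is used, because a description revised after exposure is no longer blind.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical event names for a case log.
CONTEXT_EVENTS = {"case_file_opened", "hypothesis_disclosed", "prior_opinion_reviewed"}


def blind_violations(log: List[Tuple[datetime, str]]) -> List[str]:
    """List context exposures that precede the latest dated morphological description."""
    completed = [t for t, event in log if event == "description_completed"]
    if not completed:
        return ["no dated morphological description in the log"]
    # A re-dated description must still precede every exposure to case context.
    cutoff = max(completed)
    return [f"{event} at {t.isoformat()}" for t, event in sorted(log)
            if event in CONTEXT_EVENTS and t < cutoff]


log = [
    (datetime(2026, 3, 2, 9, 0), "images_received"),
    (datetime(2026, 3, 3, 16, 30), "description_completed"),
    (datetime(2026, 3, 4, 10, 0), "case_file_opened"),
]
print(blind_violations(log))  # an empty list means the order was respected
```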

The in dubio pro reo principle is not merely a legal convention that the forensic expert must honor at the level of the final conclusion. It permeates the entire examination: every feature assessment, every quality determination, and every judgment about whether a difference between 2 images represents genuine morphological disagreement or is the product of illumination, angle, expression, or temporal change. When the evidence is ambiguous, the ambiguity must be reported as ambiguity, not resolved in the direction of the investigation’s working hypothesis. The accused bears no burden of proving dissimilarity, and the forensic examiner who finds 15 consistent features and 3 ambiguous ones and reports an identification without documenting the ambiguous features has failed at the most basic level of honest expert practice.

The Fusiform Gyrus and Why Human Brains Process Faces Differently

Face perception is not a general-purpose visual cognition task. The human brain has dedicated neural infrastructure for face processing, centered on the fusiform face area, a region of the inferotemporal cortex located in the fusiform gyrus that responds more strongly to faces than to other complex visual stimuli of equivalent perceptual difficulty (Kanwisher, N., McDermott, J., & Chun, M.M., 1997, “The fusiform face area: A module in human extrastriate cortex specialized for face perception”, Journal of Neuroscience, 17(11), 4302-4311). The fusiform face area is one node in a distributed network that also includes the superior temporal sulcus and the anterior temporal lobe, which together process identity, expression, and gaze direction through partially overlapping but distinguishable circuitry. This network responds within a fraction of a second of stimulus onset and operates substantially below conscious awareness for most of its processing, which is why experienced face recognition feels like intuition even when it is the product of an elaborate computational hierarchy.

What develops with expertise is not a new neural structure but a refinement of this existing infrastructure. The fusiform face area increases in selectivity and in the precision of its responses with perceptual expertise, a finding documented not only for faces but for other domains of expert visual pattern recognition, including radiology, chess, and bird identification (Gauthier, I., Tarr, M.J., Anderson, A.W., Skudlarski, P., & Gore, J.C., 1999, “Activation of the middle fusiform ‘face area’ increases with expertise in recognizing novel objects”, Nature Neuroscience, 2, 568-573).

At the extreme end of this distribution sit super-recognizers, individuals who without formal training outperform trained forensic examiners on standardized face-matching tasks and maintain exceptional accuracy under conditions of image degradation, pose variation, and temporal gap between learning and testing. The term was coined by Russell and colleagues in 2009 to describe 4 self-identified individuals, and subsequent systematic testing has established that the ability distributes normally across the population with super-recognizers representing approximately the top 1 to 2 percent of the distribution (Russell, R., Duchaine, B., & Nakayama, K., 2009, “Super-recognizers: People with extraordinary face recognition ability”, Psychonomic Bulletin and Review, 16(2), 252-257). A 2024 EEG study decoded super-recognizers’ face recognition ability from brain activity with up to 80 percent accuracy within 1 second of stimulus onset, identifying amplified processing signatures in their event-related potentials compared to typical recognizers, the neurophysiological substrate of what practitioners in my field sometimes call reading a face.
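
If ability is modeled as normally distributed, as described above, “the top 1 to 2 percent” translates into a cut-off of roughly 2 standard deviations above the mean. The sketch below makes that conversion with Python's standard `statistics.NormalDist`; expressing ability in z-score units is an assumption for illustration, and in practice super-recognizer status is defined by cut-offs on specific standardized tests rather than by this idealized calculation.

```python
from statistics import NormalDist

# Assumption: face-recognition ability modeled as a standard normal variable
# (z-scores), following the normal-distribution description in the text.
ability = NormalDist(mu=0.0, sigma=1.0)

cutoffs = {}
for top_share in (0.02, 0.01):
    # inv_cdf gives the score below which (1 - top_share) of the population falls.
    cutoffs[top_share] = ability.inv_cdf(1 - top_share)
    print(f"top {top_share:.0%}: z >= {cutoffs[top_share]:.2f}")
```

The top 2 percent begins at about z = 2.05 and the top 1 percent at about z = 2.33.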

The practical implication for forensic facial comparison is that the examiner is not operating a measurement instrument; the examiner is the instrument, and the instrument requires calibration. That calibration is not a matter of theoretical knowledge. It is the accumulation of thousands of hours of face processing under conditions of deliberate attention to the features that distinguish and identify, which restructures the neural substrate in ways that improve both accuracy and the efficiency with which uncertainty is recognized. The examiner who has processed 100,000 faces over more than 2 decades does not simply have more facts in working memory; their fusiform face area has been refined by use, yielding more precise weighting of relevant signals, faster recognition of diagnostic feature combinations, and earlier, cleaner flagging of cases where the evidence does not support a conclusion.

What the Supermarket Actually Teaches

Every professional domain produces its own form of deliberate practice, the structured engagement with relevant stimuli under conditions that provide performance feedback and push the practitioner beyond their current level of competence. In music, this is scales at tempo, then pieces at tempo, then performance under pressure. In surgery, it is simulation in the laboratory, then supervised procedures, then independent procedures of escalating complexity. In forensic facial comparison, the relevant form of deliberate practice involves sustained, focused engagement with the faces of individuals encountered in naturalistic settings, paying systematic attention to the feature categories that the method requires one to assess under examination conditions.

I have practiced this throughout my career in the form of attentive observation in environments where large numbers of unfamiliar faces are present: supermarkets, train stations, shopping centers, waiting rooms. The training I describe is not casual people-watching. It is the disciplined application of feature-by-feature attention to individual structural elements of the face, the habit of registering the specific morphology of an orbital region or a nasal structure or an auricular helix rather than the gestalt impression of a person’s appearance, and then, minutes or hours later, attempting to match individuals seen from different angles or in different light. The fact that this practice takes place in a supermarket rather than a laboratory does not make it informal; it makes it ecologically valid, because the viewing conditions (partial occlusion, variable lighting, non-frontal angles, movement) are precisely the conditions under which forensic facial comparison must function.

The perceptual strategy this practice trains has been studied directly. Super-recognizer research using eye-tracking has established that individuals with superior face recognition ability sample facial regions in a pattern that differs from typical observers: more systematic exploration during encoding, heavier weighting of the upper face including the eye region and the nasal bridge, and more efficient extraction of diagnostic spatial frequency information from degraded stimuli (Dunn, J.D., Varela, V., Popovic, B., Summersby, S., Miellet, S., & White, D., 2025, “Super-recognizers sample visual information of superior computational value for facial recognition”, Proceedings of the Royal Society B, 292). The expert’s gaze is not randomly different from the novice’s gaze; it is specifically trained toward the features that carry the most identity-diagnostic information.

My years of deliberate observation have produced an internal reference library of faces that is neither consciously organized nor consciously retrieved, but that constitutes the neural substrate against which novel face stimuli are compared during examination. When I look at a degraded speed camera photograph and immediately perceive that its orbital region is inconsistent with the comparison image, that perception is not intuition; it is the product of a training history that has calibrated my fusiform face area to detect precisely this kind of structural discrepancy.

What Image Quality Does and Does Not Change

The technical improvement in surveillance camera resolution over the past 20 years is genuine and operationally significant. Analog CCTV systems produced images in which a face at 5 meters covered approximately 5 pixels of vertical height, which is insufficient for any reliable morphological analysis. Contemporary 4K surveillance cameras can produce facial detail at comparable distances that supports analysis of fine structural features including mole location and scar morphology, features that were analytically invisible in the previous generation of equipment.

This improvement is also methodologically relevant for the German tradition specifically, because the Schwarzfischer percentage ranges from 1992 were calibrated against the image quality typical of that era. Modern high-resolution material allows substantially more morphological features to be assessed than the original Schwarzfischer framework anticipated, which has 2 implications. The first is that a comparison that could reach only “Identität höchst wahrscheinlich” on a 1990s analog CCTV image may reach a higher predicate on a contemporary 4K image of the same individual, because more features are visible and individuating characteristics such as moles, scars, and skin micro-features become accessible. The second is that the original 99.00 to 99.72 percent reference range for “Identität höchst wahrscheinlich” may no longer be the appropriate numerical equivalent for high-resolution work, because the information content of the modern image substantially exceeds what Schwarzfischer was characterizing.

This is the technical reason that the Schwarzfischer percentages should be treated as historical reference values for the verbal scale and not as the operative numerical conversion in a contemporary report. The verbal scale carries forward; the percentages were a snapshot of the achievable certainty on the equipment of the early 1990s. The methodology has evolved; the verbal predicates have proven durable; the numerical equivalents have not.

What image quality improvement does not change is the epistemological structure of the examination. A higher-resolution image produces more morphological features that can be assessed, which increases the information available but does not eliminate the requirement to assess each feature honestly and to document whether each assessed feature is consistent, inconsistent, or not assessable. A face with 40 assessable features and 38 consistencies and 2 inconsistencies is not more certainly identified than a face with 12 assessable features and 12 consistencies; it is more informatively assessed, which allows a more confident conclusion when the evidence supports one, but does not change the formal requirement to account for the inconsistencies.
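
The distinction between “more informatively assessed” and “more certainly identified” can be built into how outcomes are recorded. The following sketch uses hypothetical names (`summarise`, `feature_00` and so on) and deliberately does not reduce the tally to a ratio such as 38/40 or 12/12, since such ratios are not comparable measures of certainty. Instead it lists every inconsistent and non-assessable feature by name, so each one must be accounted for in the report.

```python
from collections import Counter
from typing import Dict

STATUSES = ("consistent", "inconsistent", "not assessable")


def summarise(assessments: Dict[str, str]) -> Dict[str, object]:
    """Tally per-feature outcomes without collapsing them into a single score.

    Inconsistent and non-assessable features are listed by name so that each
    one has to be explained in the report.
    """
    unknown = set(assessments.values()) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status: {sorted(unknown)}")
    counts = Counter(assessments.values())
    return {
        "assessable": counts["consistent"] + counts["inconsistent"],
        "consistent": counts["consistent"],
        "inconsistent": sorted(f for f, s in assessments.items() if s == "inconsistent"),
        "not assessable": sorted(f for f, s in assessments.items() if s == "not assessable"),
        "needs explanation": counts["inconsistent"] > 0,
    }


# The 2 hypothetical faces from the text: 38 of 40 consistent, and 12 of 12 consistent.
face_a = {f"feature_{i:02d}": "consistent" for i in range(38)}
face_a.update({"feature_38": "inconsistent", "feature_39": "inconsistent"})
face_b = {f"feature_{i:02d}": "consistent" for i in range(12)}

for name, face in (("A", face_a), ("B", face_b)):
    print(name, summarise(face))
```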

The research literature also documents that suboptimal conditions (poor resolution, an unfavorable angle of incidence, non-frontal pose, and partial occlusion) substantially affect the accuracy of forensic facial comparison even when it is conducted by trained examiners using standardized morphological analysis (PMC8698381, 2021, “Forensic facial comparison: Current status, limitations, and future directions”, Biology, 10(12), 1269). The ENFSI Best Practice Manual specifies an image quality triage process that should precede any forensic facial comparison: a structured assessment of the probe image for resolution, pose, expression, lighting, and occlusion, producing a determination of whether the image meets minimum quality criteria for any meaningful morphological analysis. I apply this triage to every examination I accept. Approximately 20 percent of the commissioned examinations I receive do not pass the triage and are returned with a documented explanation of why the image quality precludes any reliable analysis.
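
A triage of this kind produces a documented decision, not just a pass or fail. The sketch below covers the 5 criteria named above, but everything concrete in it is an assumption: the field names, the use of inter-eye distance as the resolution measure, and all threshold values are hypothetical and are not taken from the ENFSI manual. It returns the rejection reasons (an empty list means the probe proceeds) together with notes that limit, rather than block, the examination.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds for illustration only; not values from the ENFSI manual.
MIN_INTER_EYE_PX = 30.0
MAX_ABS_YAW_DEG = 45.0
MAX_OCCLUDED_FRACTION = 0.5


@dataclass
class ProbeQuality:
    inter_eye_px: float       # distance between pupil centres, in pixels
    yaw_deg: float            # head rotation away from frontal, in degrees
    expression_neutral: bool
    lighting_usable: bool     # examiner's judgement of the facial illumination
    occluded_fraction: float  # share of the face that is hidden, 0.0 to 1.0


def triage(q: ProbeQuality) -> Tuple[List[str], List[str]]:
    """Return (rejection reasons, notes); the probe proceeds only if reasons is empty."""
    reasons: List[str] = []
    notes: List[str] = []
    if q.inter_eye_px < MIN_INTER_EYE_PX:
        reasons.append(f"resolution: inter-eye distance {q.inter_eye_px:.0f} px "
                       f"below {MIN_INTER_EYE_PX:.0f} px")
    if abs(q.yaw_deg) > MAX_ABS_YAW_DEG:
        reasons.append(f"pose: yaw of {q.yaw_deg:.0f} degrees exceeds {MAX_ABS_YAW_DEG:.0f}")
    if not q.lighting_usable:
        reasons.append("lighting: facial structures not resolvable")
    if q.occluded_fraction > MAX_OCCLUDED_FRACTION:
        reasons.append(f"occlusion: {q.occluded_fraction:.0%} of the face hidden")
    if not q.expression_neutral:
        # A non-neutral expression limits some features rather than the whole examination.
        notes.append("expression: perioral and periorbital features need extra caution")
    return reasons, notes


# A hypothetical low-resolution probe, otherwise usable.
low_res = ProbeQuality(inter_eye_px=14, yaw_deg=10, expression_neutral=True,
                       lighting_usable=True, occluded_fraction=0.1)
print(triage(low_res))
```

Keeping the reasons as text rather than a single score mirrors the practice described above: a rejected image is returned with a documented explanation of why it cannot support analysis.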

Temporal Change, Disguise, and the Limits of Comparison Across Time

Human faces change over time in ways that systematic training can partially compensate for and honest examination must explicitly acknowledge. The bony structures of the face (the orbital rims, the zygomatic arches, the mandibular contour, the nasal skeleton) are substantially stable across adulthood, which is why morphological analysis remains possible across temporal gaps of years or decades when high-quality images are available for both the comparison period and the probe period. The soft tissues of the face, including the skin surface morphology, fat distribution, and muscle mass, change continuously with age, weight, medical history, and environmental exposure, and their appearance in any given image reflects not only inherent structure but the temporal state of those structures at the moment of capture.

Examining images separated by 10 or 15 years requires explicit adjustment of the analytical framework to account for predictable aging changes in each feature category, particularly in the periorbital region where skin laxity produces progressive changes in upper eyelid and brow position, in the nasolabial fold region where soft tissue descent alters the apparent shape of the mid-face, and in the cervical and mandibular region where changes in fat distribution alter the apparent angle and contour of the jaw. These changes are not random; they follow patterns that are predictable at the population level and that an experienced examiner learns to model when comparing images across time.

Disguise presents a different category of problem. Deliberate modification of appearance through hair color change, weight alteration, facial hair, eyewear, or prosthetics can disrupt morphological analysis in specific feature categories while leaving others intact. A person who has grown a full beard has occluded the lower facial structures that contribute substantially to morphological analysis, but has not altered the orbital region, the nasal structure, or the auricular morphology, which remain available for assessment if the images support it. The examiner’s task is to identify which features remain assessable despite the alteration, assess them, and explicitly document what cannot be assessed and why.
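The partition the examiner performs can be sketched as a simple set operation over feature categories. The region labels, the `FEATURE_REGIONS` mapping, and the `partition_features` function are hypothetical simplifications for illustration only. A real feature catalogue is far more granular, and the mapping of features to regions is a matter of examiner judgment.

```python
# Hypothetical mapping from feature categories to the facial region each depends on.
FEATURE_REGIONS = {
    "orbital region": "upper face",
    "nasal structure": "mid face",
    "auricular morphology": "ears",
    "lip morphology": "lower face",
    "chin contour": "lower face",
    "mandibular contour": "lower face",
    "hairline": "scalp",
}


def partition_features(occluded_regions: set[str]) -> tuple[list[str], dict[str, str]]:
    """Split features into assessable ones and documented non-assessable ones."""
    assessable = []
    not_assessable = {}
    for feature, region in FEATURE_REGIONS.items():
        if region in occluded_regions:
            # Record why the feature is excluded so the report can state it explicitly.
            not_assessable[feature] = f"{region} occluded"
        else:
            assessable.append(feature)
    return assessable, not_assessable


# A full beard hides the lower face but leaves orbital, nasal, and auricular features.
ok, excluded = partition_features({"lower face"})
print("assess:", ok)
print("document as not assessable:", excluded)
```

The excluded features are returned together with a reason rather than silently dropped. This reflects the requirement that the examiner document what cannot be assessed and why.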

What the Bundesgerichtshof Has Said and What It Means in Practice

The Bundesgerichtshof addressed the methodological status of anthropological identification in its 2005 decision in case 1 StR 91/04 (15 February 2005, LG Memmingen). The decision recognized that anthropological identity opinions are not standardized procedures in the same sense that blood alcohol analysis or blood group determination are standardized, that morphological features are not unambiguously determinable, and that continuous transitions exist between feature categories. The court did not invalidate the methodology; it specified the conditions under which an anthropological opinion is admissible as evidence in criminal proceedings. The expert must document the methodology used, the features assessed, the population frequencies considered, and the basis for the final probability predicate. The judgment of the court must include an independent evaluation of the strength of the evidence presented, not merely a deferential acceptance of the expert’s conclusion.

This decision has shaped 2 decades of forensic-anthropological practice in Germany. The reports I produce describe the following for each assessed feature: its appearance in each image, its consistency or inconsistency, and the population frequency relevant to its evidentiary weight. They also set out the chain of reasoning that leads from the feature inventory to the final predicate. Such a report gives a defense attorney who wishes to challenge an identification the documentary basis for that challenge, because each step is named and can be examined. In my experience, an attorney challenging a report that lacks this documentation has an easier task than one challenging a fully documented analysis. The absence of documentation is itself a methodological defect under the BGH framework.
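One way to make each step examinable is to keep the feature inventory as structured records and to withhold finalization while any required element is missing. The `FeatureRecord` fields follow the elements named in the two preceding paragraphs: appearance in each image, consistency, population frequency, and reasoning. The class, the `missing_documentation` function, its completeness rule, and the sample values are all hypothetical illustrations. They are not a report format prescribed by the BGH or the AGIB standards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureRecord:
    """One assessed feature, documented so that each step can be challenged (hypothetical)."""
    name: str
    appearance_probe: str                   # how the feature appears in the questioned image
    appearance_reference: str               # how it appears in the reference image
    consistent: Optional[bool]              # True, False, or None if not assessable
    population_frequency: Optional[float]   # share of the population showing this variant
    reasoning: str                          # why the feature carries the weight assigned to it


def missing_documentation(records: list[FeatureRecord]) -> list[str]:
    """List every documentation gap; a report should not be finalized until this is empty."""
    gaps = []
    for r in records:
        if not r.appearance_probe or not r.appearance_reference:
            gaps.append(f"{r.name}: appearance not described for both images")
        if r.consistent is not None and r.population_frequency is None:
            gaps.append(f"{r.name}: no population frequency for an assessed feature")
        if not r.reasoning:
            gaps.append(f"{r.name}: no stated reasoning")
    return gaps


# Illustrative values only; the frequency below is not real population data.
records = [
    FeatureRecord("ear lobe form", "attached", "attached", True, 0.3,
                  "visible in both images at similar pose"),
    FeatureRecord("chin contour", "beard-covered", "rounded", None, None,
                  "occluded in probe; not assessed"),
    FeatureRecord("nasal bridge", "straight", "straight", True, None, ""),
]
for gap in missing_documentation(records):
    print(gap)
```

The occluded chin feature produces no gap because it is explicitly marked as not assessed and explained. The nasal bridge entry is flagged twice because it claims consistency without a frequency or a stated rationale.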

When a Court Asks for a Number: The Practitioner’s Response

I was asked in February 2025, by defense counsel in a case before the Landgericht Nürnberg-Fürth, to provide a numerical percentage equivalent of the predicate “Identität höchst wahrscheinlich” (identity highly probable) in an identification opinion I had submitted to the court. My written response of 27 February 2025 declined to provide the requested figure and explained why. The core of that explanation is worth reproducing here in substance, because it captures the principled position that the German forensic-anthropological tradition has held for more than 3 decades.

A numerical percentage would suggest mathematical precision that the underlying methodology does not possess. The features assessed are not statistically independent in the way required for genuine probabilistic quantification. The Schwarzfischer percentage ranges were calibrated against image quality from the early 1990s and cannot be transferred unchanged to modern high-resolution material. The verbal predicate “Identität höchst wahrscheinlich” rests on the cumulative weight of the 127 features assessed in that specific case rather than on a numerical calculation. A request for a concrete percentage misconstrues fundamental principles of probabilistic evaluation within this discipline.
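The independence point can be made concrete. If each feature's population frequency were simply multiplied, the product would imply a vanishingly small chance of a coincidental match. When features are correlated, however, that product overstates the evidence. The sketch contrasts the naive product with a deliberately extreme alternative assumption: features within a correlated cluster are treated as fully redundant, so only the rarest one in each cluster counts. The clusters, the frequencies, and both function names are hypothetical. Neither model is the Schwarzfischer method, which does not rest on a numerical calculation.

```python
import math


def naive_joint_frequency(freqs: list[float]) -> float:
    """Joint frequency if every feature were statistically independent."""
    return math.prod(freqs)


def clustered_joint_frequency(clusters: list[list[float]]) -> float:
    """Joint frequency if features inside a cluster are fully redundant (extreme assumption)."""
    # Within a redundant cluster, only the rarest variant adds information.
    return math.prod(min(cluster) for cluster in clusters)


# Hypothetical example: six features in two correlated clusters of three.
clusters = [[0.2, 0.25, 0.3], [0.1, 0.15, 0.2]]
flat = [f for cluster in clusters for f in cluster]

print("assuming independence:", naive_joint_frequency(flat))
print("assuming redundancy within clusters:", clustered_joint_frequency(clusters))
```

With these values, the naive product is about 4.5e-5 and the clustered estimate is 0.02, a difference of more than 400-fold. The true dependence structure is unknown, and the answer can lie anywhere between the two. Assigning a hypothetical frequency of 0.5 to each of 127 features gives a naive product of about 6e-39, a precision that no image-based comparison can support.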

The court did not find this response evasive. It found it responsive, because it explained both why a numerical equivalent was inappropriate in this case and what the verbal predicate did and did not commit the expert to. The practitioner who responds to such a question by inventing a number does the discipline no favor and, more importantly, does the accused no favor: the invented number creates a false impression of precision that the actual evidence does not support.

A Polemical Forewarning

I want to name clearly what forensic facial comparison is not, because the confusion about what it is produces real-world consequences that I have observed in courtrooms and in communication with investigating authorities for the duration of my career.

It is not a fingerprint comparison and should not be presented as one. Dactyloscopic (fingerprint) comparison operates on stable anatomical structures with quantifiable feature counts that meet international identification standards. Facial morphological comparison operates on features that vary with lighting, angle, expression, age, and a dozen other factors that the practitioner must explicitly model and document. Applying fingerprint-style identification language to facial comparison overstates what the method can deliver and damages the credibility of both disciplines.

It is not an infallible procedure and should not be described as one. The highest non-absolute Schwarzfischer predicate carries a 0.28 percent residual uncertainty, which translates into the 472 hypothetical emergency calls per day at Frankfurt Airport. This figure is not a rhetorical exaggeration. It is the actual epistemic content of “highly probable” when applied to a sufficiently large case volume. A court hearing such a predicate should understand both the strength of the evidence and the residual uncertainty it does not eliminate.
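The arithmetic behind that figure is a single multiplication: the residual uncertainty times the number of events it is applied to. The sketch assumes the 99.72 percent level from this article's title. The daily volume of about 168,600 is not stated in this section. It is simply the value implied by dividing 472 by 0.0028, and it should be checked against the Fraport traffic figures cited in the references. The function name `expected_errors` is hypothetical.

```python
def expected_errors(probability: float, events: int) -> float:
    """Expected number of wrong outcomes if `probability` is the chance each one is right."""
    residual = 1.0 - probability
    return residual * events


# Assumption: about 168,600 events per day, back-calculated from 472 / 0.0028.
daily = expected_errors(0.9972, 168_600)
print(round(daily))                               # 472

# The same residual applied to a hypothetical caseload of 1,000 reports.
print(round(expected_errors(0.9972, 1_000), 1))   # 2.8
```

The calculation treats each event as independent and the stated probability as exact, which is a simplification. It nonetheless shows why a residual that looks negligible in a single case becomes a concrete count at scale.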

It is not a determination of guilt. A reliable identification of the accused as the person driving the vehicle at the time of the offense does not establish that the offense was committed, that the accused was the offender rather than a person operating the vehicle for the offender, or that the chain of factual inference from driver to defendant is otherwise unbroken. The identification is one input among many, and the expert who allows their opinion to be presented as more than that has stepped outside the boundary of their methodology.

It is not a discipline in which experience alone can substitute for method. I have examined cases alongside practitioners who were confident on the basis of experience and wrong on the basis of evidence. Confidence and accuracy are not the same variable. Over the past 3 decades, the field has invested considerable effort in developing methods whose accuracy can be quantified within the limits of the underlying epistemology. It has done so precisely because quantified accuracy is what courts should demand and what the accused and the victims of crime deserve.

The Measure of a Method Is Its Honesty About Its Limits

Forensic facial comparison has been undermined over the decades not by the difficulty of the task but by practitioners who concealed that difficulty, who offered identifications where the evidence supported only consistency, and who failed to document and disclose the features that did not fit the expected conclusion. The discipline’s credibility in courts and before the public is directly proportional to the willingness of its practitioners to say, precisely and without apology, what the evidence does and does not support.

The Schwarzfischer scale has held its ground for more than 3 decades not because it generates the highest possible probability predicates but because it accommodates the full range of evidentiary strengths that the underlying method can deliver, including the inconclusive middle predicate that some practitioners avoid using because it disappoints commissioning parties. An examination culture that consistently uses the full range of the scale, including the inconclusive predicates when the evidence is inconclusive, produces a body of expert reports that courts can rely on. An examination culture that compresses the scale toward identification or exclusion because those are the predicates that commissioning parties want produces reports that wear out their welcome in the courtroom over time and contribute to the methodological skepticism that some forensic disciplines currently face.

The face in the speed camera photograph, 40 by 40 pixels on the original capture, was not identifiable. The correct report said precisely that, in writing. The investigation did not receive the conclusion it sought. The accused was not identified from an image that could not identify anyone. This is not a failure of the examination. It is the examination functioning as it is supposed to function: confining conclusions to what the evidence permits rather than extending them to what the investigation requires.

Every face I have looked at carefully over more than 2 decades has been a calibration event for the neural instrument I use to look at faces. Every examination I have returned as inconclusive has contributed to the credibility of every examination I have returned at “Identität höchst wahrscheinlich”. The highest predicate means something specific only if the inconclusive predicate also means something specific. The expert who never returns an inconclusive finding does not have better evidence than the expert who returns inconclusive findings in 20 percent of cases. They have a different relationship to the truth content of their verbal scale, and that phrase is a euphemism for a measurable quantity of professional dishonesty.

The 472 ambulances at Frankfurt Airport will, on a long enough timeline, find me. The question is not whether the highest probability predicate is occasionally wrong; the question is whether the report that delivers it acknowledges the residual uncertainty honestly enough that the court has the information it needs to weigh the evidence properly. The arithmetic itself does not lie about its own content. The practitioner who delivers that arithmetic sometimes does. Choosing not to is the entire profession.

References

  • AGIB. (2011). Standards für die Identifikation lebender Personen nach Bildern: Grundlagen, Kriterien und Verfahrensregeln für Gutachten, Fassung vom 16. Dezember 2011. Arbeitsgruppe für anthropologische Identifikation nach Bildern. https://bildidentifikation.de
  • Bate, S., Portch, E., & Mestry, N. (2021). When two fields collide: Identifying “super-recognisers” for neuropsychological and forensic face recognition research. Quarterly Journal of Experimental Psychology, 74(12), 2143-2160.
  • Bergold, A.N., & Kovera, M.B. (2025). The contribution of facial recognition technology to wrongful arrests and trauma. Psychological Trauma, 17(Suppl 1), S225-S233.
  • Bundesgerichtshof. (2005). Urteil vom 15. Februar 2005, 1 StR 91/04 (LG Memmingen). Anthropologisches Identitätsgutachten und Anforderungen an die Beweiswürdigung.
  • Dunn, J.D., Varela, V., Popovic, B., Summersby, S., Miellet, S., & White, D. (2025). Super-recognizers sample visual information of superior computational value for facial recognition. Proceedings of the Royal Society B, 292.
  • ENFSI. (2018). Best Practice Manual for Facial Image Comparison, ENFSI-BPM-DI-01, Version 01, January 2018. European Network of Forensic Science Institutes.
  • ENFSI. (2018). Best Practice Manual for Forensic Image and Video Enhancement, ENFSI-BPM-DI-02, Version 01, June 2018. European Network of Forensic Science Institutes.
  • FISWG. (2012). Guidelines for Facial Comparison Methods, Version 1.0. Facial Identification Scientific Working Group.
  • FISWG. (2021). Image Factors to Consider in Facial Image Comparison. Facial Identification Scientific Working Group.
  • Fraport AG. (2025). Verkehrszahlen 2024: Flughafen Frankfurt am Main. https://www.fraport.com
  • Gauthier, I., Tarr, M.J., Anderson, A.W., Skudlarski, P., & Gore, J.C. (1999). Activation of the middle fusiform ‘face area’ increases with expertise in recognizing novel objects. Nature Neuroscience, 2, 568-573.
  • Huckenbeck, W., & Gabriel, P. (2013). Identifikation lebender Personen auf Bildern. Rechtsmedizin, 23, 251-262. Springer Verlag.
  • Kanwisher, N., McDermott, J., & Chun, M.M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17(11), 4302-4311.
  • Knußmann, R. (1983). Die vergleichende morphologische Analyse als Identitätsnachweis. Strafverteidiger, 3, 127-129.
  • Knußmann, R. (Ed.). (1988). Anthropologie. Handbuch der vergleichenden Biologie des Menschen, Band I/1. Gustav Fischer Verlag, Stuttgart.
  • Knußmann, R. (1991). Zur Wahrscheinlichkeitsaussage im morphologischen Identitätsgutachten. Neue Zeitschrift für Strafrecht, 11(4), 175-177.
  • PMC8698381. (2021). Forensic facial comparison: Current status, limitations, and future directions. Biology, 10(12), 1269.
  • Reche, O. (1965). Eine neue Methode zur Erleichterung der Beweisführung in Identifizierungsprozessen. Homo, 16, 113-116.
  • Rösing, F.W. (2006). Identifikation von Personen auf Bildern. In: G. Widmaier (Ed.), Münchner Anwaltshandbuch Strafverteidigung. C.H. Beck, München.
  • Rösing, F.W. (2008). Morphologische Identifikation von Personen. In: J. Buck & H. Krumbholz (Eds.), Sachverständigenbeweis im Verkehrsrecht. Nomos-Verlag, Baden-Baden.
  • Russell, R., Duchaine, B., & Nakayama, K. (2009). Super-recognizers: People with extraordinary face recognition ability. Psychonomic Bulletin & Review, 16(2), 252-257.
  • Schwarzfischer, F. (1992). Identifizierung durch Vergleich von Körpermerkmalen, insbesondere anhand von Lichtbildern. In: E. Kube, O. Störzer, & J. Timm (Eds.), Kriminalistik. Handbuch für Praxis und Wissenschaft, Band I, pp. 735-761. Boorberg Verlag, Stuttgart.
  • Stewart, C.K., & Kukucka, J. (2025). Cognitive bias affects perception and decision-making in simulated facial recognition searches. Behavioral Sciences, 15(8), 1094.