What Bone Knows That Documents Don’t
On the 99.63 percent accuracy no one talks about, why a pelvis outperforms an entire filing cabinet of paperwork, and what happens to an identification when the practitioner ignores the hierarchy of skeletal evidence in favor of the method they learned 20 years ago.
The pelvis was on the examination table when the detective asked what he always asks first. Not cause of death, not identity, not time since death. He asked: man or woman? He asked it the way people ask questions they believe have simple answers, with the half-impatient tone of someone who expects a single word and plans to write it into a form.
I gave him the single word. Then I spent the next 20 minutes explaining why I was certain, and by what margin, and what evidence had produced the certainty, and what would have happened to that certainty if we had been looking at a skull alone rather than a complete innominate. He stopped writing and started listening approximately 4 minutes in, which is faster than most detectives manage. By the time I reached the posterior probability output of the DSP2 calculation, he had set the form face-down on the table and was looking at me the way people look when they realize a subject they thought was intuition is actually arithmetic.
This is the fundamental public relations problem of skeletal sex estimation: it looks like an art, it is practiced in many institutions as though it were an art, and it is, in the hands of someone who has done it long enough and carefully enough, substantially more reliable than an art. The difference between art and arithmetic matters acutely in a courtroom, where an opposing expert who understands the method can dismantle an opinion that is not grounded in quantifiable evidence, and where a family waiting for the identification of remains deserves the most accurate answer the discipline can produce rather than the most confident-sounding one.
Why the Biological Profile Begins with Sex

The biological profile is the set of estimates that a forensic anthropologist derives from skeletal remains to characterize an unknown individual: sex, age, stature, and ancestry. These 4 parameters, combined, define the search space within which a missing persons investigation operates. They do not identify anyone; they narrow the list of people who could be the individual in question to a manageable number of candidates who can then be matched through dental records, DNA, or other identifying evidence.
Of the 4 parameters, sex estimation comes first because it halves the search space in a single step. Take a skeleton estimated to be female, between 35 and 50 years of age, between 158 and 165 centimeters tall, and of European ancestry: the sex estimate alone eliminates roughly half of all missing persons records from consideration before any other parameter is applied. An error at this step is not a minor inaccuracy that subsequent estimation can compensate for: a female skeleton entered as male will be searched against the wrong half of the database regardless of how accurately everything else is estimated, and the correct identification will not be found.
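The narrowing step, and the way a sex error defeats it, can be sketched in a few lines. This is a minimal illustration rather than a real database query: the MissingPerson record, the narrow function, and the four sample records are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class MissingPerson:
    """Hypothetical missing-person record used only for this illustration."""
    case_id: str
    sex: str           # "F" or "M" as recorded in the report
    age: int           # years
    stature_cm: float


def narrow(candidates, sex, age_range, stature_range):
    """Keep only the records compatible with every estimated parameter."""
    lo_age, hi_age = age_range
    lo_st, hi_st = stature_range
    return [
        c for c in candidates
        if c.sex == sex
        and lo_age <= c.age <= hi_age
        and lo_st <= c.stature_cm <= hi_st
    ]


# Invented records; suppose the remains actually belong to case "A".
records = [
    MissingPerson("A", "F", 41, 162.0),
    MissingPerson("B", "M", 44, 163.0),
    MissingPerson("C", "F", 29, 160.0),
    MissingPerson("D", "F", 47, 159.5),
]

# Same age and stature estimates, searched with the right and the wrong sex.
correct = narrow(records, "F", (35, 50), (158, 165))
wrong_sex = narrow(records, "M", (35, 50), (158, 165))

print([c.case_id for c in correct])
print([c.case_id for c in wrong_sex])
```

With the sex entered correctly, case "A" survives the filter. With the sex entered as male, the age and stature estimates are applied just as accurately, but case "A" is no longer among the candidates at all.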
This is not a theoretical concern. The forensic literature contains documented cases in which sex misclassification, typically a female skeleton assigned male based on cranial robusticity without pelvic analysis, delayed identification by months or years. The delay is not abstractly unfortunate; it is a concrete failure with concrete consequences for investigations and for the families of the dead. It is also, given what the field currently knows about the accuracy of different methods applied to different skeletal elements, an avoidable failure.
The Biology of Dimorphism: Where the Difference Lives and Why
Sexual dimorphism in the human skeleton has 2 distinct biological origins, and understanding their difference explains why the pelvis and the cranium perform so differently as sex estimation tools.
The first origin is hormonal. From puberty onward, androgens and estrogens exert differential effects on bone growth, density, and morphology. Males, under the influence of higher androgen concentrations, develop greater bone robusticity, larger muscle attachment areas, more pronounced cranial superstructures, and larger overall skeletal dimensions on average. Females develop somewhat different trabecular bone architecture under estrogen influence, with implications for bone density and fracture patterns that are well documented in clinical medicine and useful in forensic analysis as secondary evidence. This hormonal dimorphism is real, statistically significant in population samples, and reflected in every bone of the adult skeleton. It is also variable: some females are more robust than some males, the overlap between distributions is substantial in many measurements, and the magnitude of the difference varies substantially between populations of different average body composition.
The second origin is functional. Female pelves develop specific structural adaptations for childbirth that produce architectural differences from male pelves that are not primarily a matter of size or robusticity but of shape, angle, and proportion. The subpubic angle is wider in females to accommodate the fetal head during delivery. The pelvic inlet is shaped differently for the same reason: oval in females, heart-shaped in males. The ilium is more flared, the sacrum shorter and broader, the sciatic notch wider. These differences are not random expressions of hormonal variation; they are functional adaptations under direct selection pressure across the entire evolutionary history of childbirth in obligately bipedal Homo sapiens, which is a particularly difficult delivery problem compared to most other primates because bipedalism narrowed the birth canal while brain expansion enlarged the fetal head.
The consequence of this distinction is profound for forensic practice. Hormonal dimorphism, which drives cranial and postcranial differences, is population-specific in its magnitude and variable at the individual level. Functional dimorphism in the pelvis is largely population-independent in its direction and magnitude, because the selection pressure that produced it, successful childbirth in bipedal hominins, operated similarly across all human populations. This is why the pelvis is the most reliable skeletal indicator of sex and why no amount of cranial expertise fully substitutes for pelvic analysis when the pelvis is available.
Before puberty, neither source of dimorphism has expressed itself in the morphological features that the most accurate methods read. Juvenile skeletons show minimal sex-specific differences in the features that adult analysis relies upon, which means that juvenile remains are, in practical forensic terms, unsexable by the morphological methods that work well in adults. This limitation is not a gap that experience fills; it is a genuine boundary of the method.
The Pelvis: The Number Is Not a Rounding Error
Murail and colleagues introduced the Diagnose Sexuelle Probabiliste method in 2005. The method applies 10 linear measurements of the innominate, the hip bone, to a reference database assembled from 2,040 skeletal specimens representing the worldwide range of human variation, and computes via Fisher’s linear discriminant analysis a posterior probability of sex assignment for each specimen analyzed (Murail, P., Brůžek, J., Houët, F., & Cunha, E., 2005, “DSP: A tool for probabilistic sex diagnosis using worldwide variability in hip-bone measurements”, Bulletins et Mémoires de la Société d’Anthropologie de Paris, 17(3-4), 167-176).
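To make "posterior probability" concrete, here is a minimal sketch of a two-group linear discriminant posterior under the usual assumptions of Gaussian groups with a shared covariance matrix. It uses only 2 measurements instead of 10 so the matrix algebra stays readable. The group means, the pooled covariance, and the specimen values are invented for illustration and are not taken from the DSP reference database; the function names are likewise hypothetical.

```python
import math


def inv2x2(s):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x, mean, s_inv):
    """Squared Mahalanobis distance from x to a group mean."""
    dx = [x[0] - mean[0], x[1] - mean[1]]
    return (dx[0] * (s_inv[0][0] * dx[0] + s_inv[0][1] * dx[1])
            + dx[1] * (s_inv[1][0] * dx[0] + s_inv[1][1] * dx[1]))


def posterior_female(x, mean_f, mean_m, pooled_cov, prior_f=0.5):
    """P(female | x) for two Gaussian groups sharing one covariance matrix."""
    s_inv = inv2x2(pooled_cov)
    log_f = math.log(prior_f) - 0.5 * mahalanobis_sq(x, mean_f, s_inv)
    log_m = math.log(1.0 - prior_f) - 0.5 * mahalanobis_sq(x, mean_m, s_inv)
    return 1.0 / (1.0 + math.exp(log_m - log_f))


# HYPOTHETICAL reference values in millimeters, for illustration only.
MEAN_F = (50.0, 70.0)
MEAN_M = (40.0, 75.0)
POOLED_COV = [[16.0, 4.0], [4.0, 25.0]]

specimen = (48.5, 71.0)
p_f = posterior_female(specimen, MEAN_F, MEAN_M, POOLED_COV)
print(f"P(female) = {p_f:.3f}, P(male) = {1.0 - p_f:.3f}")
```

The point of the sketch is the shape of the output: not a label, but a number between 0 and 1 that depends on how far the specimen sits from each group mean relative to the spread within the groups.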
The accuracy of this method in its validation sample was 99.63 percent. DSP2, the 2017 revision by Brůžek and colleagues, refined the measurement protocol and updated the reference database, achieving comparable or slightly improved accuracy across validation studies in European, South American, Greek, and other populations (Brůžek, J., et al., 2017, “Reliability and validity of the sex diagnosis by DSP software”, Forensic Science International, 281, 207.e1-207.e7). A 2025 application and validation study in a Spanish sample of 303 hip bones found overall accuracy higher than 95 percent with DSP2 applicability above 85 percent across most measurements, consistent with prior validation work (PMC11850468, International Journal of Legal Medicine, 2025).
The 99.63 percent figure is sometimes met with skepticism by people unfamiliar with the method, which is understandable: it is a very high number. It is not, however, an artifact of a particularly favorable test sample or an unusual population. It reflects the fact that pelvic sexual dimorphism is not population-specific in any significant sense, which is the insight that makes the worldwide reference database valid rather than merely aspirational.
The practical limitation of DSP2 is bone availability. Forensic anthropologists do not choose their cases; they work with what arrives. The innominate is frequently damaged, missing, or fragmented in remains that have been exposed to trauma, taphonomic processes, or deliberate dismemberment. DSP2 requires a minimum of 4 of its 10 measurements to produce a probability output, which provides meaningful flexibility but cannot always be satisfied with highly fragmented material. In these cases, the analyst works down the accuracy hierarchy to the next available reliable evidence.
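The availability constraint is simple enough to state as code. The sketch below assumes measurements are recorded in a dictionary with None for anything that could not be taken; the keys m1 to m10 are placeholders, not the method's own measurement names. The 0.95 cut-off in classify is an assumption added for illustration and does not come from the text above.

```python
MIN_MEASUREMENTS = 4     # minimum stated in the text
TOTAL_MEASUREMENTS = 10  # full protocol stated in the text


def usable_measurements(measurements):
    """Return the names of the measurements that were actually recorded."""
    return sorted(name for name, value in measurements.items()
                  if value is not None)


def can_run(measurements, minimum=MIN_MEASUREMENTS):
    """True when enough measurements survive to compute a probability."""
    return len(usable_measurements(measurements)) >= minimum


def classify(p_female, threshold=0.95):
    """Turn a posterior into a call; the 0.95 threshold is an assumption."""
    if p_female >= threshold:
        return "female"
    if p_female <= 1.0 - threshold:
        return "male"
    return "indeterminate"


# Placeholder names m1..m10; values in millimeters are invented.
fragment = {f"m{i}": None for i in range(1, TOTAL_MEASUREMENTS + 1)}
fragment.update({"m1": 71.2, "m4": 38.0, "m10": 52.5})

print(usable_measurements(fragment), can_run(fragment))

fragment["m7"] = 66.1    # a fourth measurement is recovered
print(usable_measurements(fragment), can_run(fragment))
```

Returning "indeterminate" rather than forcing a label is the design choice worth noticing: a specimen that cannot support a confident call is reported as such and the analyst moves to the next element in the hierarchy.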
The Phenice method, published in 1969, uses 3 morphological features of the pubic region, the ventral arc, the subpubic concavity, and the medial aspect of the ischiopubic ramus, to estimate sex from the pubic bone. In its original form, the method achieved approximately 96 percent accuracy. The Klales revision of 2012 converted each trait to an ordinal scoring system and applied logistic regression to produce probability estimates, improving both accuracy and the standardization of inter-observer application (Klales, A.R., et al., 2012, “A revised method of sexing the human innominate using Phenice’s nonmetric traits and statistical methods”, American Journal of Physical Anthropology, 149(1), 104-114). The pubic bone is among the most sexually dimorphic elements in the skeleton and, being smaller than the full innominate, often survives fragmentation better than the ilium, making Phenice-derived morphological assessment a useful complement to DSP2 metrics.
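The step from ordinal scores to a probability can be sketched as a plain logistic regression. Everything numeric here is hypothetical: the intercept and coefficients are invented so that the example behaves sensibly, and they are not the published equation. The sketch also assumes a 1 to 5 scale on which low scores are the more female-typical expression, which is an assumption of this example.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not the published model.
INTERCEPT = 9.0
COEFFS = {
    "ventral_arc": -1.2,
    "subpubic_concavity": -1.0,
    "medial_ramus": -0.8,
}


def probability_female(scores):
    """Logistic-regression probability of female from three ordinal scores."""
    for trait in COEFFS:
        if not 1 <= scores[trait] <= 5:
            raise ValueError(f"{trait} score must be between 1 and 5")
    z = INTERCEPT + sum(COEFFS[t] * scores[t] for t in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))


observed = {"ventral_arc": 1, "subpubic_concavity": 2, "medial_ramus": 2}
print(f"P(female) = {probability_female(observed):.3f}")
```

What the regression adds over the original three-trait checklist is a graded answer: a specimen scored in the middle of every scale comes out near 0.5 instead of being forced into one category.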
The combination of DSP2 metric analysis and Klales morphological scoring, applied together to the available pelvic material, represents the current best practice for pelvic sex estimation and produces results that are defensible in cross-examination because they are explicit, reproducible, and tied to a documented accuracy record.
The Cranium: Honest About Its Limits
The skull alone, assessed by a trained practitioner using standardized morphological protocols, achieves approximately 80 percent accuracy for sex estimation of adult remains. This figure comes from the foundational forensic anthropology literature and represents a genuine empirical finding rather than a conservative estimate: cranial dimorphism, while real and visually salient to an experienced observer, is variable enough at the individual level and population-specific enough in its expression that a substantial fraction of specimens falls in the ambiguous range where morphological assessment cannot distinguish reliably.
The relevant cranial features are well characterized. The supraorbital torus, the protrusion of bone above the orbits, is more pronounced in males, a difference generally attributed to the greater overall robusticity of the male cranium rather than to any single muscle attachment. The glabella, the midline point of the supraorbital region, projects more anteriorly in males. The mastoid process, the bony protrusion behind the ear that anchors the sternocleidomastoid muscle, is generally larger in males, reflecting greater neck musculature. The external occipital protuberance and the nuchal lines on the posterior cranium are more pronounced in males for the same reason: larger muscles require larger attachment areas, and the bone responds to mechanical loading by becoming more robust in the relevant regions.
The frontal bone is more vertical in females, creating a more globular forehead, and more reclined in males. Orbital shapes are generally more rectangular in males and more rounded in females. The zygomatic arches are more flared in males. The mandible is larger and more angular, with a broader, squarer chin, in males, while female mandibles tend toward a narrower, more pointed symphyseal region and a smoother gonial angle.
An experienced observer integrating all of these features produces a gestalt assessment that achieves 80 percent accuracy in population studies. The Walker method, published in 2008, improved on this by applying ordinal scoring to 5 of the most sexually dimorphic cranial features and analyzing the resulting scores with discriminant function analysis, achieving approximately 89 percent accuracy in validation (Walker, P.L., 2008, “Sexing skulls using discriminant function analysis of visually assessed traits”, American Journal of Physical Anthropology, 136(1), 39-50). The improvement over unstructured assessment reflects what standardization always produces: reduced inter-observer variability and a forcing function that prevents the observer from unconsciously overweighting features that happen to be particularly salient in a specific case.
The mastoid process alone, when measured with standard calipers and analyzed with either traditional discriminant functions or geometric morphometric approaches, achieves approximately 87 percent accuracy in Indian and American white population samples (Saini et al., cited in Christensen et al., 2014, Forensic Anthropology, Academic Press). This is a practically important finding because the mastoid process is one of the most taphonomically robust features of the skull, frequently surviving when the orbits, the frontal bone, and the facial skeleton have been damaged.
The 3-dimensional cranial imaging literature, reviewed comprehensively in a 2024 scoping study covering 73 papers published between January 2020 and February 2024, documents consistent accuracy rates of 90 to 95 percent for CT-based geometric morphometric and machine learning approaches to cranial sex estimation (Wu et al., 2024, PMC11627412, International Journal of Legal Medicine). These approaches are not yet standard in forensic casework, but the research trajectory is clear: computational analysis of 3D cranial shape routinely outperforms conventional morphological assessment by 5 to 10 percentage points when imaging is available.
None of this changes the fundamental position of the cranium in the accuracy hierarchy. When both pelvis and cranium are available, the pelvis drives the sex estimate and the cranium is confirmatory. When only the cranium is available, the Walker morphological protocol combined with available metric measurements produces the most defensible estimate, accompanied by explicit acknowledgment that the accuracy is approximately 89 percent and that the remaining uncertainty should be reflected in the probability statement of the forensic report.
Long Bones and Fragments: When the Rest Is Missing
When neither pelvis nor cranium is recoverable, the long bones, primarily the femur, humerus, tibia, and radius, provide metric evidence for sex estimation through discriminant function analysis applied to maximum length, head diameter, and midshaft circumference. Male long bones are, in population terms, longer, heavier, and have larger articular surfaces than female long bones, reflecting both the greater average body size of males and the greater bone-stimulating effect of male muscle mass. The femoral head diameter is among the most reliable single measurements, with threshold values separating male and female distributions showing accuracy in the 85 to 90 percent range in population studies when the specimen falls clearly on one side of the threshold.
FORDISC 3, the discriminant function software developed by Jantz and Ousley at the University of Tennessee, applies these principles to a reference database assembled from the Forensic Anthropology Data Bank (Jantz, R.L., & Ousley, S.D., 2005, FORDISC 3: Personal Computer Forensic Discriminant Functions, University of Tennessee). For each set of measurements entered, FORDISC produces a sex classification with an associated posterior probability, allowing the practitioner to express the estimate as “male with 91 percent posterior probability” rather than simply “male.” This probabilistic output is the appropriate form for forensic expert testimony because it conveys both the conclusion and the uncertainty, which is what evidentiary reasoning demands.
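For a single measurement such as femoral head diameter, the same idea reduces to a one-variable calculation. The sketch below is not FORDISC and does not use its reference data: the group means, the shared standard deviation, and the function names are assumptions chosen only to show how a measurement becomes a statement of the form "male with N percent posterior probability".

```python
import math

# HYPOTHETICAL reference values in millimeters; not from any real database.
MEAN_MALE = 48.0
MEAN_FEMALE = 42.0
POOLED_SD = 2.5


def posterior_male(x, mean_m=MEAN_MALE, mean_f=MEAN_FEMALE,
                   sd=POOLED_SD, prior_m=0.5):
    """P(male | x) for two normal distributions with a shared SD."""
    log_odds = math.log(prior_m / (1.0 - prior_m))
    log_odds += ((x - mean_f) ** 2 - (x - mean_m) ** 2) / (2.0 * sd ** 2)
    return 1.0 / (1.0 + math.exp(-log_odds))


def report(x):
    """Phrase the estimate with its posterior probability attached."""
    p_m = posterior_male(x)
    label, p = ("male", p_m) if p_m >= 0.5 else ("female", 1.0 - p_m)
    return f"{label} with {p:.0%} posterior probability"


for diameter in (41.0, 45.0, 48.0):
    print(diameter, "->", report(diameter))
```

A diameter exactly halfway between the two means yields a posterior of 0.5, which is the arithmetic behind the caveat that threshold accuracy applies only when the specimen falls clearly on one side.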
FORDISC’s limitation is the composition of its reference database, which is weighted toward North American populations. Applying FORDISC’s North American reference group to individuals from populations with substantially different average body dimensions can produce misclassifications that would not occur if an appropriate population-specific reference were available. FORDISC includes reference groups for several populations beyond North American whites and blacks, but the coverage is uneven, and the analyst must make a prior judgment about population affinity before selecting the appropriate comparison group.
For severely fragmented or incomplete specimens, including cremated remains where long bones are reduced to small calcined fragments, sex estimation becomes a probabilistic exercise with accuracy rates that reflect the information content of whatever is available. In these cases, the forensic report should present the available evidence transparently, state the accuracy associated with each method applied, and give an overall probability estimate that integrates all available evidence rather than a falsely decisive “male” or “female.”
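One simple way to integrate several estimates into an overall probability is to multiply their odds. I offer it as a sketch under two loudly stated assumptions, not as the field's standard procedure: each input posterior was computed with equal priors, and the indicators are conditionally independent given sex. The second assumption is rarely true of skeletal elements, which share body size and robusticity, so the combined figure will tend to overstate certainty.

```python
def combine_posteriors(posteriors_female, prior_female=0.5):
    """Combine per-method P(female) values by multiplying their odds.

    ASSUMES each input was computed with equal priors and that the methods
    are conditionally independent given sex; both are simplifications.
    """
    odds = prior_female / (1.0 - prior_female)
    for p in posteriors_female:
        if not 0.0 < p < 1.0:
            raise ValueError("each posterior must lie strictly between 0 and 1")
        odds *= p / (1.0 - p)
    return odds / (1.0 + odds)


# Invented values: two weak indicators pointing the same way.
print(round(combine_posteriors([0.80, 0.80]), 3))

# Invented values: two indicators that cancel each other out.
print(round(combine_posteriors([0.80, 0.20]), 3))
```

Whatever rule is used, the report should name it, so that the combined probability can be reproduced and challenged rather than taken on trust.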
What 118 Practitioners Actually Do in 2025
The surveys that periodically assess forensic anthropological practice reveal gaps between the published methods and what practitioners actually do, and those gaps are worth understanding. A 2025 survey study by Klales, reaching 118 practitioners of skeletal sex estimation in the United States, provides the most current systematic picture of actual practice (Klales, A.R., 2025, “Reevaluating skeletal sex estimation practices in forensic anthropology”, Journal of Forensic Sciences, 70, 825-834).
The findings are partly reassuring and partly instructive about the work still to be done. The reassuring part: 99.0 percent of respondents reported using both qualitative and quantitative approaches in combination, which is the correct practice. The pelvis was the preferred element for morphological approaches, reflecting appropriate prioritization of the most accurate indicator. FORDISC 3 was the preferred metric tool, reflecting the US-centric training environment in which most respondents had been educated.
The instructive part concerns how final sex estimates are determined from combined evidence. Among respondents who used multiple methods, 36.1 percent gave preference to the pelvis when the evidence converged, which is correct. But 39.2 percent reported basing the final estimate on experience after reviewing all methods, which is a description of integrating evidence through expert judgment rather than through a formal decision rule. In edge cases, in cases where the evidence is not cleanly convergent, in cases where an unusual specimen creates uncertainty that experience alone cannot resolve, the absence of a formal decision rule introduces variability between practitioners that is not visible in the final report and that cannot be tested or challenged in cross-examination.
The survey also revealed that DSP2, despite its documented superiority in accuracy across multiple validation populations, was less commonly incorporated into US forensic practice than FORDISC 3. What is best practice in the research literature does not automatically become best practice in the courtroom, and the gap between them is where identifications fail.
Population Specificity and the Ancestry Problem
The interaction between sex estimation and ancestry estimation in the biological profile is structurally awkward in a way that most introductory accounts avoid mentioning. The problem is this: some sex estimation methods, particularly discriminant functions applied to long bone dimensions, require knowledge of population affinity to select the appropriate reference group, which means that sex estimation accuracy partially depends on ancestry estimation accuracy. If ancestry is estimated incorrectly, the sex estimate produced using population-specific norms may also be incorrect.
The magnitude of this problem varies by method. DSP2, with its worldwide reference database, is designed to be population-independent in its sex discrimination because pelvic sexual dimorphism itself is functionally driven and therefore largely population-independent in direction and magnitude. Using DSP2 correctly does not require prior ancestry estimation. FORDISC 3, when applied to skeletal dimensions that vary substantially between populations, does require the selection of a reference group, and selecting the wrong one because ancestry has been misestimated can propagate the error into the sex estimate.
For practitioners in European jurisdictions, including Germany, where the population of individuals potentially represented in forensic skeletal collections is demographically diverse, the practical implication is to prioritize methods that are population-independent where those methods are applicable, specifically DSP2 for the pelvis, and to explicitly acknowledge in every forensic report the population affinity uncertainty and its potential effect on the accuracy of population-specific estimates. A female pelvis assessed with DSP2 carries a near-certain sex assignment regardless of whether the individual was born in Munich or Mogadishu. A femoral head diameter assessed against the North American white reference group in FORDISC carries an ancestry-dependent error that should be visible in the report.
Polemical Warning
I want to name clearly what forensic sex estimation is not, because the confusion about what it is produces real-world consequences that I have observed in courtrooms and in communication with investigating authorities.
It is not a determination of gender identity. The biological profile estimates biological sex, meaning the hormonal and functional sex-related biology that expresses itself in skeletal architecture. It does not address, and cannot address, how the individual identified with respect to gender, what pronouns they preferred, or what social role they occupied. This distinction is not a concession to current social debates; it is a description of what the evidence supports. The pelvis does not know anything about self-identification. It knows about childbirth biomechanics and hormonal development, and it reports accordingly with 99.63 percent accuracy.
It is not infallible. The 99.63 percent validation accuracy reported for pelvic assessment with the DSP method means that, at that rate, approximately 3.7 of every 1,000 assessments will be incorrect. In a field where individual identifications matter, this irreducible error rate should be acknowledged in every forensic report rather than obscured by language that implies certainty. Expert testimony that states “the remains are female” is epistemically different from testimony that states “the pelvic metrics are consistent with a female individual with a posterior probability of 0.996,” and the difference matters for the legal weight appropriately assigned to the evidence.
It is not something that experience alone can substitute for method. I have examined remains alongside practitioners who were confident on the basis of experience and wrong on the basis of evidence. Confidence and accuracy are not the same variable. The field has spent considerable effort over the past 30 years developing methods whose accuracy can be quantified precisely because quantified accuracy is what courts should demand and what the families of the dead deserve.
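The error-rate arithmetic behind the "not infallible" point can be checked in a few lines, using only the accuracy figures quoted in this essay. The function name and the dictionary labels are shorthand invented for the example.

```python
def expected_errors(accuracy_percent, n_assessments):
    """Expected number of wrong calls at a given accuracy rate."""
    return (100.0 - accuracy_percent) / 100.0 * n_assessments


# Accuracy figures quoted in the text; the labels are shorthand.
FIGURES = {
    "pelvis, DSP validation": 99.63,
    "cranium, Walker scoring": 89.0,
    "cranium, unstructured assessment": 80.0,
}

for label, accuracy in FIGURES.items():
    errors = expected_errors(accuracy, 1000)
    print(f"{label}: about {errors:.1f} errors per 1,000 assessments")
```

Set side by side, the figures show why the choice of element matters more than the confidence of the examiner: the gap between the pelvis and the unstructured cranial assessment is roughly 196 wrong calls per 1,000 cases.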
The Evidence Hierarchy Is Not a Suggestion
The forensic anthropology of sex estimation has produced, over the past 50 years, a remarkably clear hierarchy of evidence quality. The pelvis, assessed with DSP2 and supplemented by Klales morphological scoring of the pubic region, provides the most accurate evidence available from bone and should be the primary determinant of the sex estimate when it is available in sufficient condition. The cranium, assessed with the Walker morphological scoring protocol and supplemented by mastoid process metrics, provides secondary evidence at approximately 89 percent accuracy that should confirm but not override the pelvic assessment. The long bones provide metric evidence in the 80 to 90 percent range that contributes to the probability estimate when the primary elements are unavailable. Technological development in 3D imaging and machine learning is extending the accuracy ceiling across all elements and will increasingly be incorporated into routine casework as validation work matures.
Following this hierarchy is not optional for a practitioner who intends their work to withstand scientific scrutiny. Reporting a sex estimate based on cranial morphology alone when the pelvis was available but not examined is a methodological failure that an opposing expert can identify and articulate to a court. Reporting a sex estimate without a posterior probability, or with a probability derived from an inappropriate reference population, is an accuracy failure that may not be identified in the immediate case but that systematically produces incorrect identifications at a predictable rate.
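A formal decision rule of the kind this hierarchy implies can be written down explicitly. The version below is one possible rule, sketched for illustration and not a published protocol: the highest-ranked available element drives the call, lower-ranked elements are recorded only as agreeing or disagreeing, and a call is made only when the posterior clears a threshold. The 0.95 threshold, the dictionary keys, and the function names are assumptions of this example.

```python
HIERARCHY = ("pelvis", "cranium", "long_bones")  # order taken from the text


def call(p_female, threshold):
    """Make a call only when the posterior clears the threshold."""
    if p_female >= threshold:
        return "female"
    if p_female <= 1.0 - threshold:
        return "male"
    return "indeterminate"


def decide(estimates, threshold=0.95):
    """estimates maps element name -> P(female), or None when unavailable."""
    available = [e for e in HIERARCHY if estimates.get(e) is not None]
    if not available:
        return {"basis": None, "sex": "indeterminate",
                "p_female": None, "disagrees": []}
    basis = available[0]
    p = estimates[basis]
    leans_female = p >= 0.5
    # Lower-ranked elements never override; disagreement is only recorded.
    disagrees = [e for e in available[1:]
                 if (estimates[e] >= 0.5) != leans_female]
    return {"basis": basis, "sex": call(p, threshold),
            "p_female": p, "disagrees": disagrees}


# Invented posteriors: a clear pelvis with a robust, male-leaning cranium.
print(decide({"pelvis": 0.998, "cranium": 0.30}))

# Invented posterior: cranium only, below the assumed threshold.
print(decide({"cranium": 0.89}))
```

The value of writing the rule down is not that this particular rule is right. It is that a written rule can be stated in the report, applied identically by two practitioners, and examined in court.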
The detective who set down his form and listened was unusual in his willingness to engage with the arithmetic. Most people, including many within the legal system, prefer the single word to the probability. The practitioner’s job is to give both: the word, and the number behind it, and the method that produced the number, and the population from which the method was validated. The bones remember their biology with the precision of physics and chemistry. The report should honor that precision rather than translate it into false certainty.
References
- Brůžek, J., et al. (2017). Reliability and validity of the sex diagnosis by the Diagnose Sexuelle Probabiliste (DSP) software using the Coimbra Identified Skeletal Collection. Forensic Science International, 281, 207.e1-207.e7.
- Christensen, A.M., Passalacqua, N.V., & Bartelink, E.J. (2014). Forensic Anthropology: Current Methods and Practice. Academic Press.
- Jantz, R.L., & Ousley, S.D. (2005). FORDISC 3: Personal Computer Forensic Discriminant Functions. University of Tennessee, Knoxville.
- Klales, A.R. (2025). Reevaluating skeletal sex estimation practices in forensic anthropology. Journal of Forensic Sciences, 70, 825-834. https://doi.org/10.1111/1556-4029.70014
- Klales, A.R., et al. (2012). A revised method of sexing the human innominate using Phenice’s nonmetric traits and statistical methods. American Journal of Physical Anthropology, 149(1), 104-114.
- Murail, P., Brůžek, J., Houët, F., & Cunha, E. (2005). DSP: A tool for probabilistic sex diagnosis using worldwide variability in hip-bone measurements. Bulletins et Mémoires de la Société d’Anthropologie de Paris, 17(3-4), 167-176.
- Ousley, S.D., & Jantz, R.L. (2012). Fordisc 3 and statistical methods for estimating sex and ancestry. In D.C. Dirkmaat (Ed.), A Companion to Forensic Anthropology (pp. 311-329). Wiley-Blackwell.
- Phenice, T.W. (1969). A newly developed visual method of sexing the os pubis. American Journal of Physical Anthropology, 30(2), 297-301.
- PMC11850468. (2025). Application of DSP2 for biological sex estimation in a Spanish sample. International Journal of Legal Medicine. Springer Nature.
- Walker, P.L. (2008). Sexing skulls using discriminant function analysis of visually assessed traits. American Journal of Physical Anthropology, 136(1), 39-50.
- Wu, C., et al. (2024). Sex estimation techniques based on skulls in forensic anthropology: A scoping review. PMC11627412. International Journal of Legal Medicine.