Identification of Individuals in Video Footage Based on Body Structure and Movement Patterns

Apr 5, 2023 | 19 min | anthropology

Bank robbery captured on overhead CCTV footage, masked suspect at the counter

There is a type of expert assignment that most people find surprising when I describe it. Not the gruesome ones, not the cases involving decomposed bodies or disputed forensic findings in murder trials. What consistently surprises people is this: in a significant share of my caseload, the central question is not what happened, but who was there when it happened. And the answer has to come from a surveillance camera image that, in the worst case, was recorded at 4 frames per second, with a lens dirty enough to suggest the building owner has not cleaned it since 2011, from a camera angle pointed 37 degrees too high to show anything useful below the waist, in lighting conditions that would make a 1980s Soviet documentary look cinematographically ambitious.

That is the real world of forensic person identification from video. Not the clean lab conditions in which most academic research on gait recognition is conducted. Not the calibrated multi-camera setups, the controlled lighting, the cooperative subjects walking precisely on a marked path in the direction the algorithm expects. The footage I work with arrives on a USB stick or through a police file transfer, and within 30 seconds of opening it I can usually tell whether it is going to be useful or whether the next several hours will be an exercise in disciplined frustration.

Both outcomes can be forensically productive, which is the part that surprises people most.

What the Method Actually Rests On

Person identification from video is built on a fundamental biological fact: every individual moves differently. The differences are neither random nor negligible. They are rooted in the skeletal structure, the musculature, and the neurological patterning of movement that develops over decades of use, injury, compensation, and habit. Gait is not a costume someone can take off. It is, within limits, highly individual, and unlike a fingerprint it is observable from a distance, without the subject’s awareness, and without any biometric sensor other than a camera.

The forensic examiner exploits this individuality along several parallel dimensions. Body structure, in the anthropometric sense, refers to the measurable proportions of the skeleton: the ratio of trunk height to leg length, shoulder breadth, the relative lengths of the upper and lower arm, the width of the pelvis, and dozens of other features that fall outside the voluntary control of the person being observed. These are not measurements one can consciously alter while walking past a security camera. They are what they are, and in a population of reasonable size they narrow down the candidate pool considerably even before movement is analyzed.

The anthropometric approach has a long history in forensic identification, predating video surveillance by more than half a century. Bertillon’s identification system from the 1880s was built on body measurements, and while Bertillon himself overestimated the uniqueness of his specific measurement battery, the underlying insight that skeletal proportions individualize a person has never been seriously challenged. What has changed is the ability to extract those proportions from footage rather than requiring the subject’s physical presence and cooperation.

Movement patterns add a further layer of individuality on top of the structural features. The gait cycle, made up of a stance phase and a swing phase, with two brief periods of double support during stance when both feet are on the ground, is not the same from person to person. Step length, cadence, maximum step height, pelvic rotation, the characteristic arc of arm swing, the degree of out-toeing or in-toeing, the pattern of weight transfer across the foot from heel strike to toe-off: each of these parameters varies, and the variation is correlated with an individual’s skeletal structure, muscular strength patterns, and neurological history. A person whose posture changed at age 35 because of a lumbar injury has a gait signature that is not identical to the one produced before that injury. That signature persists because the nervous system adapts and then conserves the adapted pattern.
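
The phase structure described above reduces to simple arithmetic on gait events. The sketch below is a minimal illustration, assuming that heel-strike and toe-off times have already been annotated for both feet across one cycle of the right foot. The class, the function and the event times in the usage example are hypothetical, chosen only to approximate the commonly cited proportions of normal walking (roughly 60% stance and 40% swing).

```python
from dataclasses import dataclass


@dataclass
class GaitEvents:
    """Event times (seconds) spanning one gait cycle of the right foot.

    Hypothetical input structure: in casework such times would come from
    frame-by-frame annotation and carry at least one frame interval of error.
    """
    right_heel_strike: float       # cycle starts, first double support begins
    left_toe_off: float            # first double support ends
    left_heel_strike: float        # second double support begins
    right_toe_off: float           # right stance ends, right swing begins
    next_right_heel_strike: float  # cycle ends


def phase_summary(e: GaitEvents) -> dict:
    """Return cycle duration, phase shares (% of cycle) and cadence."""
    cycle = e.next_right_heel_strike - e.right_heel_strike
    if cycle <= 0:
        raise ValueError("the cycle must have a positive duration")

    def pct(start: float, end: float) -> float:
        return 100.0 * (end - start) / cycle

    return {
        "cycle_s": cycle,
        "stance_pct": pct(e.right_heel_strike, e.right_toe_off),
        "swing_pct": pct(e.right_toe_off, e.next_right_heel_strike),
        "double_support_1_pct": pct(e.right_heel_strike, e.left_toe_off),
        "double_support_2_pct": pct(e.left_heel_strike, e.right_toe_off),
        # one cycle (stride) contains two steps
        "cadence_steps_per_min": 2 * 60.0 / cycle,
    }


if __name__ == "__main__":
    events = GaitEvents(0.00, 0.11, 0.55, 0.66, 1.10)  # hypothetical values
    for name, value in phase_summary(events).items():
        print(f"{name}: {value:.1f}")
```

With these illustrative events the sketch reports 60% stance, 40% swing, two double-support periods of 10% each and a cadence of about 109 steps per minute.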

Biomechanics provides the theoretical framework for understanding these patterns mathematically. The physics of human locomotion involves the interaction of ground reaction forces, joint torques, muscular activation sequences, and the kinematic chain from ankle to shoulder that transmits force through the body in a characteristic sequence for each individual. The mathematical models that biomechanics uses to quantify these interactions are, in principle, a precision instrument. In practice, they are only as precise as the input data allows, and the input data in a forensic context is almost never clean.

The Gap Between Laboratory Research and Forensic Reality

Academic gait recognition research has, over the past 3 decades, produced an impressive technical literature. Sarkar and colleagues published the HumanID Gait Challenge Problem in 2005, providing a benchmark dataset and a set of standardized experimental conditions that allowed systematic performance comparison across algorithms (Sarkar, S., et al., 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2), 162-177). Han and Bhanu’s 2006 work on Gait Energy Images demonstrated that silhouette averaging could produce computationally tractable gait representations with strong discriminative power under controlled conditions (Han, J., & Bhanu, B., 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(2), 316-322). The surveys by Aggarwal and Cai (1999) and Moeslund, Hilton, and Krüger (2006) charted the theoretical landscape in ways that remain foundational.
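
The Gait Energy Image idea is compact enough to show directly. Han and Bhanu average size-normalized, horizontally aligned binary silhouettes over a complete gait cycle, so that pixels the body occupies throughout the cycle (head, trunk) approach 1 while pixels crossed by swinging limbs take intermediate values. The sketch below assumes that silhouette extraction, cropping and alignment have already been done, which is precisely the step that degraded CCTV footage tends to defeat. It uses plain Python lists rather than an image library so that it stays self-contained.

```python
from typing import List

Silhouette = List[List[int]]  # binary mask: 1 = body pixel, 0 = background


def gait_energy_image(silhouettes: List[Silhouette]) -> List[List[float]]:
    """Pixel-wise mean of aligned binary silhouettes covering one gait cycle.

    Assumes every silhouette is already cropped, scaled to a common height
    and centred; that preprocessing is not shown here.
    """
    if not silhouettes:
        raise ValueError("need at least one silhouette")
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    for s in silhouettes:
        if len(s) != rows or any(len(row) != cols for row in s):
            raise ValueError("all silhouettes must share the same shape")
    n = len(silhouettes)
    return [
        [sum(s[y][x] for s in silhouettes) / n for x in range(cols)]
        for y in range(rows)
    ]


if __name__ == "__main__":
    # Three tiny hypothetical 2x3 frames: the middle column is always body,
    # the lower corners are covered in only some frames (a "swinging limb").
    frames = [
        [[0, 1, 0], [1, 1, 0]],
        [[0, 1, 0], [0, 1, 1]],
        [[0, 1, 0], [1, 1, 1]],
    ]
    for row in gait_energy_image(frames):
        print([round(v, 2) for v in row])
```

Static pixels come out as 1.0, background as 0.0, and the intermittently covered corners as about 0.67, which is the information a GEI preserves about limb movement.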

The gap between these results and the conditions I encounter in practice is not a criticism of the research. It is a structural feature of the field that practitioners have to internalize and account for explicitly in every case.

The HumanID Gait Challenge dataset was recorded at 30 frames per second. Most surveillance footage I receive runs at between 6 and 15 frames per second. The challenge dataset used high-quality cameras with known specifications. Surveillance cameras in German grocery stores, parking garages, and the other kinds of locations where crimes occur are often 5 to 10 years past their replacement date, calibrated at installation and never touched again, and connected to recorders that compress the footage heavily to save storage space, adding artifacts of their own. The challenge dataset had subjects walking along a controlled path at a known distance from the camera. The footage I analyze shows suspects who may have passed the camera at an angle, partially occluded by other pedestrians, at variable distances, sometimes wearing clothing chosen specifically to reduce identifiability.

None of this makes the analysis impossible. It changes what kinds of conclusions are defensible.

What I Actually Do When the File Arrives

The first step in any analysis is an honest technical assessment of the footage. Before I look at the suspect at all, I document the technical parameters of the recording: frame rate, resolution, compression codec and quality setting, camera height and estimated angle, lighting conditions, and any sources of distortion. This is not bureaucratic box-ticking. It determines what conclusions the footage is capable of supporting.

A recording at 6 frames per second does not allow reliable analysis of gait cycle duration, because at that frame rate short phases of the cycle, such as double support, can fall entirely between frames. A camera mounted high and angled steeply downward foreshortens the legs, making anthropometric height and proportion estimates unreliable. A severely compressed recording with large block artifacts in the clothing region makes silhouette extraction impractical. Each of these limitations, if it is not documented and reported, yields conclusions that look more certain than they are. My obligation to the court is not to provide the most helpful answer. It is to provide the most accurate one.
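
The frame-rate limitation is, at root, arithmetic. The sketch below assumes an illustrative gait cycle of 1.1 seconds with double-support periods of about 10% of the cycle (values in the range commonly reported for normal walking, not measurements from a case). The two-frame minimum in can_sample is a hypothetical rule of thumb, not an established standard. The third function expresses the worst-case error in a measured cycle duration when each event can only be located to the nearest frame.

```python
def frames_in_phase(fps: float, cycle_s: float, phase_fraction: float) -> float:
    """Expected number of video frames falling within one occurrence of a phase."""
    return fps * cycle_s * phase_fraction


def can_sample(fps: float, cycle_s: float, phase_fraction: float,
               min_frames: float = 2.0) -> bool:
    """Hypothetical rule of thumb: require at least min_frames frames per phase."""
    return frames_in_phase(fps, cycle_s, phase_fraction) >= min_frames


def cycle_timing_error_pct(fps: float, cycle_s: float) -> float:
    """Worst-case error in a measured cycle duration, as % of the cycle.

    Each event can only be placed to within one frame interval, so the
    difference between two events is uncertain by up to about 1/fps seconds.
    """
    return 100.0 * (1.0 / fps) / cycle_s


if __name__ == "__main__":
    CYCLE_S = 1.1          # illustrative cycle duration (seconds)
    DOUBLE_SUPPORT = 0.10  # illustrative fraction of the cycle
    for fps in (6, 15, 30):
        print(f"{fps:>2} fps: "
              f"{frames_in_phase(fps, CYCLE_S, DOUBLE_SUPPORT):.2f} frames per double support, "
              f"resolvable={can_sample(fps, CYCLE_S, DOUBLE_SUPPORT)}, "
              f"cycle timing error up to {cycle_timing_error_pct(fps, CYCLE_S):.0f}%")
```

Under these assumptions a double-support period spans on average about two thirds of a frame at 6 frames per second, and the cycle duration itself is uncertain by roughly 15%; at 30 frames per second the same figures are about 3.3 frames and 3%.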

Once the quality assessment is done and the technical limitations are established, the actual analysis proceeds along 2 tracks simultaneously: the structural track and the kinematic track.

On the structural track, I extract, where the image quality permits, the visible body proportions of the individual in the footage. Shoulder breadth. The trunk-to-leg-length ratio. Pelvic width as estimated from gait-induced lateral movement. Where measurement is impossible due to image quality or clothing, I note the observation qualitatively: the visible proportions are consistent with or inconsistent with the reference individual. No measurement I cannot defend is reported.
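
On the structural track, ratios are convenient because they cancel the unknown image scale: a trunk-to-leg ratio can be computed from pixel coordinates without calibrating the camera. The sketch below assumes hypothetical 2D keypoints (from manual annotation or a pose estimator) with hypothetical keypoint names, and returns None when a required point is not visible, mirroring the rule that such a feature is reported only qualitatively. A single-frame ratio is meaningful only when the body segments lie roughly parallel to the image plane; a steep downward camera angle distorts it, so this is an illustration, not a calibrated photogrammetric method.

```python
import math
from typing import Dict, Optional, Tuple

Point = Optional[Tuple[float, float]]  # (x, y) in pixels; None if not visible


def _midpoint(a: Point, b: Point) -> Point:
    if a is None or b is None:
        return None
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def _length(a: Point, b: Point) -> Optional[float]:
    if a is None or b is None:
        return None
    return math.hypot(a[0] - b[0], a[1] - b[1])


def trunk_to_leg_ratio(kp: Dict[str, Point]) -> Optional[float]:
    """Trunk length (shoulder midpoint to hip midpoint) over right hip-to-ankle length.

    Returns None if any required keypoint is missing, i.e. the feature can
    only be described qualitatively. Keypoint names are hypothetical.
    """
    trunk = _length(_midpoint(kp.get("l_shoulder"), kp.get("r_shoulder")),
                    _midpoint(kp.get("l_hip"), kp.get("r_hip")))
    leg = _length(kp.get("r_hip"), kp.get("r_ankle"))
    if trunk is None or leg is None or leg == 0:
        return None
    return trunk / leg


if __name__ == "__main__":
    frame = {  # hypothetical pixel coordinates from one annotated frame
        "l_shoulder": (90, 100), "r_shoulder": (130, 100),
        "l_hip": (95, 200), "r_hip": (125, 200), "r_ankle": (125, 380),
    }
    ratio = trunk_to_leg_ratio(frame)
    print("qualitative only" if ratio is None else f"trunk/leg = {ratio:.2f}")
```

The same function returns None as soon as, for example, the ankle is hidden behind a counter, which is the point at which a measurement would stop being defensible.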

On the kinematic track, I analyze the movement pattern frame by frame. Step length, estimated from the position of the feet relative to identifiable ground features. Arm swing amplitude and symmetry. Pelvic rotation, visible in the shoulder-line movement and the characteristic side-to-side weighting during single-leg support. Trunk lean in the forward-backward and lateral directions. Any asymmetry, whether postural or kinematic, is documented because asymmetry is diagnostically powerful: a person who carries their left shoulder slightly lower than their right does so for a reason, and that reason persists across time.
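
Asymmetry is easier to compare across recordings when it is expressed as a number. One commonly used formulation in gait analysis is a symmetry index: the left-right difference divided by the mean of the two sides, as a percentage. The sketch below applies it to hypothetical arm-swing amplitudes and adds a worst-case range for a given per-side measurement uncertainty, assuming positive values such as amplitudes or step lengths.

```python
from typing import Tuple


def symmetry_index(left: float, right: float) -> float:
    """Left-right difference as a percentage of the bilateral mean.

    0 indicates perfect symmetry; positive values mean the left side is larger.
    """
    mean = 0.5 * (left + right)
    if mean == 0:
        raise ValueError("the mean of left and right must be non-zero")
    return 100.0 * (left - right) / mean


def symmetry_index_range(left: float, right: float, u: float) -> Tuple[float, float]:
    """Smallest and largest index if each side may be off by up to +/- u.

    For positive values the index rises with left and falls with right, so
    the extremes occur at (left - u, right + u) and (left + u, right - u).
    """
    if left - u <= 0 or right - u <= 0:
        raise ValueError("values must stay positive after subtracting u")
    return symmetry_index(left - u, right + u), symmetry_index(left + u, right - u)


if __name__ == "__main__":
    # Hypothetical arm-swing amplitudes in degrees, for illustration only.
    print(f"arm swing SI: {symmetry_index(30.0, 24.0):.1f}%")
    low, high = symmetry_index_range(30.0, 24.0, 5.0)
    print(f"with +/-5 degrees per side: {low:.1f}% to {high:.1f}%")
```

The example prints an index of about 22%, but with 5 degrees of uncertainty per side the plausible range runs from about -15% to +59%. Because that range includes zero, the measurement alone would not support reporting an asymmetry, which is the kind of check that keeps an asymmetry observation defensible.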

The comparison with reference material is then conducted by overlaying the features extracted from the surveillance footage against the features extracted from reference recordings of the known individual. The comparison is not categorical. I do not conclude that person A is definitively person B. I conclude that the comparison reveals feature correspondences consistent with common origin, or feature divergences that make common origin unlikely, or, frequently, that the material does not permit a conclusion in either direction, in which case I say exactly that.
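
The three possible outcomes described above can be illustrated with a deliberately simplified, hypothetical decision rule. Each feature compares a questioned value against a reference value, given a combined measurement uncertainty. A feature whose uncertainty is as large as the between-person spread is treated as uninformative. The factor of 2, the minimum of three informative features and all numbers in the example are arbitrary illustrative choices, not a published standard and not a description of any validated casework procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    name: str
    questioned: float     # value measured in the surveillance footage
    reference: float      # value measured in the reference recording
    uncertainty: float    # combined measurement uncertainty (same units)
    population_sd: float  # hypothetical between-person spread of this feature


def assess(f: Feature) -> str:
    """Classify one feature as uninformative, consistent or divergent."""
    if f.uncertainty >= f.population_sd:
        return "uninformative"  # noise as large as between-person differences
    if abs(f.questioned - f.reference) <= 2 * f.uncertainty:
        return "consistent"
    return "divergent"


def overall(features: List[Feature], min_informative: int = 3) -> str:
    """Combine feature verdicts into one of three hypothetical outcome statements."""
    verdicts = [assess(f) for f in features]
    if "divergent" in verdicts:
        return "divergences make common origin unlikely"
    if verdicts.count("consistent") >= min_informative:
        return "correspondences consistent with common origin"
    return "material does not permit a conclusion"


if __name__ == "__main__":
    features = [  # all values hypothetical
        Feature("trunk_to_leg_ratio", 0.56, 0.55, 0.02, 0.04),
        Feature("cadence_steps_per_min", 108, 112, 3, 8),
        Feature("arm_swing_si_pct", 20, 18, 10, 8),
    ]
    for f in features:
        print(f.name, assess(f))
    print(overall(features))
```

With only two informative features the example ends in "material does not permit a conclusion". Letting a single divergent feature override everything is a choice made for brevity; a real assessment would first ask whether a divergence is explained by differences in walking speed, load, footwear or terrain.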

Expert Visual Assessment and Its Necessary Discipline

Alongside the quantitative analysis runs the component that generates the most discomfort in academic treatments of forensic biometrics: expert visual assessment. This is the experienced examiner drawing conclusions from a holistic reading of the movement patterns in the footage. Where the footage quality does not permit numerical extraction, those patterns cannot be reduced to numerical parameters, but they nonetheless carry information.

The discomfort is understandable. Visual assessment is subjective in the sense that it depends on the observer’s experience and training. Different examiners can reach different conclusions from the same footage. These are real limitations. The forensic community has responded to them through the development of structured assessment protocols that impose discipline on the process: the examiner documents which specific features are being observed, why those features are considered relevant, and what conclusion each feature supports independently before a holistic judgment is formed. This structured approach significantly reduces, though it does not eliminate, the risk of anchoring bias, confirmation bias, and other cognitive errors that expert visual assessment is vulnerable to.
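
The ordering that structured protocols impose, feature-level observations first and holistic judgment last, can be made explicit in how an assessment is recorded. The sketch below is a hypothetical record structure, not an existing protocol or tool. It refuses a holistic conclusion until at least one observation, with its relevance and independent conclusion, has been recorded, and it refuses new observations once the holistic judgment exists, so the audit trail shows what was noted before the conclusion was formed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    feature: str    # what was observed, e.g. "left shoulder carried lower"
    relevance: str  # why the examiner considers this feature discriminating
    supports: str   # conclusion this feature supports on its own


@dataclass
class StructuredAssessment:
    observations: List[Observation] = field(default_factory=list)
    holistic: Optional[str] = None

    def record(self, obs: Observation) -> None:
        """Add a feature-level observation; only allowed before the holistic judgment."""
        if self.holistic is not None:
            raise RuntimeError("holistic judgment already recorded; observations are closed")
        self.observations.append(obs)

    def conclude(self, judgment: str) -> None:
        """Record the holistic judgment once, after at least one observation."""
        if not self.observations:
            raise RuntimeError("record feature-level observations first")
        if self.holistic is not None:
            raise RuntimeError("holistic judgment already recorded")
        self.holistic = judgment


if __name__ == "__main__":
    a = StructuredAssessment()
    a.record(Observation(
        feature="left shoulder carried lower than right throughout the sequence",
        relevance="persistent postural asymmetry, visible in both recordings",
        supports="consistent with common origin",
    ))
    a.conclude("correspondences consistent with common origin")
    print(len(a.observations), "observation(s) recorded before:", a.holistic)
```

The structure does not make the judgment better by itself; its value is that the sequence of reasoning can be reconstructed and challenged later.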

What expert visual assessment provides that algorithmic methods cannot yet reliably replace is the ability to process footage that is simply too degraded for quantitative methods. A good forensic examiner looking at 8 frames of a person walking through a parking structure can sometimes identify a characteristic shoulder drop or arm swing pattern that an algorithm would be unlikely to flag, because algorithms are rarely trained on footage of this quality, under these conditions, and from this camera angle. The expert is applying a learned prior, built from years of comparing thousands of movement patterns, that generalizes differently from an algorithmic model. That does not make it infallible. It makes it the best available instrument for a specific set of difficult conditions.

The 2 approaches are therefore complementary, not competing. In cases where footage quality permits quantitative analysis, that analysis provides the primary evidential foundation. In cases where footage quality does not permit it, structured expert visual assessment provides what it can, with the uncertainty explicitly stated. In the best cases, both approaches converge on the same conclusion, which strengthens the result.

The Evidentiary Standard in German Courts

Forensic movement analysis is admissible as expert evidence in German criminal proceedings under the standard that applies to all expert opinion: the expert must be qualified, the methodology must be scientifically established, and the conclusions must be transparent and reproducible. The challenge specific to this field is the reproducibility requirement, because visual assessment conclusions are not automatically reproducible in the way that a chemical test result is.

German courts, in my experience, handle this well when the expert is clear about what the analysis rests on. The problem arises when expert conclusions are overstated, when a probabilistic finding is presented as categorical, or when the evidential limitations of the footage are not explicitly communicated. In those cases, the defense, quite rightly, attacks the foundation of the opinion, and the court has to make a credibility judgment without the technical background to fully evaluate the competing claims.

My approach in every case is to deliver the thinnest conclusion that the evidence supports, stated with exactly the degree of certainty the evidence warrants, accompanied by a complete description of what limitations prevent me from going further. This is sometimes frustrating for the investigators who commissioned the analysis. They want a yes or no. The evidence, in many cases, supports a qualified answer rather than an absolute one, and delivering a qualified answer is the only forensically honest thing to do.

Bouchrika and colleagues provided the most directly relevant empirical foundation for forensic gait comparison in their 2011 study using real CCTV footage recorded at Gatwick International Airport and validated against known individuals, demonstrating that joint-kinematics-based comparison can yield statistically meaningful results even under real-world surveillance conditions (Bouchrika, I., Goffredo, M., Carter, J., & Nixon, M. S., 2011, Journal of Forensic Sciences, 56(4), 882-889). This is exactly the kind of research the field needs: not laboratory accuracy benchmarks, but empirical validation of forensic methods under conditions that actually resemble the footage practitioners work with.

The Problem of Context-Driven Variation

Movement patterns are stable within a person over time, but they are not fixed under all conditions. This is one of the factors that introduces genuine complexity into forensic comparison and that distinguishes the careful analyst from the careless one.

A person who walks normally produces a certain gait signature. The same person walking quickly produces a different one: step length increases, cadence increases, arm swing amplitude increases, and the gait cycle shortens. A person who is carrying a load produces a modified signature: the trunk leans further forward, the center of mass drops slightly, and arm swing is reduced. A person who is self-conscious about being observed, as suspects near security cameras often are, may alter their gait consciously or unconsciously. Walking on ice produces characteristically different movement patterns from walking on dry pavement. Footwear also has a measurable effect on gait kinematics.

The forensic examiner has to account for all of these sources of variation when conducting a comparison. The reference footage must be, to the degree possible, matched to the surveillance footage in terms of walking speed, load, terrain, and footwear type. Where these cannot be matched, the differences have to be reported and their effect on the comparison has to be assessed explicitly.
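
The matching step can be supported by an explicit checklist, so that no mismatch goes unreported by oversight. The sketch below is hypothetical: the four condition names follow the paragraph above, but the function, the string labels and the example values are illustrative assumptions. It lists every condition that differs between the two recordings or is unknown in either one.

```python
from typing import Dict, List, Optional

# Hypothetical condition names; a real protocol would define its own list.
CONDITIONS = ("walking_speed", "load", "terrain", "footwear")


def unmatched_conditions(questioned: Dict[str, str],
                         reference: Dict[str, str]) -> List[str]:
    """List every condition that differs between, or is unknown in, the recordings."""
    issues: List[str] = []
    for key in CONDITIONS:
        q: Optional[str] = questioned.get(key)
        r: Optional[str] = reference.get(key)
        missing = [label for label, value in (("questioned", q), ("reference", r))
                   if value is None]
        if missing:
            issues.append(f"{key}: unknown in {' and '.join(missing)} recording")
        elif q != r:
            issues.append(f"{key}: questioned={q!r}, reference={r!r}")
    return issues


if __name__ == "__main__":
    questioned = {"walking_speed": "fast", "load": "none", "terrain": "dry pavement"}
    reference = {"walking_speed": "normal", "load": "none",
                 "terrain": "dry pavement", "footwear": "trainers"}
    for issue in unmatched_conditions(questioned, reference):
        print(issue)
```

Treating an unknown condition as a reportable issue, rather than silently assuming a match, is the design choice that matters here: each line of output is something whose effect on the comparison has to be assessed in the report.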

Komar and Buikstra’s foundational work on forensic anthropology methodology establishes the epistemological framework for this kind of uncertainty management: the expert’s obligation is to the accuracy of the analysis, not to the outcome of the case, and where uncertainty exists, it is to be reported, not suppressed (Komar, D. A., & Buikstra, J. E., 2008, Forensic Anthropology: Contemporary Theory and Practice, Oxford University Press).

Where the Field Is Heading

The integration of algorithmic methods into forensic gait analysis is proceeding rapidly, and the trajectory is clear: machine learning approaches will eventually outperform human experts under conditions of adequate footage quality. The empirical evidence for this in controlled settings is already substantial, and it will only grow stronger as training datasets become larger and more diverse.

What will not be automated, in the foreseeable future, is the judgment layer that surrounds the algorithmic result: the assessment of footage quality that determines whether an algorithmic result is meaningful, the structured expert evaluation for cases where quality is too low for algorithmic processing, the translation of technical findings into language and framing that a court can evaluate, and the ethical discipline of delivering honest uncertainty rather than confident conclusions that the evidence does not support.

Ross and Jain’s work on information fusion in biometrics established that combining multiple biometric modalities and multiple analysis methods can outperform any single approach (Ross, A., & Jain, A. K., 2003, Pattern Recognition Letters, 24(13), 2115-2125). The principle carries over naturally to the forensic context: the most robust analyses combine structural comparison, kinematic analysis, expert visual assessment, and, where available, algorithmic output, each weighted by its reliability under the specific footage conditions of the case.
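
Score-level fusion, one of the fusion levels Ross and Jain describe, can be sketched in a few lines. The version below uses one common combination, min-max normalization of each method's score followed by a weighted sum. The method names, score ranges and weights are hypothetical, and in this sketch the weights stand in for reliability under the specific footage conditions: a method the footage cannot support simply receives weight 0.

```python
from typing import Dict, Tuple


def min_max(score: float, lo: float, hi: float) -> float:
    """Map a raw score onto [0, 1] given the expected range of that method's scores."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse(scores: Dict[str, Tuple[float, float, float]],
         weights: Dict[str, float]) -> float:
    """Weighted sum of min-max normalized similarity scores.

    scores maps method name -> (raw_score, range_lo, range_hi).
    weights maps method name -> non-negative reliability weight; only methods
    with a positive weight contribute, and weights are renormalized to sum to 1.
    """
    used = [m for m in scores if weights.get(m, 0.0) > 0]
    if not used:
        raise ValueError("no weighted scores to fuse")
    total = sum(weights[m] for m in used)
    return sum(weights[m] / total * min_max(*scores[m]) for m in used)


if __name__ == "__main__":
    scores = {  # hypothetical raw scores and score ranges
        "structural": (0.8, 0.0, 1.0),
        "kinematic": (35.0, 0.0, 50.0),
        "algorithmic": (-1.2, -4.0, 4.0),
    }
    good_footage = {"structural": 0.3, "kinematic": 0.5, "algorithmic": 0.2}
    poor_footage = {"structural": 0.3, "kinematic": 0.5, "algorithmic": 0.0}
    print(f"fused (good footage): {fuse(scores, good_footage):.3f}")
    print(f"fused (poor footage): {fuse(scores, poor_footage):.3f}")
```

The fused value is a similarity score, not a probability of identity. Turning it into evidential weight would require calibrating it on data that resembles the case footage, which is exactly the kind of validation the forensic literature calls for.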

The surveillance camera density in German cities has increased dramatically over the past decade, and continues to increase. More cameras means more footage reaching expert desks, and the cases are not getting simpler. Footage from modern high-resolution IP cameras in well-maintained public spaces is genuinely amenable to the full range of analytic techniques. Footage from legacy analog systems in rural service stations is not, and the courts will continue to need experts who can make that distinction honestly and communicate it clearly.

I have been making that distinction, in one form or another, for over 2 decades. The footage quality has improved substantially on average. The complexity of the cases has kept pace with the improvement in footage quality. The fundamental challenge, making reliable comparative conclusions from imperfect evidence under the evidentiary standards of a criminal proceeding, has not changed.

What This Means in Practice

Person identification from video is not a binary procedure. It is not a scanner that produces a red light or a green light. It is an analytical framework that extracts comparative information from imperfect material, quantifies the uncertainty in that extraction, and delivers a conclusion calibrated to what the material actually supports.

Anyone who promises more than this is not doing forensic work. They are performing it. The difference matters when the conclusion is used to put someone in prison.

I work within the limits that the evidence imposes. Where those limits allow a strong conclusion, I state it strongly. Where they do not, I state exactly how far the evidence takes me and no further. The court, not the expert, decides what to do with the result. That division of responsibility is the architecture of the system, and it is the right one.

What Distinguishes This From Face Recognition

A question I receive regularly, from lawyers, from journalists, and from clients who are trying to understand what they are paying for, is how forensic gait and body structure analysis differs from the facial recognition that has become a standard reference point in public discussions of biometric identification. The answer matters, because conflating the 2 can produce badly miscalibrated expectations in both directions.

Facial recognition, as implemented in the current generation of deep learning systems, is enormously more accurate than gait recognition under conditions of adequate image quality. A high-resolution frontal or near-frontal face image of sufficient size, captured under reasonable lighting, allows near-perfect recognition performance on controlled benchmarks. This accuracy collapses rapidly as image quality degrades, as the face is seen from unfavorable angles, or as it is occluded by hats, scarves, glasses, or other coverings. The face, in other words, is simultaneously the most individually discriminating biometric feature and the one most easily concealed or distorted by the subject.

Gait and body structure operate under different constraints. Long or loose clothing can obscure parts of the movement, but gait cannot be covered the way a face can: no one can wear a scarf that changes their step length or a hat that modifies their pelvic rotation. Concealment of kinematic features is possible through deliberate gait alteration, but it is cognitively demanding and difficult to maintain consistently over the duration of a surveillance recording. A person deliberately altering their gait typically produces a pattern that is itself distinctive: the controlled, slightly artificial cadence of someone concentrating on walking differently rather than simply walking.

The practical implication is that gait and body structure analysis is most useful precisely in the cases where facial recognition fails: footage where the face is never visible, where the subject is too distant for facial features to be resolved, or where the face is deliberately concealed. These cases are common in real criminal investigations. The subject who covers their face has made a decision that confirms their awareness of camera surveillance, but has not thereby escaped all biometric observation. This is the insight that makes forensic movement analysis practically relevant rather than merely theoretically interesting.

The 2 methods are not in competition. Cases where both facial and kinematic evidence are available allow the information fusion approach that Ross and Jain showed can outperform any single modality. Cases where only kinematics are available are handled by kinematic analysis alone, with conclusions calibrated to what that analysis can support. Cases where both are absent are handled with the honest acknowledgment that the footage does not permit identification.

Ethics, Data Protection, and the Surveillance Expansion

The analytical question and the ethical question are not the same question, and conflating them serves no one well.

The fact that forensic gait analysis is useful does not settle the question of how much surveillance camera coverage is appropriate in a democratic society. The fact that existing footage can be forensically analyzed does not justify collecting more footage than a reasonable privacy framework permits. These are political and legal questions that belong to the public and to the legislature, not to the forensic expert. My role is to analyze what exists, not to advocate for the conditions under which it was collected.

That said, several of the privacy concerns that are commonly raised in public discussions of biometric surveillance do not apply to the specific forensic context in the way people sometimes assume. The concern about mass surveillance using gait recognition is primarily a concern about automated, real-time identification systems that process footage of everyone passing a camera, flagging individuals without a specific investigation. That is a legitimate and serious concern. It is a different situation from the retrospective forensic analysis of archived footage in the context of a specific criminal investigation, where the footage already exists, where the processing is targeted at a named suspect or a described person of interest, and where the conclusion is subject to the adversarial scrutiny of a criminal proceeding.

The ethical obligations of the forensic practitioner in this context are clear: analyze honestly, report uncertainty accurately, do not exceed what the evidence supports, and maintain professional independence from the investigative and prosecutorial interests of the commissioning party. The expert opinion that confirms what investigators already believe, without independent evaluation, is not a forensic opinion. It is a press release in a different format. That distinction, between genuine independent expert analysis and confirmatory expert performance, is one that courts in my experience are becoming increasingly capable of identifying.

Data protection law in Germany and across the European Union imposes clear obligations on the handling of biometric data, through the GDPR and, for processing by competent authorities for the purpose of investigating and prosecuting criminal offences, through the Law Enforcement Directive (EU) 2016/680. Surveillance footage that allows person identification is personal data within the meaning of both instruments, and its processing, including forensic analysis, is subject to the relevant legal basis requirements and data minimization principles. My practice complies with these requirements, including operating under the relevant contractual bases for data processing in criminal investigation contexts, maintaining appropriate security standards for the material entrusted to me, and returning or destroying footage in accordance with the applicable retention rules once the analysis is complete.

The expansion of surveillance infrastructure in German cities is a fact that will not be reversed by forensic practitioners declining to analyze the footage produced by that infrastructure. It will be shaped by democratic deliberation about what public safety warrants and what privacy requires. I contribute to that deliberation, when asked, in the appropriate public contexts. In the analysis room, I analyze the footage that has been lawfully collected and lawfully commissioned for examination. That is the scope of my professional role.

References

Aggarwal, J. K., & Cai, Q. (1999). Human motion analysis: A review. Computer Vision and Image Understanding, 73(3), 428-440. https://doi.org/10.1006/cviu.1998.0744
Bouchrika, I., Goffredo, M., Carter, J., & Nixon, M. S. (2011). On using gait in forensic biometrics. Journal of Forensic Sciences, 56(4), 882-889. https://doi.org/10.1111/j.1556-4029.2011.01793.x
Han, J., & Bhanu, B. (2006). Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(2), 316-322. https://doi.org/10.1109/TPAMI.2006.38
Komar, D. A., & Buikstra, J. E. (2008). Forensic Anthropology: Contemporary Theory and Practice. Oxford University Press.
Moeslund, T. B., Hilton, A., & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2-3), 90-126. https://doi.org/10.1016/j.cviu.2006.08.002
Ross, A., & Jain, A. K. (2003). Information fusion in biometrics. Pattern Recognition Letters, 24(13), 2115-2125. https://doi.org/10.1016/S0167-8655(03)00079-5
Sarkar, S., Phillips, P. J., Liu, Z., Vega, I. R., Grother, P., & Bowyer, K. W. (2005). The HumanID Gait Challenge Problem: Data sets, performance, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2), 162-177. https://doi.org/10.1109/TPAMI.2005.39
Whittle, M. W. (1996). Gait Analysis: An Introduction. Butterworth-Heinemann.