Lecture 01.1 – Historical Body Models

Lecture Slides: Introduction to Body Models and History

Introduction

Modeling the 3D human body has been a long‐standing challenge in computer graphics, computer vision, and artificial intelligence. Over the decades, approaches have evolved from simplistic representations built from basic geometric primitives to sophisticated, data‐driven, and deep learning–based neural models. This document presents a detailed historical overview that combines beginner-friendly explanations with technical depth, highlighting the evolution of human body models—from early kinematic skeletons and generative models to modern statistical and neural representations. Applications (e.g., in animation, pose estimation, AR/VR, robotics) are mentioned only to contextualize key advancements.

Early Origins: Simplified Primitives and Kinematic Skeletons (1970s–1980s)

At the very beginning, researchers sought to represent the human form using simple, abstract components. Early models utilized:

Geometric Primitives: Simple shapes such as cylinders, spheres, and cones were used as building blocks to approximate human limbs and the torso. Pioneering work by Nevatia and Binford, as well as Marr and Nishihara, proposed the idea of representing complex shapes by assembling simpler volumetric primitives. These methods laid the foundation for representing limbs as tapered cylinders.
Kinematic Skeletons: The human body was conceptualized as a collection of rigid segments (bones) connected by joints forming a kinematic tree. This approach allowed researchers to model human motion by defining how parts rotate relative to each other. A classic example is Johansson’s 1973 “point‐light” experiment, which demonstrated that 10–12 moving dots (attached to key joints) are sufficient for humans to perceive the motion of a living body. This finding reinforced the idea that even minimal representations (stick figures) could evoke a vivid perception of human motion.
Model-Based Tracking: In 1983, David Hogg’s work “Model-based vision: A program to see a walking person” was groundbreaking. Hogg built a 3D human model using connected cylinders and used edge matching between the projected model and the image to track a walking person. Despite the limited fidelity, this generative approach established the principle that an explicit human body model could be fitted to visual data.

# Key Concepts in Early Models
- Generalized Cylinders: Abstract shapes to model limbs.
- Kinematic Tree: Hierarchical structure of joints and bones.
- Point-Light Display: Minimal cues (10–12 points) sufficient for human motion perception.
- Edge-Based Fitting: Matching model projections to image contours.

Advances in the 1990s: Superquadrics, Differentiable Fitting, and Physical Models

The early 1990s saw significant advances:

Superquadric Models: Instead of simple cylinders, researchers began using superquadrics—a family of shapes that can smoothly vary from cuboid to spherical forms—to approximate body parts. This allowed for better geometric fidelity when modeling limbs and other body segments.
Differentiable Fitting and Optimization: Bregler and Malik (1998) introduced a framework that used twist and exponential maps for representing rotations and applied gradient-based optimization to minimize the difference between the projected model and image observations. This method made the fitting process more efficient compared to exhaustive or random search strategies.
Physical and Biomechanical Models: To simulate more realistic deformations, some researchers explored physics-based models. These models incorporated muscle and skin dynamics to mimic how real tissue deforms during motion. While computationally expensive, such methods began to capture the nuances of human movement better than rigid models.

# Innovations in the 1990s
- Superquadrics: Enhanced geometric representation.
- Gradient-Based Fitting: Use of differentiable cost functions.
- Physics-Based Deformation: Early attempts to simulate muscle and skin dynamics.

The Impact of 3D Scanning and Data: From Anthropometry to Statistical Models (1990s–2000s)

A major turning point in human body modeling was the advent of 3D scanning technology:

3D Scanning and the CAESAR Dataset: With laser and structured-light scanners becoming available in the late 1990s, large-scale 3D body scanning projects (such as the CAESAR project) provided thousands of detailed body scans. These datasets captured real human variability and enabled the creation of statistical models that could represent the diversity of human shapes.
Statistical Body Shape Models: Inspired by the success of Blanz and Vetter’s 3D Morphable Model for faces (1999), researchers applied similar techniques to full-body scans. Allen, Curless, and Popović (2003) pioneered a statistical body model using PCA by aligning a common template to many 3D scans. This model encoded body shape variations as a low-dimensional vector, allowing for the interpolation of new, realistic body shapes.
Limitations and Extensions: Early statistical models were “shape-only” and did not capture pose-dependent deformations. When these models were animated via a skeleton, they often appeared unrealistic because they lacked the natural bulging or compressions that occur when muscles contract or relax.

# Milestones in Data-Driven Body Modeling
- CAESAR Dataset (1999): 3D scans of ~4,000 individuals.
- PCA-Based Models (2003): Statistical representation of body shape variations.
- Pose-Independent Limitations: Initial models did not account for pose-induced deformations.

SCAPE and the Emergence of Pose-Aware Models (Mid-2000s)

A breakthrough in the field came with the introduction of the SCAPE model (Shape Completion and Animation of People) in 2005:

SCAPE Model: SCAPE extended statistical body modeling by incorporating a separate model for pose-dependent deformations. It learned a statistical shape space from multiple subjects (capturing identity) and separately learned how a single subject’s body deforms with changes in pose (capturing dynamic deformations). This allowed SCAPE to produce more realistic animations by adapting the body’s surface according to the joint angles.
Technical Contributions: SCAPE represents deformations on a per-triangle basis, effectively decoupling the identity (shape) from the pose. It uses both a linear model (for overall shape) and non-linear corrections (for pose-dependent deformations), marking a significant improvement in realism.

# Key Attributes of the SCAPE Model
- Dual Space Representation: Separate spaces for identity and pose.
- Per-Triangle Deformations: Allows realistic muscle and skin behavior.
- Improved Animation Quality: Natural bulging and compression during motion.

Consolidation in the 2010s: SMPL and Integration with Learning-Based Methods

The early 2010s saw a transition from laboratory prototypes to widely accessible, general-purpose models:

SMPL (Skinned Multi-Person Linear Model): Introduced in 2015, SMPL built on ideas from SCAPE and earlier PCA-based models. It represents the human body as a triangulated mesh controlled by a low-dimensional set of shape and pose parameters. SMPL uses linear blend skinning (LBS) with learned corrective blendshapes to overcome common artifacts in skinning. Its public release and ease of integration made it the standard for research and industry.
Extensions of SMPL: Models such as SMPL-X expanded the parameter space to include hands and facial expressions, while dynamic models (e.g., Dyna) incorporated temporal dynamics to simulate soft-tissue movement.
Fitting Advances: Early model fitting used optimization methods (e.g., SMPLify) based on minimizing reprojection error between 2D keypoints and the model. Later, differentiable rendering frameworks (e.g., OpenDR, Neural Renderer) enabled end-to-end learning, allowing convolutional neural networks to directly regress model parameters from images.

# SMPL and Its Impact
- SMPL (2015): The de facto standard statistical body model.
- Public Availability: Enabled widespread adoption in academia and industry.
- Integration with Deep Learning: Paved the way for methods such as HMR (end-to-end prediction of SMPL parameters).

Deep Learning and Neural Implicit Models (Late 2010s–Present)

The most recent developments have leveraged deep learning to move beyond explicit mesh representations:

Neural Parametric Models: Some recent approaches replace linear PCA with non-linear latent spaces learned by neural networks. These models can capture finer details and more complex variations in body shape and clothing.
Neural Implicit Surface Models: Neural implicit representations, such as occupancy networks or signed distance functions (SDFs), have emerged as powerful alternatives to explicit mesh models. For example, PIFu (Pixel-Aligned Implicit Function) reconstructs high-resolution clothed human bodies from a single image by learning an implicit surface function. These models overcome limitations of fixed mesh connectivity and can capture fine geometric details.
Neural Avatars and Radiance Fields: Cutting-edge approaches combine neural implicit representations with differentiable rendering to generate photorealistic images of virtual humans. These methods, often based on Neural Radiance Fields (NeRF), allow for free-viewpoint synthesis and dynamic rendering of human subjects.

# Modern Trends
- Neural Implicit Representations: Continuous models (e.g., PIFu, NASA) that model occupancy or distance.
- Integration with Differentiable Rendering: End-to-end learning from RGB images.
- Neural Avatars: Photo-realistic rendering of human subjects in novel poses or views.

Timeline Summary of Milestones

Below is a concise timeline summarizing key milestones:

1970s:
- Generalized cylinders and kinematic skeletons.
- Johansson’s point-light experiments demonstrate minimal cues for human motion perception.
Early 1980s:
- Hogg (1983) introduces model-based human tracking using a cylinder-based 3D model.
- Establishment of the kinematic tree paradigm.
Early 1990s:
- Use of superquadrics to better approximate human limbs.
- Introduction of differentiable, gradient-based fitting (Bregler & Malik, 1998).
- Early physics-based and biomechanical models.
Late 1990s:
- Release of the CAESAR dataset enabling statistical body shape modeling.
- Blanz & Vetter’s morphable model for faces inspires similar methods for full bodies.
Early 2000s:
- Allen, Curless, and Popović (2003) develop the first PCA-based statistical body model.
- Initial “shape-only” models emerge.
Mid-2000s:
- SCAPE (2005) introduces pose-dependent deformations.
- Multilinear (tensor) models of pose and shape further refine the approach.
Early 2010s:
- Custom subject-specific body models remain common in research.
2015 and Beyond:
- SMPL (2015) becomes the standard parametric body model.
- SMPL extensions (SMPL-X, Dyna) add expressiveness.
- End-to-end deep learning approaches (SMPLify, HMR) integrate differentiable rendering.
- Emergence of neural implicit models (PIFu, NASA) and neural avatars for photorealistic rendering.

Conclusion

The evolution of body models for virtual humans reflects an ongoing quest to create digital representations that are both realistic and computationally efficient. Early work laid the foundations with simple geometric primitives and rigid kinematic skeletons. As 3D scanning technology emerged, statistical models built on real data revolutionized the field, culminating in pose-aware models such as SCAPE. The introduction of SMPL in 2015 marked the consolidation of these ideas into a widely accessible and practical model, which has since been integrated with deep learning and differentiable rendering techniques. Today’s models—including neural implicit representations and neural avatars—continue to push the boundaries of realism, offering unprecedented detail and fidelity in digital human representation.