Lecture 5.1 - Training a Body Model and Fitting SMPL to Scans

Lecture Slides: Body Models: Vertex-Based Models and SMPL

Introduction

This lecture explores the training of parametric body models and methods for fitting the SMPL (Skinned Multi-Person Linear) model to 3D scans. We will cover:

  1. Triangle-based body models (SCAPE and BlendSCAPE)

  2. Training parametric body models from registrations

  3. Techniques for obtaining registrations by fitting SMPL to 3D scans

  4. Joint optimization of registration and model training (co-registration)

These topics are crucial for understanding how modern human body models are developed and applied to capture and represent human shape and pose in 3D.

Body Models Based on Triangle Deformations

In the previous lecture, we discussed vertex-based deformation models like SMPL. Now, let’s briefly explore an alternative approach: triangle-based deformation models.

SCAPE and BlendSCAPE Models

SCAPE (Shape Completion and Animation of People) and BlendSCAPE are body models based on triangle deformations rather than vertex displacements.

Key characteristics:

  • Apply 3×3 transformation matrices to each triangle of the mesh

  • Operate on edges (vertex differences) rather than on vertices directly

  • Decompose deformations into pose and shape factors

Unlike vertex-based models, triangle-based models don’t explicitly use a skeleton with joint rotations. Instead, they apply transformations to each triangle and then “stitch” the triangles together to maintain mesh coherence.

Triangle Deformation Process

The triangle deformation process works as follows:

  1. For each triangle in the template mesh, identify its vertices \(T_0, T_1, T_2\)

  2. Center the triangle at the origin and compute its edges:

    \[\begin{split}E_0 = T_2 - T_0 \\ E_1 = T_1 - T_0\end{split}\]
  3. Apply a 3×3 transformation matrix \(X_f\) to each edge:

    \[\begin{split}E_0' = X_f E_0 \\ E_1' = X_f E_1\end{split}\]
  4. After applying these independent transformations to each triangle, the resulting mesh is disconnected.

  5. To create a coherent mesh, solve a least-squares problem to find the vertices that best match the transformed edges:

    \[\mathbf{M} = \arg\min_{\mathbf{M}} \sum_{f=0}^{M-1} \sum_{i \in \{0,1\}} \|e_f^i - \tilde{e}_f^i\|^2\]

    where \(\tilde{\mathbf{E}} \equiv A\mathbf{M} = \{\tilde{e}\}\) and \(A\) is a matrix that extracts edges from vertices.

In SCAPE/BlendSCAPE, the transformation matrix \(X_f\) is factorized into:

\[X_f = B(\vec{\theta})S(\vec{\beta})Q(\vec{\theta})\]

where: - \(Q(\vec{\theta})\) are pose-dependent deformations - \(S(\vec{\beta})\) are shape-dependent deformations - \(B(\vec{\theta})\) handles articulation

This is fundamentally a multiplicative model, in contrast to the additive model used by SMPL.

Comparison: SMPL vs. SCAPE/BlendSCAPE

Here are the key differences between these two approaches:

Feature

SMPL

SCAPE/BlendSCAPE

Deformation type

Vertex-based

Triangle-based

Transformation applied

To vertices

To edges

Transformation matrix

4×4 (homogeneous coords)

3×3

Joint system

Explicit skeleton

No explicit skeleton

Model structure

Additive

Multiplicative

SMPL exhibits superior performance in terms of explained variance with fewer parameters. With just 10 shape components, SMPL captures over 90% of shape variance, while SCAPE requires approximately 300 components to achieve similar performance. This is because SMPL performs PCA directly on vertex displacements, whereas SCAPE performs PCA on 3×3 transformation matrices, which consumes more capacity on transformations that may not significantly affect the final mesh appearance.

Training a Body Model from Registrations

Once we select a body model architecture (such as SMPL), we need to train it using data. The training process aims to learn the model’s parameters from a dataset of 3D human body scans.

The Challenge of Raw Scan Data

Raw 3D scans of humans present several challenges:

  • Missing data due to occlusion or scanner limitations

  • Self-contacts creating ambiguity in the geometry

  • Varying topology across different scans

  • Different resolutions and meshing artifacts

Therefore, we typically don’t train models directly from raw scans but instead work with registrations.

Training from Registrations

A registration is a template mesh that has been non-rigidly deformed to match a 3D scan. All registrations share the same topology (vertex connectivity), making them easier to work with.

Assuming we have a set of registrations \(\mathbf{V}_j\), the training objective can be formulated as:

\[\mathbf{\Phi} = \arg\min_{\mathbf{\Phi}} \sum_j \min_{\vec{\theta}_j, \vec{\beta}_j} \|M(\vec{\theta}_j, \vec{\beta}_j; \mathbf{\Phi}) - \mathbf{V}_j\|^2\]

where: - \(\mathbf{\Phi}\) represents the model parameters to be learned (blend weights, joint regressor, shape components, etc.) - \(\vec{\theta}_j\) and \(\vec{\beta}_j\) are the pose and shape parameters for registration \(j\) - \(M(\vec{\theta}_j, \vec{\beta}_j; \mathbf{\Phi})\) is the output of the body model - \(\mathbf{V}_j\) is the registration for scan \(j\)

This objective minimizes the distance between the model’s prediction and the registration, optimizing both the model parameters \(\mathbf{\Phi}\) and the per-registration pose and shape parameters \(\vec{\theta}_j\) and \(\vec{\beta}_j\).

Obtaining Registrations: Fitting SMPL to Scans

The previous section assumed we already had registrations, but how do we obtain them in the first place? This leads to a “chicken and egg” problem:

  • To train a model, we need registrations

  • To obtain good registrations, we need a model

Non-Rigid Registration Process

The registration process aims to find a mesh \(\mathcal{V}_j\) that best matches scan \(\mathcal{S}_j\):

\[\mathbf{V}_j = \arg\min_{\mathbf{V}_j} \left(\min_{\vec{\theta}_j, \vec{\beta}_j} (E_{reg}(\mathcal{S}_j, \mathbf{V}_j, \vec{\theta}_j, \vec{\beta}_j))\right)\]

where \(E_{reg}\) is the registration energy (objective function).

Iterative Closest Point (ICP) Review

At the core of registration is the Iterative Closest Point (ICP) algorithm, which consists of these steps:

  1. Initialize: Start with an initial estimate of the transformation

  2. Find Correspondences: For each point in the source, find the closest point in the target

  3. Optimize Transformation: Compute the optimal transformation (e.g., with Procrustes analysis)

  4. Iterate: Repeat steps 2-3 until convergence

While traditional ICP handles rigid transformations (rotation and translation), we need non-rigid transformations for body registration. This leads to a question: what transformation function \(f(\cdot)\) should we use?

  • Should \(f(\cdot)\) be all the degrees of freedom in vertices \(\mathbf{V}\)?

  • Should \(f(\cdot)\) be the SMPL function parameterized by \(\vec{\theta}, \vec{\beta}\)?

Registration Objective Formulation

For fitting a body model to scans, we use a comprehensive objective function:

\[E_{reg}(\mathcal{S}_j, \mathbf{V}_j, \vec{\theta}_j, \vec{\beta}_j) = E_S(\mathcal{S}_j, \mathbf{V}_j) + \lambda_C E_C(\mathbf{V}_j, \vec{\theta}_j, \vec{\beta}_j) + \lambda_\theta E_\theta(\vec{\theta}_j) + \lambda_\beta E_\beta(\vec{\beta}_j)\]

where:

  1. Scan-to-mesh distance (\(E_S\)): Minimizes the distance between scan points and the registration mesh:

    \[E_S(\mathcal{S}_j, \mathbf{V}_j) = \sum_{\mathbf{s} \in \mathcal{S}_j} \rho\left(\min_{\mathbf{v} \in \mathcal{V}_j} \|\mathbf{s} - \mathbf{v}\|\right)\]

    with \(\rho(x) = \frac{x^2}{\sigma^2 + x^2}\) being a robust Geman-McClure loss function to handle outliers.

  2. Coupling term (\(E_C\)): Encourages the registration to stay close to the model space:

    \[\begin{split}E_C(\mathbf{V}_j, \vec{\theta}_j, \vec{\beta}_j) = \begin{cases} \|\mathbf{V}_j - M(\vec{\theta}_j, \vec{\beta}_j)\|_F^2, \text{ if coupling on vertices} \\ \|A\mathbf{V}_j - AM(\vec{\theta}_j, \vec{\beta}_j)\|_F^2, \text{ if coupling on edges} \end{cases}\end{split}\]
  3. Pose prior (\(E_\theta\)): Keeps poses within the space of plausible human poses:

    \[E_\theta(\vec{\theta}_j) = (\vec{\theta}_j - \vec{\mu}_\theta)\Sigma_\theta^{-1}(\vec{\theta}_j - \vec{\mu}_\theta)^T\]
  4. Shape prior (\(E_\beta\)): Ensures shapes remain plausible:

    \[E_\beta(\vec{\beta}_j) = (\vec{\beta}_j - \vec{\mu}_\beta)\Sigma_\beta^{-1}(\vec{\beta}_j - \vec{\mu}_\beta)^T\]

The weights \(\lambda_C\), \(\lambda_\theta\), and \(\lambda_\beta\) balance these different terms.

Point-to-Surface Distance

A key detail in the scan-to-mesh distance term is that we use a point-to-surface distance rather than a point-to-point distance. This means we find the closest point on the mesh surface for each scan point, not just the closest vertex.

This approach has several advantages: - Allows the correspondence to slide along the surface during optimization - Avoids local minima that point-to-point matching might introduce - Produces smoother, more accurate registrations

Multi-Stage Optimization Strategy

Registration quality can be improved using a multi-stage optimization strategy with varying weights:

Stage 1: Strong priors, very strong coupling - \(\lambda_\theta \gg 1\) - \(\lambda_\beta \gg 1\) - \(\lambda_C = \infty\)

This stage essentially performs SMPL fitting to get close to the scan.

Stage 2: Relaxed priors, strong coupling - \(\lambda_\theta \approx 1\) - \(\lambda_\beta \approx 1\) - \(\lambda_C \gg 1\)

This allows more freedom while still keeping close to the model space.

Stage 3: Minimal priors, weak coupling - \(\lambda_\theta \approx 0\) - \(\lambda_\beta \approx 0\) - \(\lambda_C \approx 1\)

This stage allows the registration to capture details beyond what the model can represent.

Joint Registration and Model Training

To solve the “chicken and egg” problem mentioned earlier, we can jointly optimize for registrations and model parameters using co-registration.

Co-Registration Approach

The key idea is to minimize the registration objective across a dataset of scans:

\[E(\{\mathcal{S}_j\}, \{\mathbf{V}_j\}, \{\vec{\theta}_j\}, \mathbf{\Phi}) = \sum_j (E_S(\mathcal{S}_j, \mathbf{V}_j) + \lambda_C E_C(\mathbf{V}_j, \vec{\theta}_j, \mathbf{\Phi}) + \lambda_\theta E_\theta(\vec{\theta}_j)) + \lambda_\Phi E_\Phi(\mathbf{\Phi})\]

where \(E_\Phi(\mathbf{\Phi})\) is a regularization term for the model parameters.

This complex optimization can be solved using block coordinate descent or expectation-maximization:

  1. Fix registrations \(\{\mathbf{V}_j\}\) and optimize model parameters:

    \[\mathbf{\Phi} = \arg\min_{\mathbf{\Phi}} (E(\{\mathcal{S}_j\}, \{\mathbf{V}_j\}, \{\vec{\theta}_j, \vec{\beta}\}, \mathbf{\Phi})) \text{ with } \{\mathbf{V}_j\} \text{ fixed}\]
  2. Fix model parameters \(\mathbf{\Phi}\) and optimize registrations:

    \[\{\mathbf{V}_j\} = \arg\min_{\{\mathbf{V}_j\}} (E(\{\mathcal{S}_j\}, \{\mathbf{V}_j\}, \{\vec{\theta}_j, \vec{\beta}_j\}, \mathbf{\Phi})) \text{ with } \mathbf{\Phi} \text{ fixed}\]
  3. Iterate until convergence.

This approach allows us to simultaneously solve for the model parameters and the registrations, breaking the chicken-and-egg cycle.

Summary

This lecture covered several important aspects of body model training and fitting:

  1. Triangle-based body models like SCAPE and BlendSCAPE apply 3×3 transformations to triangle edges and require a stitching step to maintain mesh coherence.

  2. Training a body model requires solving for model hyperparameters and per-registration pose and shape parameters, which can be formulated as a minimization problem.

  3. Obtaining registrations involves fitting a model to scans using non-rigid ICP with a carefully designed objective function that balances data fidelity, model coupling, and prior constraints.

  4. The chicken-and-egg problem of needing registrations to train a model and a model to create registrations can be addressed through co-registration, which jointly optimizes for both.

These techniques form the foundation for creating accurate, realistic 3D body models that can represent a wide range of human shapes and poses.