Lecture 08.1: Vertex-Based Clothing Modeling for Virtual Humans

Introduction

Modeling clothing on virtual humans is a longstanding challenge in computer vision and graphics. Unlike minimally-clothed body models (like SMPL or SCAPE) that represent the human shape with a single mesh, clothed humans exhibit complex geometry with folds, wrinkles, and layering that must deform naturally with body pose. In this lecture, we explore vertex-based clothing representations, which model garments explicitly as meshes, often implemented as vertex displacements added to a base human template.

This explicit surface representation preserves correspondences, allows direct manipulation of vertices, and integrates well with standard animation pipelines. We’ll examine key techniques for registering clothing to 3D scans, estimating body shape under clothing, and performing multi-layer tracking of independent garments. We’ll focus on seminal works like ClothCap and BUFF (Bodies Under Flowing Fashion) that laid the groundwork for data-driven clothing capture.

Clothing Representation as Vertex Displacements

The SMPL model represents human shape and pose using blend shapes: a set of vector displacements added to a base template in T-pose. To model clothing with SMPL, we can extend this paradigm by adding another layer of vertex displacements to represent the clothing geometry.

Formally, given a template body mesh \(T_0\) with shape parameters \(\beta\) and pose parameters \(\theta\), we can define a clothed body mesh as:

\[T(\beta, \theta, D) = T_0 + B_s(\beta) + B_p(\theta) + D\]

where:

  - \(T_0\) is the base template mesh
  - \(B_s(\beta)\) represents shape-dependent blend shapes
  - \(B_p(\theta)\) represents pose-dependent blend shapes
  - \(D\) is the clothing displacement vector

This representation, sometimes called “SMPL+D”, allows us to “dress” the template by pushing vertices outward to match clothing geometry. The method is straightforward and maintains the same mesh topology and vertex count as the original body model.
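
A minimal NumPy sketch of this construction may help; it is an illustration, not the actual SMPL implementation, and it assumes the template, blend-shape bases, pose feature vector, and displacements are given as plain arrays:

```python
import numpy as np

def clothed_template(T0, shape_dirs, pose_dirs, beta, pose_feat, D):
    """Assemble a clothed template in the canonical pose (SMPL+D).

    T0         : (V, 3) base template vertices
    shape_dirs : (V, 3, K) shape blend-shape basis
    pose_dirs  : (V, 3, P) pose blend-shape basis
    beta       : (K,) shape coefficients
    pose_feat  : (P,) pose features (e.g., flattened relative rotations)
    D          : (V, 3) per-vertex clothing displacements
    """
    B_s = shape_dirs @ beta      # shape-dependent offsets, (V, 3)
    B_p = pose_dirs @ pose_feat  # pose-dependent offsets, (V, 3)
    return T0 + B_s + B_p + D    # T(beta, theta, D)
```

Because \(D\) lives in the canonical space, posing and skinning the dressed template proceed exactly as for the undressed body model.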

Advantages of this representation include:

  - Maintaining consistent correspondences across different poses
  - Compatibility with the underlying skeletal rig
  - Simplicity of implementation

However, this representation has a key limitation: it collapses body and clothing into a single surface, ignoring the layered nature of real clothing. This becomes problematic when clothing separates from the body (e.g., a loose jacket swinging), as the single-layer model cannot properly represent that separation.

Registration of Clothed Human Scans

The registration process for clothed humans follows a similar approach to registering minimally-clothed subjects, but with additional challenges due to clothing geometry. The general procedure is:

  1. Begin by fitting the parametric body model (e.g., SMPL) to the scan to capture the overall pose and approximate shape

  2. Once the model is close enough to the data, optimize free-form vertex displacements (\(D\)) to match the scan’s surface

  3. Alternate between refining model parameters and vertex displacements until convergence

This process results in a registered mesh with clothing that maintains the vertex ordering of the template but approximates the clothed scan’s geometry. The registration essentially provides a 3D alignment or correspondence between the scan and the template.

The objective function for this registration typically includes:

\[E(\beta, \theta, D) = \sum_{j=1}^{M} \|v_j(\beta, \theta, D) - x_j\|^2 + \lambda\|D\|^2\]

where \(v_j\) are model vertices, \(x_j\) are the nearest points on the scan, and \(\lambda\|D\|^2\) regularizes the displacements to keep them small, preferring to explain as much as possible with the base model.
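
A minimal sketch of this energy, assuming the scan is given as a point cloud and using a KD-tree for the nearest-point queries (point-to-point distance here stands in for the point-to-surface distance used in practice):

```python
import numpy as np
from scipy.spatial import cKDTree

def registration_energy(model_verts, scan_points, D, lam):
    """E(beta, theta, D) = sum_j ||v_j - x_j||^2 + lam * ||D||^2.

    model_verts : (V, 3) vertices of the posed, displaced model
    scan_points : (N, 3) scan point cloud
    """
    tree = cKDTree(scan_points)
    dists, _ = tree.query(model_verts)  # nearest scan point per model vertex
    return np.sum(dists ** 2) + lam * np.sum(D ** 2)
```

In the alternating scheme above, this energy would be minimized over \((\beta, \theta)\) with \(D\) fixed, then over \(D\) with the model parameters fixed, re-querying the correspondences each round in ICP fashion.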

While this single-layer approach is effective for capturing the outer surface geometry, it treats clothing as an “outer skin” of the body rather than as separate garment layers. This limitation motivated the development of multi-layer models like BUFF and ClothCap.

Estimating Body Shape Under Clothing: The BUFF Method

The BUFF (Bodies Under Flowing Fashion) method tackles the challenge of estimating a person’s body shape underneath their clothing from 3D scans. The key insight is that as a person moves, different parts of the body become tightly pressed against clothing at different moments, revealing information about the true body shape underneath.

BUFF solves this through a multi-frame approach:

  1. Per-Frame Registration: Register a template mesh to each scan frame using SMPL+D (as described above)

  2. Unposing to Canonical Pose: Transform all registered meshes to a common canonical pose (e.g., T-pose), as sketched in code after this list

  3. Fusion of Observations: Compute a “fusion scan” as the union of all unposed meshes

  4. Shape Optimization: Solve for a naked body mesh that:
     - fits inside the fusion scan (no vertex should protrude outside)
     - tightly fits any points identified as skin in the scans
     - stays close to a plausible human shape
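
As a sketch of steps 2 and 3: unposing can be done by inverting each vertex's blended bone transform (inverse linear blend skinning), and the fusion scan is then simply the pooled set of unposed vertices. The skinning weights and per-frame bone transforms are assumed given:

```python
import numpy as np

def unpose(verts, weights, bone_transforms):
    """Map posed vertices back to the canonical pose (inverse LBS).

    verts           : (V, 3) registered vertices in the posed frame
    weights         : (V, J) skinning weights
    bone_transforms : (J, 4, 4) bone transforms for this frame
    """
    T = np.einsum('vj,jab->vab', weights, bone_transforms)  # blended (V, 4, 4)
    v_h = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)
    v_canon = np.einsum('vab,vb->va', np.linalg.inv(T), v_h)
    return v_canon[:, :3]

# Fusion scan: the union of all unposed frames, pooled as one point set.
# fusion = np.vstack([unpose(V_t, W, G_t) for V_t, G_t in frames])
```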

The core objective function in BUFF includes:

\[E(B, \theta) = E_{\text{skin}}(B) + E_{\text{cloth}}(B) + E_{\text{coupling}}(B) + E_{\text{pose-prior}}(\theta)\]

where:

  - \(E_{\text{skin}}\) penalizes distance from the body to scan points identified as skin
  - \(E_{\text{cloth}}\) ensures the body stays inside the clothing and close to tight clothing
  - \(E_{\text{coupling}}\) keeps the solution within a plausible human shape space
  - \(E_{\text{pose-prior}}\) ensures reasonable joint angles

The cloth term \(E_{\text{cloth}}\) has two components: an “outside” term that heavily penalizes any case where the body pokes through clothing, and a “fit” term that encourages the body to be as large as possible within the clothing boundary.
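
A rough sketch of these two components, under the assumption that a signed distance function to the clothed scan surface is available (negative inside, positive outside):

```python
import numpy as np

def cloth_energy(body_verts, sdf, w_outside=1e3, w_fit=1.0):
    """Cloth term: heavily punish protrusion, mildly reward a tight fit.

    sdf : callable mapping (V, 3) points to signed distances
          (negative inside the clothed surface).
    """
    d = sdf(body_verts)
    outside = np.sum(np.maximum(d, 0.0) ** 2)  # body poking through clothing
    fit = np.sum(np.minimum(d, 0.0) ** 2)      # slack between body and cloth
    return w_outside * outside + w_fit * fit
```

Because \(w_{\text{outside}} \gg w_{\text{fit}}\), protrusions dominate the energy, while the fit term gently inflates the body toward the clothing from inside.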

This approach effectively “carves out” the space the body can occupy by accumulating constraints from multiple poses, resulting in an accurate estimation of body shape even when it’s never fully visible in any single frame.

Multi-Layer Clothing Capture: The ClothCap Method

ClothCap extends the vertex-based approach to capture separate layers for the body and individual garments. This multi-layer representation addresses the limitations of single-surface models, allowing clothing to properly separate from the body and enabling applications like virtual try-on.

The ClothCap pipeline consists of several key stages:

  1. Body Shape Estimation: First, estimate the subject’s naked body shape using BUFF or a similar method

  2. Clothing Template Creation: Create separate templates for each garment type by:
     - segmenting the SMPL template into regions corresponding to different garments
     - cutting the template along segment boundaries to create separate sub-meshes
     - deforming each template part to fit the corresponding region in the scan

  3. Per-Frame Multi-Part Registration: For each frame, optimize for:
     - skeleton pose parameters that move all parts in a consistent way
     - per-vertex deformations of each garment to capture cloth wrinkles

The objective function for ClothCap’s registration is:

\[E(\theta, \{V^{(g)}\}) = \sum_{g=1}^{G} E_{\text{data}}^{(g)}(V^{(g)}, S^{(g)}) + E_{\text{cpl}}(\theta, \{V^{(g)}\}) + E_{\text{bnd}}(\{V^{(g)}\}, S) + E_{\text{lap}}(\{V^{(g)}\})\]

where:

  - \(E_{\text{data}}^{(g)}\) measures the distance between each garment mesh and its corresponding scan segment
  - \(E_{\text{cpl}}\) keeps the deforming garment meshes close to the underlying kinematic model
  - \(E_{\text{bnd}}\) ensures garment edges (hems, cuffs) align with scan boundaries
  - \(E_{\text{lap}}\) is a Laplacian smoothness term for the mesh geometry

Let’s examine these terms in more detail:

Data Term

The data term minimizes the scan-to-mesh distance for each garment separately:

\[E_{\text{data}}^{(g)} = \sum_{x \in S^{(g)}} \mathrm{dist}(x, V^{(g)})^2\]

where \(S^{(g)}\) is the segment of the scan corresponding to garment \(g\), and \(V^{(g)}\) is the garment mesh. By processing each garment separately, the registration is more accurate than trying to match the whole scan as one surface.
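
Concretely, each garment's term is a nearest-point query restricted to its scan segment; a sketch, again with point-to-vertex distance standing in for point-to-surface distance:

```python
import numpy as np
from scipy.spatial import cKDTree

def garment_data_term(garment_verts, scan_segment):
    """E_data^(g): squared distance from each segment point to the garment mesh."""
    tree = cKDTree(garment_verts)
    d, _ = tree.query(scan_segment)
    return np.sum(d ** 2)

# Summed over all garments, e.g. segments = {'shirt': ..., 'pants': ...}:
# E_data = sum(garment_data_term(V[g], S[g]) for g in segments)
```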

Boundary Term

The boundary term ensures that garment edges align with scan boundaries:

\[E_{\text{bnd}} = \sum_{\text{rings } r} \sum_{y \in \text{scanBoundary}_r} \mathrm{dist}(y, \text{modelRing}_r)^2\]

This term helps track the motion of garment boundaries (e.g., sleeve cuffs or shirt hem) as they shift on the body during motion.
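
One way to realize \(\mathrm{dist}(y, \text{modelRing}_r)\) is as the distance from a scan boundary point to the nearest segment of the model's closed boundary polyline; a sketch:

```python
import numpy as np

def point_to_segment(p, a, b):
    """Distance from point p to the line segment from a to b."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def boundary_term(scan_boundary, model_ring):
    """E_bnd for one ring: squared distance from each scan boundary point
    to the closed polyline through the model ring's vertices."""
    nxt = np.roll(model_ring, -1, axis=0)  # segment end points, wrapping around
    return sum(min(point_to_segment(y, a, b)
                   for a, b in zip(model_ring, nxt)) ** 2
               for y in scan_boundary)
```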

Boundary Smoothness

To prevent the garment boundaries from becoming noisy or spiky, ClothCap enforces boundary smoothness by minimizing curvature along the boundary curves:

\[E_{\text{curvature}} = \sum_{r=1}^{R_L} \sum_{n \in \text{ring}_r} \|V_{n-1} - 2V_n + V_{n+1}\|^2\]

This term calculates the second derivative along the boundary loop using a simple finite difference approximation (1, -2, 1), penalizing high curvature.
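
Since each boundary is a closed loop, the stencil wraps around the ends; a sketch:

```python
import numpy as np

def boundary_curvature_energy(ring):
    """Sum of squared second differences along one closed boundary ring.

    ring : (N, 3) vertex positions in order around the loop.
    """
    prev_v = np.roll(ring, 1, axis=0)   # V_{n-1}, wrapping around the loop
    next_v = np.roll(ring, -1, axis=0)  # V_{n+1}
    return np.sum((prev_v - 2.0 * ring + next_v) ** 2)
```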

Laplacian Smoothness

For overall mesh smoothness, ClothCap uses a Laplacian term:

\[E_{\text{lap}} = \sum_{g=1}^{G} \|L^{(g)} V^{(g)}\|_F^2\]

where \(L^{(g)}\) is the graph Laplacian matrix for garment \(g\), and \(\|\cdot\|_F\) is the Frobenius norm.

The graph Laplacian is defined as:

\[L = I - H^{-1}Z\]

where:

  - \(I\) is the identity matrix
  - \(H\) is a diagonal matrix where \(H_{ii}\) equals the number of neighbors of vertex \(i\)
  - \(Z\) is the adjacency matrix (\(Z_{ij} = 1\) if vertices \(i\) and \(j\) are connected)

For each vertex \(v_i\), the Laplacian computes the difference between the vertex position and the average of its neighbors:

\[(Lv)_i = v_i - \frac{1}{|N(i)|} \sum_{j \in N(i)} v_j\]

This operation approximates the mean curvature normal, making the surface as smooth as possible when minimized. The data terms prevent the surface from becoming completely flat, allowing it to capture important garment details like wrinkles.
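
A direct (dense) construction of this Laplacian from an edge list, together with the energy; a real implementation would use sparse matrices, but the arithmetic is the same:

```python
import numpy as np

def graph_laplacian(num_verts, edges):
    """L = I - H^{-1} Z for an undirected edge list (no isolated vertices)."""
    Z = np.zeros((num_verts, num_verts))  # adjacency matrix
    for i, j in edges:
        Z[i, j] = Z[j, i] = 1.0
    H_inv = np.diag(1.0 / Z.sum(axis=1))  # inverse of the degree matrix H
    return np.eye(num_verts) - H_inv @ Z

def laplacian_energy(L, verts):
    """E_lap = ||L V||_F^2: each row of L V is v_i minus its neighbor average."""
    return np.sum((L @ verts) ** 2)
```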

Applications of Multi-Layer Registration

Once the multi-layer registration is complete, the separate garment meshes can be used for various applications:

Clothing Transfer: Garments can be transferred from one person to another by:

  1. Computing garment displacements as the difference between the garment and body:

\[D_G = G - I_G \cdot T\]

where \(G\) is the garment mesh, \(T\) is the body template, and \(I_G\) is an indicator matrix specifying which body vertices correspond to the garment.

  2. Adding these displacements to a new body shape:

\[G_{\text{new}} = T_{\text{new}} + D_G\]

This enables “virtual try-on” where a person can be visualized wearing garments captured from someone else.
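
Under the simplifying assumption that each garment vertex is tied to a single body vertex, so that \(I_G\) reduces to an index array, the transfer takes two lines:

```python
import numpy as np

def transfer_garment(G, T, T_new, garment_idx):
    """Re-dress a new body with a captured garment.

    G           : (Vg, 3) captured garment vertices
    T, T_new    : (V, 3) source and target body templates
    garment_idx : (Vg,) body-vertex index underlying each garment vertex
    """
    D_G = G - T[garment_idx]         # D_G = G - I_G . T
    return T_new[garment_idx] + D_G  # G_new = T_new + D_G
```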

Animation and Retargeting: Since the multi-layer model preserves the kinematic structure of SMPL, captured garments can be animated with the body using standard skinning techniques. When the body moves, the garments move accordingly, with wrinkles and deformations consistent with the original capture.
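
Posing a garment is then ordinary linear blend skinning, the forward counterpart of the unposing sketch above; garment vertices are assumed to inherit skinning weights from nearby body vertices:

```python
import numpy as np

def pose_garment(verts_canon, weights, bone_transforms):
    """Pose canonical garment vertices with linear blend skinning (LBS)."""
    T = np.einsum('vj,jab->vab', weights, bone_transforms)  # blended (V, 4, 4)
    v_h = np.concatenate([verts_canon, np.ones((len(verts_canon), 1))], axis=1)
    return np.einsum('vab,vb->va', T, v_h)[:, :3]
```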

Texture Mapping: Having separate meshes for each garment allows for proper texture mapping, which would be problematic with a single-surface representation, especially when garments slide against the body.

Advantages and Limitations

The multi-layer approach offers several advantages:

  - Models the true physical layering of clothing
  - Enables clothing transfer and retargeting
  - Preserves garment boundaries and interior structure
  - Allows for more realistic animation

However, there are limitations:

  - Requires segmentation of the scans, which can be challenging
  - Assumes the clothing categories are known in advance
  - Is more complex to implement than single-layer models
  - May struggle with very loose or flowing garments (e.g., dresses)

Conclusion

Vertex-based clothing modeling provides a powerful framework for representing clothed virtual humans. The single-layer SMPL+D approach offers simplicity but ignores clothing layers, while multi-layer methods like BUFF and ClothCap address this limitation by modeling the body and garments as separate meshes.

These techniques have enabled significant advances in human digitization, allowing for accurate capture of clothed people and manipulation of the resulting models. Later lectures will explore how these registration techniques serve as the foundation for learning-based models of clothing, as well as alternative representations like implicit surfaces and point-based models.

The challenges of vertex-based methods, particularly handling loose clothing and topology changes, have motivated research into complementary approaches, which we’ll examine in subsequent lectures.