STUDENT TUTORIAL

Animatable Avatar Pipeline

Transform multi-view photographs into animated 3D human avatars with motion transfer.

The Core Problem

Given N images of a person, recover the 3D body model parameters Θ = (β, θ, τ) that explain the observations.

Pipeline: COLMAP → Preprocess → SMPL-X Fit → Texture → Motion Retarget → VAT Viewer

10,475 vertices • 60 fps playback • 83× faster texturing
1 / 14

Pipeline Overview

Each stage transforms data representation, progressively constraining the solution space.

1. COLMAP: Multi-view stereo recovers 3D geometry with vertex colors.
   N × H×W×3 images → ~1.5M vertices + colors

2. Preprocess: Remove artifacts, fix topology, normalize scale.
   ~1.5M noisy vertices → 300-500K clean vertices

3. SMPL-X Fit: Constrain to an anatomically plausible body; gain a skeleton.
   300-500K target points → 10,475 rigged vertices

4. Color Transfer: k-NN maps scan colors to SMPL-X vertices once; the mapping persists.
   COLMAP colors + SMPL-X → 10,475 vertex colors

5. Motion Retarget: Transfer any motion to our body shape; colors follow via LBS.
   SMPL-X + MOYO sequence → T frames × (xyz + rgb)

6. Retarget to COLMAP (optional): Transfer SMPL-X skinning weights to the original mesh; animate full-detail geometry.
   SMPL-X weights + COLMAP → ~1M rigged vertices

7. VAT Export (optional): GPU-friendly format for real-time playback.
   T × 10,475 × (xyz+rgb) → 3 PNGs + JSON

The Dimensionality Story

We start with millions of pixels, reconstruct ~1.5M colored 3D points, then collapse geometry to 119 parameters: β (10 shape), θ (75 pose), hands (24 PCA), ψ (10 expression). SMPL-X acts as a strong prior. Optional: retarget weights back to COLMAP for full-detail animation.

N×H×W×3 pixels → 1.5M pts + rgb → 119 params + rgb → 1M+ retargeted verts (optional)
2 / 14

Stage 1: Image Capture

Video Capture Workflow (Recommended)

📱 1. Record Video: 3 orbits, 3 angles
🎬 2. Extract Frames: FFmpeg (adjust fps; see the sketch below)
📁 3. Feed to COLMAP: ordered sequence
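
A minimal extraction sketch driven from Python (the file names are placeholders; tune the fps filter to your orbit speed):

import subprocess

# ~2 frames/sec of a slow orbit keeps ~70% overlap between neighbors
subprocess.run([
    'ffmpeg', '-i', 'capture.mp4',
    '-vf', 'fps=2',               # extraction rate
    '-qscale:v', '2',             # high JPEG quality for feature matching
    'images/frame_%05d.jpg',
], check=True)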

Coverage: 3 Orbits × 3 Camera Angles

  • 1. Crouching ↑ — capture under chin
  • 2. Standing → — eye level
  • 3. Arms up, aim ↓ — top of head

Walk slowly! Adjacent frames need 70% shared content.

100+ frames minimum • 1080p+ resolution • 3 camera angles

What is "overlap"?

Adjacent frames must share ~70% of visible content. This means walking slowly—if you turn too fast, consecutive frames have no common features and COLMAP can't triangulate.
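
A quick sanity check on walking speed under these assumptions (a 60-second orbit, frames extracted at 2 fps):

orbit_seconds = 60                      # one slow lap around the subject
fps = 2                                 # extraction rate from Stage 1
frames = orbit_seconds * fps            # 120 frames per orbit
deg_per_frame = 360 / frames            # 3.0° of viewpoint change per frame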

Subject Pose: T-Pose
[Figure: T-pose reference]

Why T-Pose?

  • Color coverage: exposes underarms, inner arms, sides
  • Matches SMPL-X default pose
  • Minimizes self-occlusion
  • Best for motion retargeting

Avoid These Mistakes

  • ✗ Subject moving during capture
  • ✗ Arms at sides / hands in pockets
  • ✗ Harsh shadows / mixed lighting
  • ✗ Gaps in coverage
3 / 14

Stage 2: COLMAP Reconstruction

1. Features → 2. Match → 3. Sparse → 4. Undistort → 5. Stereo → 6. Fusion → 7. Mesh
# Feature extraction (with optional mask path)
colmap feature_extractor --database_path db.db --image_path ./images \
    --ImageReader.camera_model SIMPLE_RADIAL --ImageReader.single_camera 1 \
    --ImageReader.mask_path ./masks/  # optional

# Matching: try sequential first (video), fallback to exhaustive
colmap sequential_matcher --database_path db.db --SequentialMatching.overlap 10

# Sparse → Dense → Mesh
colmap mapper --database_path db.db --image_path ./images --output_path sparse
colmap image_undistorter --image_path ./images --input_path sparse/0 --output_path dense
colmap patch_match_stereo --workspace_path dense
colmap stereo_fusion --workspace_path dense --output_path dense/fused.ply
colmap poisson_mesher --input_path dense/fused.ply --output_path dense/mesh.ply \
    --PoissonMeshing.depth 12
[Figure: 1. input video → 2. segmented frames → 3. feature matches → 4. point cloud → 5. 3D mesh]

Optional: Provide Masks

Don't remove backgrounds—black/green bleeds into reconstruction! Provide binary masks via --ImageReader.mask_path
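
One hedged way to generate such masks, assuming the rembg package (any person-segmentation tool works); COLMAP expects masks/<image_name>.png (e.g. frame_00001.jpg.png) and ignores zero-valued pixels:

from pathlib import Path
from PIL import Image
from rembg import remove   # assumed dependency

Path('masks').mkdir(exist_ok=True)
for img_path in sorted(Path('images').glob('*.jpg')):
    matte = remove(Image.open(img_path), only_mask=True)   # 8-bit matte
    mask = matte.point(lambda p: 255 if p > 127 else 0)    # binarize
    mask.save(f'masks/{img_path.name}.png')                # image.jpg.png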

Validation Checklist

After features: 2,000-10,000 per image
After matching: 100-1,000+ matches/pair
After sparse: 70-90% images registered
After fusion: millions of points

Common Failures

<500 features: blurry or textureless
<50% registered: insufficient overlap
Sequential fails: try exhaustive_matcher
Poisson Depth → Vertex Count
depth 8 → ~2K verts (preview)
depth 10 → ~42K verts
depth 11 → ~256K verts
depth 12+ → ~1.4M+ verts ⭐
Key Outputs
sparse/0/ ← camera poses
dense/fused.ply ← point cloud
dense/mesh.ply ← final mesh
4 / 14

Stage 3: Mesh Preprocessing

COLMAP mesh (~1-2M verts, Y-down) → 1. Clean → 2. Simplify → 3. Transform → ready for fitting (~50k verts, Y-up, meters)
1. Clean
import numpy as np
import open3d as o3d

# Remove degenerate geometry
mesh.remove_degenerate_triangles()
mesh.remove_duplicated_vertices()
mesh.remove_non_manifold_edges()

# Keep largest connected component
clusters, cluster_n, _ = mesh.cluster_connected_triangles()
clusters = np.asarray(clusters)
largest = np.argmax(cluster_n)
mesh.remove_triangles_by_mask(clusters != largest)

# Poisson → watertight (returns mesh + densities)
pcd.estimate_normals()
mesh, _ = o3d.geometry.TriangleMesh\
    .create_from_point_cloud_poisson(pcd, depth=9)
2. Simplify
from scipy.spatial import cKDTree

# Save geometry + colors before decimation (colors may be lost)
old_verts = np.asarray(mesh.vertices).copy()
old_colors = np.asarray(mesh.vertex_colors).copy()

# Quadric decimation
mesh = mesh.simplify_quadric_decimation(
    target_number_of_triangles=100000)

# Transfer colors via nearest-neighbor lookup
tree = cKDTree(old_verts)
_, idx = tree.query(np.asarray(mesh.vertices))
mesh.vertex_colors = o3d.utility.Vector3dVector(old_colors[idx])

# Recompute normals
mesh.compute_vertex_normals()
3. Transform
# Scale to target height (meters)
h = verts[:,1].max() - verts[:,1].min()
verts *= (1.7 / h)

# Flip Y: COLMAP Y-down → SMPL-X Y-up
verts[:,1] *= -1

# Center: pelvis at origin (~0.9m up)
verts[:,1] -= verts[:,1].min() + 0.9
verts[:,0] -= verts[:,0].mean()
verts[:,2] -= verts[:,2].mean()
Coordinate Systems
COLMAP: Y-down ↓
SMPL-X: Y-up ↑
verts[:,1] *= -1

Why Poisson Reconstruction?

COLMAP meshes have holes and non-manifold edges. Poisson creates a watertight surface required for skinning.

Why Simplify to 50k?

SMPL-X fitting computes distances from every vertex. 1M+ vertices → GPU OOM errors.

5 / 14

Stage 4: SMPL-X Model Fitting


What is SMPL-X?

SMPL eXpressive — unified body model combining SMPL body, FLAME face, MANO hands.

Vertices: 10,475
Joints: 54
Shape (β): 10 params
Body pose: 63 params
Hands (PCA): 24 params
Expression (ψ): 10 params
[Figure: SMPL-X model]

Chamfer Distance Loss

L_chamfer = 1/|A| · Σ_{a∈A} min_{b∈B} ‖a−b‖₂ + 1/|B| · Σ_{b∈B} min_{a∈A} ‖b−a‖₂
# Pairwise distances: (N_smplx, N_scan)
dist = torch.cdist(smplx_v, scan_v)
loss_s2t = dist.min(dim=1)[0].mean()  # SMPL-X → scan
loss_t2s = dist.min(dim=0)[0].mean()  # scan → SMPL-X
chamfer = (loss_s2t + loss_t2s) / 2

SMPL-X Forward Pass

import smplx
model = smplx.create(
    model_path='models/',
    model_type='smplx',
    gender='neutral',
    num_betas=10
).to(device)

output = model(
    betas=betas,        # (1,10)
    body_pose=pose,     # (1,63)
    global_orient=orient, # (1,3)
    transl=transl       # (1,3)
)
vertices = output.vertices  # (1,10475,3)

Multi-Stage Optimization

1 Global: orient, transl, scale
2 Shape: β (body proportions)
3 Coarse pose: HIGH regularization
4 Fine pose: lower regularization
5 Joint: all params together
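
A minimal sketch of this schedule, assuming the model and Chamfer loss from above; betas/pose/orient/transl/scale are leaf tensors with requires_grad=True, and the iteration counts, learning rate, and simple L2 pose prior are illustrative placeholders, not the exact recipe:

import torch

def chamfer_loss(a, b):
    d = torch.cdist(a, b)
    return (d.min(dim=1)[0].mean() + d.min(dim=0)[0].mean()) / 2

stages = [  # (params to optimize, iterations, pose-regularization weight)
    ([orient, transl, scale],               100, 0.0),  # 1. global
    ([betas],                               200, 0.0),  # 2. shape
    ([pose],                                200, 1.0),  # 3. coarse pose
    ([pose],                                300, 0.1),  # 4. fine pose
    ([betas, pose, orient, transl, scale],  300, 0.1),  # 5. joint
]
for params, iters, pose_reg in stages:
    opt = torch.optim.Adam(params, lr=0.01)
    for _ in range(iters):
        opt.zero_grad()
        out = model(betas=betas, body_pose=pose,
                    global_orient=orient, transl=transl)
        verts = out.vertices[0] * scale          # fitted scale from stage 1
        loss = chamfer_loss(verts, scan_v) \
             + pose_reg * (pose ** 2).mean()     # keeps joints near rest
        loss.backward()
        opt.step()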

Pose Presets

Preset Shoulder Z Use When
t-pose ✓ Arms horizontal (recommended)
a-pose ±45° Arms diagonal
relaxed ±72° Natural standing
6 / 14

⚠️ Pose Initialization Matters!

Wrong initialization → optimizer must rotate joints through large angles → twisted limbs and incorrect color transfer.

Shoulder Z-rotation by preset: t-pose 0° • a-pose ±45° • relaxed ±72° • arms-at-side ±90°

[Figure: ❌ T-pose init (scan was relaxed) • ✓ relaxed init (matches scan) • ~ a-pose init (27° off) • ❌ arms above head (severe)]
💡 Best practice: Capture subject in T-pose → default init (zeros) works perfectly.
7 / 14

Stage 5: Persistent Color Mapping


❌ Naive Approach: Per-Frame k-NN

# For EVERY frame... (apply_pose/knn are stand-ins)
for pose in poses:
    posed_smplx = apply_pose(smplx, pose)
    colors = knn(posed_smplx, colmap_mesh)
    # But the COLMAP scan is frozen in its original pose!

Problem: COLMAP mesh is static. When SMPL-X pose changes, k-NN finds wrong neighbors from the frozen scan.

🔑 Key Insight: LBS Preserves Vertex Indices

Linear Blend Skinning transforms vertex positions, not identities:

v_i(θ) = Σ_j w_ij · G_j(θ) · v_i^rest
  • ✓ Vertex 5432 is always vertex 5432 (e.g., tip of nose)
  • ✓ Colors assigned to index persist through motion
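
A toy numpy sketch of LBS (shape bookkeeping only; in SMPL-X the transforms G_j come from the pose θ and the kinematic chain), showing that row i of the output is still vertex i:

import numpy as np

def lbs(rest_verts, W, G):
    # rest_verts: (V, 3), W: (V, J) blend weights, G: (J, 4, 4) transforms
    V = rest_verts.shape[0]
    rest_h = np.concatenate([rest_verts, np.ones((V, 1))], axis=1)  # (V, 4)
    T = np.einsum('vj,jab->vab', W, G)            # blended per-vertex 4×4
    posed = np.einsum('vab,vb->va', T, rest_h)    # positions move...
    return posed[:, :3]                           # ...row i stays vertex i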

✅ The Solution: Compute Once, Reuse Forever

# Compute ONCE in the fitted pose (knn_interpolate = the
# distance-weighted k-NN routine shown below)
colors = knn_interpolate(fitted_smplx, colmap_mesh)
np.savez('vertex_colors.npz', colors=colors)

# For ALL frames - just load!
colors = np.load('vertex_colors.npz')['colors']
mesh.visual.vertex_colors = colors
# Colors "travel" with vertices via LBS

k-NN with Distance Weighting

import numpy as np
from scipy.spatial import cKDTree

# Build tree from COLMAP vertices
tree = cKDTree(colmap_verts)

# Query k=8 nearest for each SMPL-X vertex
dists, idxs = tree.query(smplx_verts, k=8)

# Distance-weighted interpolation
weights = 1.0 / (dists + 1e-8)
weights /= weights.sum(axis=1, keepdims=True)

# Weighted average of neighbor colors
colors = np.einsum('nk,nkc->nc',
    weights, colmap_colors[idxs])

Why This Works

The key realization: Color is a property of the vertex, not the position.

Once you assign "red" to vertex 5432 (the nose), it stays red whether the nose is at (0,0,1) or (0.5, 0.2, 1.1).

Why Per-Frame Fails

Per-frame: Arm raises → arm vertices near COLMAP torso → k-NN returns torso colors
Persistent: Colors computed when aligned → locked to vertex index → correct colors travel with arm
8 / 14

Stage 6: Motion Application

Core Insight: Shape β stays constant (your body). Pose θ varies per frame (the motion).

🔄 Motion Transfer Pipeline

1. Load fitted β + scale
2. Load persistent colors
3. Load motion sequence
4. Center translation ⚠️
5. Generate per-frame mesh
6. Apply colors & export
⚠️ Translation: Raw MoCap = absolute coords. Center to avoid teleporting on loop! transl -= transl[0:1]
import pickle

import torch
import trimesh
import smplx

# 1. Load YOUR fitted parameters
params = torch.load('fitted/smplx_parameters.pt')
betas = params['betas']      # Shape: your proportions
scale = params.get('scale', torch.tensor([1.0]))

# 2. Load persistent vertex colors
fitted = trimesh.load('fitted/fitted_smplx_colored.ply')
colors = fitted.visual.vertex_colors[:, :3]

# 3. Load motion sequence
data = pickle.load(open('motion.pkl', 'rb'))
# (arrays assumed to be torch tensors; wrap with torch.as_tensor if numpy)

# 4. CENTER TRANSLATION (critical for loops!)
transl = data['transl'] - data['transl'][0:1]

# 5. Initialize SMPL-X model
body_model = smplx.create(
    model_path='models/smplx',
    model_type='smplx',
    gender='neutral'
)

# 6. Generate each frame
num_frames = transl.shape[0]
for i in range(num_frames):
    output = body_model(
        betas=betas,                    # YOUR shape
        global_orient=data['global_orient'][i:i+1],
        body_pose=data['body_pose'][i:i+1],
        transl=transl[i:i+1]
    )
    
    verts = output.vertices[0].numpy() * scale.item()
    mesh = trimesh.Trimesh(verts, body_model.faces)
    mesh.visual.vertex_colors = colors
    mesh.export(f'frames/frame_{i:05d}.ply')
9 / 14

Stage 7: VAT Conversion (Optional)


🎬 What is VAT?

Vertex Animation Textures encode mesh animation directly into image textures, shifting computation from CPU to GPU.

Each pixel = 1 vertex at 1 frame
R channel = X position
G channel = Y position
B channel = Z position
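
Under that layout, the pixel address of a given vertex at a given frame is pure arithmetic; a hypothetical helper (row-major packing, 3 texture rows per frame for 10,475 vertices):

def vat_pixel(frame, vertex, tex_width=4096, rows_per_frame=3):
    # Each frame occupies rows_per_frame consecutive texture rows
    row = frame * rows_per_frame + vertex // tex_width
    col = vertex % tex_width
    return col, row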

📈 PLY vs VAT

Metric   PLY          VAT
Load     30s–3min     <1s
Size     0.7–2.5GB    15–50MB
FPS      ~15          60+

75× faster load • 50× smaller • GPU-native playback

🔢 16-bit Precision

8-bit PNG (0-255) lacks precision. Split 16-bit into high/low textures:

# Python encode (pos_min/pos_max: per-axis bounds, stored in metadata)
norm = (pos - pos_min) / (pos_max - pos_min)
enc16 = np.round(norm * 65535).astype(np.uint16)
high_8 = (enc16 >> 8).astype(np.uint8)
low_8 = (enc16 & 0xFF).astype(np.uint8)
// GLSL vertex shader decode
vec3 hi = texture2D(highTex, uv).rgb * 255.0;
vec3 lo = texture2D(lowTex, uv).rgb * 255.0;
vec3 n = (hi * 256.0 + lo) / 65535.0;
return mix(minBounds, maxBounds, n);

📁 Output Files

motion_name/
├── position_high.png
├── position_low.png
├── color_texture.png
└── metadata.json

⚡ Automatic Chunking

Mobile GPUs limit textures to 4096×4096. With 10,475 vertices → max 1,365 frames per texture. Long motions auto-split into seamless chunks.
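
The arithmetic behind that 1,365-frame cap, as a quick sanity check (one vertex per pixel, frames stacked in rows):

TEX_SIZE = 4096
NUM_VERTS = 10475
rows_per_frame = -(-NUM_VERTS // TEX_SIZE)        # ceil(10475/4096) = 3
frames_per_texture = TEX_SIZE // rows_per_frame   # 4096 // 3 = 1365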

VAT Playback Demo
# Convert with auto-chunking
python convert_vat_chunked.py \
    moyo_frames/ \
    -o vat_universal/ \
    -j 8  # 8 parallel workers
10 / 14

🎉 Final Result: Animated SMPL-X Avatar

📷→🧍 Photos to Avatar: multi-view reconstruction
🎨 Color Transfer: COLMAP → SMPL-X vertices
💃 Motion Ready: any MoCap dataset
🌐 Web Playback: 60fps via VAT

✅ What This Pipeline Produces

  • SMPL-X parametric mesh (10,475 vertices)
  • Vertex colors from COLMAP reconstruction
  • Fully rigged and animatable body model
  • Compatible with any SMPL-X motion data

🔮 Next Step: True Photorealism

  • Transfer skinning weights to COLMAP mesh
  • Preserve full geometric detail (millions of verts)
  • UV-mapped texture instead of vertex colors
11 / 14

🚀 Advancement: Hybrid Mesh Animation

Transfer SMPL-X skinning weights to original COLMAP meshes — millions of vertices with full geometric detail, now animatable!

SMPL-X Only: 10,475 vertices • vertex colors • fast parametric rendering (83× faster)
🌟 Hybrid Model: COLMAP mesh + SMPL-X skeleton • full high-res detail, animatable

[Figure: side-by-side validation, SMPL-X (blue) vs hybrid (original colors)]

Key Innovation: Weight Transfer

  • Barycentric interpolation of LBS weights from SMPL-X to COLMAP vertices
  • Local coordinate frame transformation preserves geometric detail
  • Distance thresholds handle background geometry
  • GPU acceleration via PyTorch3D for 1M+ vertices

Technical Approach

# For each COLMAP vertex:
# 1. Find nearest SMPL-X triangle
# 2. Compute barycentric coordinates
# 3. Interpolate blend weights
weights_colmap = bary_interp(smplx_weights,   # (10475, J) LBS weights
                             smplx_faces,
                             colmap_verts)    # sketched below
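
One possible bary_interp, sketched with trimesh on CPU (a PyTorch3D version would swap closest_point for its GPU point-to-mesh queries); the extra smplx_verts argument and the variable names are illustrative:

import numpy as np
import trimesh

def bary_interp(smplx_weights, smplx_faces, colmap_verts, smplx_verts):
    mesh = trimesh.Trimesh(smplx_verts, smplx_faces, process=False)
    # Nearest point on the SMPL-X surface for every COLMAP vertex
    closest, dist, tri_id = trimesh.proximity.closest_point(mesh, colmap_verts)
    # (dist can gate background geometry via a distance threshold)
    # Barycentric coordinates of each closest point inside its triangle
    tris = mesh.triangles[tri_id]                                   # (N, 3, 3)
    bary = trimesh.triangles.points_to_barycentric(tris, closest)   # (N, 3)
    # Blend the LBS weight vectors of the three triangle corners
    corner_w = smplx_weights[smplx_faces[tri_id]]                   # (N, 3, J)
    return np.einsum('nc,ncj->nj', bary, corner_w)                  # (N, J)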
12 / 14

Hybrid Models: Additional Examples

[Figure: Subject 2, SMPL-X only (10,475 vertices, fast but low detail) vs hybrid model (full COLMAP detail preserved); note the clothing detail preservation]

Result: true photorealistic avatars with millions of vertices, fully animatable with any SMPL-X motion sequence.
13 / 14

Pipeline Summary & Next Steps

📋 Complete Pipeline

1. 📷 Capture — 50-200 photos, T-pose
2. 🏗️ COLMAP — SfM → Dense → Mesh
3. 🧹 Preprocess — Clean, simplify, scale
4. 🎯 SMPL-X Fit — Chamfer optimization
5. 🎨 Color Map — Persistent k-NN
6. 💃 Motion — MOYO/AMASS retargeting
7. 🌐 VAT — Web-ready (optional)
✅ Output: Animated Avatar
10,475 vertices • vertex colors • 60fps

⚠️ Common Issues & Solutions

❌ SMPL-X arms twisted → ✅ Match preset to scan pose
❌ Wrong colors on posed mesh → ✅ Use persistent color map from the fitted pose
❌ Avatar teleports in animation → ✅ Subtract first-frame translation
❌ Mesh upside down / rotated → ✅ Apply the Y-flip for COLMAP→SMPL-X
❌ COLMAP reconstruction sparse → ✅ More photos, 70%+ overlap

💡 Key Insights

Pose initialization matters: match the preset to the actual scan pose
Persistent color mapping: establish once, reuse for all poses
Coordinate systems: COLMAP Y-down ↔ SMPL-X Y-up
Translation centering: relative motion for seamless loops

🔮 Future Directions

  • Real-time LBS (500-800× storage reduction)
  • Neural rendering (NeRF, Gaussian Splatting)
  • Multi-camera professional capture
  • Physics-based cloth simulation

🙋

Questions?

14 / 14