Lecture 03.2 – Procrustes Alignment

Introduction

Procrustes alignment is a fundamental technique for optimally aligning shapes that serves as the foundation for more complex shape analysis and registration methods. Named after a figure in Greek mythology who made his victims fit his bed by either stretching their limbs or cutting them off, Procrustes alignment involves finding a rigid transformation (and optionally a scale factor) that optimally aligns a source shape to a target shape.

This lecture introduces the mathematical theory behind Procrustes alignment, including its formulation as an optimization problem and its closed-form solution using Singular Value Decomposition (SVD). We provide a detailed derivation of the solution, discuss how to handle reflections and scaling, and illustrate applications in shape matching and motion capture.

Goal: Learning a Model of Pose and Shape

Our ultimate goal is to build a statistical model that can represent and control both the pose and shape of human bodies. This model should be able to:

Control the shape (body proportions) while keeping the pose fixed
Control the pose (body position) while keeping the shape fixed
Vary both pose and shape simultaneously

To build such a model, we need to:

Collect scans of many people of different shapes
Collect scans of these people in different poses
Bring these different scans into correspondence

The Challenge of Registration

The scans collected are sets of unordered points. To analyze them statistically, we need to establish correspondence between them — meaning we need to know which vertex in one scan corresponds to which part of the body (e.g., shoulder, hand, face) across all scans.

This process of establishing correspondence is called registration. We need to align a template mesh to each scan in a way that preserves the anatomical meaning of each vertex. This alignment has two components:

Rigid alignment: Finding the optimal translation, rotation, and potentially scale that aligns two shapes
Non-rigid alignment: Deforming the shape to match the target more precisely

In this lecture, we focus on the first step: rigid alignment through Procrustes analysis.

Surface Representation: Mesh

Before diving into the solution, let’s review the mesh representation used for 3D surfaces:

A mesh consists of vertices (points) and faces (triangles)
The vertices are stored in a matrix V of size n×3, where each row represents a point in 3D space
The faces are stored in a matrix F of size m×3, where each row contains the indices of the three vertices forming a triangle
The surface is essentially a piecewise linear mapping from ℝ² to ℝ³

The Procrustes Alignment Problem: Mathematical Formulation

Given two sets of corresponding points:

\[X = \{x_i\}_{i=1}^{N} \quad\text{and}\quad Y = \{y_i\}_{i=1}^{N}\]

with \(x_i, y_i \in \mathbb{R}^3\), we wish to determine:

A rotation \(R \in SO(3)\) (a 3×3 orthogonal matrix with determinant 1)
A translation \(\mathbf{t} \in \mathbb{R}^3\)
Optionally, a uniform scale factor \(s > 0\)

Such that the sum of squared distances between corresponding points is minimized:

\[\min_{s,R,\mathbf{t}} \; \sum_{i=1}^{N} \| s\, R\, x_i + \mathbf{t} - y_i \|^2\]

When \(s=1\), the problem is the orthogonal Procrustes problem; when \(s\) is free, it is the similarity Procrustes problem.

Rigid Transformations

There are three basic transformations we’ll consider:

Translation: Moving all points by a fixed vector t

\[\mathbf{X}' = \mathbf{X} + \mathbf{t}\]
Rotation: Rotating all points around the origin

\[\mathbf{X}'^T = \mathbf{R} \cdot \mathbf{X}^T\]
Scaling: Uniformly scaling all points from the origin

\[\mathbf{X}' = s\mathbf{X}\]

Combined, a full rigid transformation with scaling is:

\[\mathbf{x}' = s\mathbf{R}\mathbf{x} + \mathbf{t}\]

Procrustes Alignment Solution

We can solve this optimization problem in a closed form through the following steps:

Decoupling Translation by Centroid Alignment

We first define the centroids of both point clouds:

\[\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i\]

Then, we re-center the points by subtracting their respective centroids:

\[x_i' = x_i - \bar{x}, \qquad y_i' = y_i - \bar{y}\]

Or in matrix form:

\[\bar{\mathbf{X}} = [\mathbf{x}_1 - \bar{\mathbf{x}}, \ldots, \mathbf{x}_n - \bar{\mathbf{x}}]^T\]

It can be shown that the optimal translation is:

\[\mathbf{t}^* = \bar{y} - s\,R\,\bar{x}\]

This reduces the problem to finding the optimal rotation (and scale) aligning the centered sets.

Optimal Rotation via SVD

With the centered point clouds, we now want to minimize:

\[\sum_{i=1}^{N} \| R\, x_i' - y_i' \|^2\]

Expanding this expression:

\[\sum_{i=1}^{N} \|x_i'\|^2 + \sum_{i=1}^{N} \|y_i'\|^2 - 2 \sum_{i=1}^{N} (R\,x_i')^T y_i'\]

Since the first two sums are independent of \(R\), minimizing the error is equivalent to maximizing:

\[\sum_{i=1}^{N} y_i'^T (R\, x_i') = \mathrm{trace}\Bigl(R \Bigl(\sum_{i=1}^{N} x_i' y_i'^T\Bigr)\Bigr)\]

We define the cross-covariance matrix:

\[H = \sum_{i=1}^{N} x_i'\,y_i'^T\]

Or in matrix form:

\[\bar{\mathbf{Y}}^T \bar{\mathbf{X}} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\]

We compute the Singular Value Decomposition (SVD) of H:

\[H = U\,\Sigma\,V^T\]

Then the optimal rotation is given by:

\[R^* = V\,U^T\]

Note: In some formulations, this is written as \(\mathbf{R} = \mathbf{U}\mathbf{V}^T\), but this depends on whether the cross-covariance matrix is defined as \(\bar{\mathbf{Y}}^T \bar{\mathbf{X}}\) or \(\bar{\mathbf{X}}^T \bar{\mathbf{Y}}\).

Reflection Adjustment

If \(\det(R^*) < 0\), we need to flip the sign of the last column of \(V\) (i.e., \(V[:,3] := -V[:,3]\)) and recompute \(R^* = V\,U^T\) to ensure that \(R^* \in SO(3)\). This handles the case where the optimal transformation would include a reflection, which we generally want to avoid in shape alignment.

Optimal Scale (Optional)

If scaling is allowed, the optimal scale factor is given by:

\[s^* = \frac{\mathrm{trace}(\Sigma)}{\sum_{i=1}^{N} \|x_i'\|^2} = \frac{\text{tr}(\mathbf{\Sigma})}{\|\bar{\mathbf{X}}\|_F^2}\]

And then we update the translation as:

\[\mathbf{t}^* = \bar{y} - s^*\,R^*\,\bar{x}\]

Complete Mathematical Derivation

Let’s derive these formulas step by step for a more detailed understanding:

Translation Derivation

Starting with the energy function:

\[E = \sum_{i} \|s\mathbf{R}\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i\|^2\]

We expand it:

\[E = \sum_{i} (s\mathbf{R}\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i)^T(s\mathbf{R}\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i)\]

\[E = \sum_{i} s^2\mathbf{x}_i^T\mathbf{x}_i + \mathbf{t}^T\mathbf{t} + \mathbf{y}_i^T\mathbf{y}_i + 2s\mathbf{x}_i^T\mathbf{R}^T\mathbf{t} - 2s\mathbf{x}_i^T\mathbf{R}^T\mathbf{y}_i - 2\mathbf{t}^T\mathbf{y}_i\]

To minimize with respect to t, we take the derivative and set it to zero:

\[\frac{\partial E}{\partial \mathbf{t}} = 2N\mathbf{t} + 2s\mathbf{R}\sum_{i}\mathbf{x}_i - 2\sum_{i}\mathbf{y}_i = \mathbf{0}\]

Solving for t:

\[\mathbf{t} = \bar{\mathbf{y}} - s\mathbf{R}\bar{\mathbf{x}}\]

Rotation Derivation

When we substitute the optimal translation, the energy function simplifies to:

\[E = \sum_{i} \|s\mathbf{R}\bar{\mathbf{x}}_i - \bar{\mathbf{y}}_i\|^2\]

We can rewrite this in matrix form:

\[E = \|\mathbf{R}\bar{\mathbf{X}}^T - \bar{\mathbf{Y}}^T\|_F^2\]

Expanding:

\[E = \|\mathbf{R}\bar{\mathbf{X}}^T\|_F^2 + \|\bar{\mathbf{Y}}^T\|_F^2 - 2\langle\mathbf{R}\bar{\mathbf{X}}^T, \bar{\mathbf{Y}}^T\rangle_F\]

Since R is orthogonal, \(\|\mathbf{R}\bar{\mathbf{X}}^T\|_F^2 = \|\bar{\mathbf{X}}^T\|_F^2\), so:

\[E = \|\bar{\mathbf{X}}^T\|_F^2 + \|\bar{\mathbf{Y}}^T\|_F^2 - 2\langle\mathbf{R}, \bar{\mathbf{Y}}^T\bar{\mathbf{X}}\rangle_F\]

To minimize E, we need to maximize \(\langle\mathbf{R}, \bar{\mathbf{Y}}^T\bar{\mathbf{X}}\rangle_F\)

Let’s compute the SVD of the cross-covariance matrix:

\[\bar{\mathbf{Y}}^T\bar{\mathbf{X}} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\]

Then:

\[\langle\mathbf{R}, \bar{\mathbf{Y}}^T\bar{\mathbf{X}}\rangle_F = \langle\mathbf{R}, \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\rangle_F = \langle\mathbf{U}^T\mathbf{R}\mathbf{V}, \mathbf{\Sigma}\rangle_F\]

Let \(\mathbf{S} = \mathbf{U}^T\mathbf{R}\mathbf{V}\). Since U and V are orthogonal, S is also orthogonal. To maximize \(\langle\mathbf{S}, \mathbf{\Sigma}\rangle_F\) where \(\mathbf{\Sigma}\) is diagonal, the optimal S is the identity matrix.

Therefore:

\[\mathbf{I} = \mathbf{U}^T\mathbf{R}\mathbf{V}\]

Solving for R:

\[\mathbf{R} = \mathbf{U}\mathbf{V}^T\]

Scale Derivation

Given the optimal rotation, we minimize:

\[E = \sum_{i} (s^2\bar{\mathbf{x}}_i^T\bar{\mathbf{x}}_i - 2s\bar{\mathbf{x}}_i^T\mathbf{R}^T\bar{\mathbf{y}}_i)\]

Taking the derivative with respect to s and setting it to zero:

\[\frac{\partial E}{\partial s} = 2s\sum_{i}(\bar{\mathbf{x}}_i^T\bar{\mathbf{x}}_i) - 2\sum_{i}(\bar{\mathbf{x}}_i^T\mathbf{R}^T\bar{\mathbf{y}}_i) = 0\]

Solving for s:

\[s = \frac{\sum_{i}(\bar{\mathbf{x}}_i^T\mathbf{R}^T\bar{\mathbf{y}}_i)}{\sum_{i}(\bar{\mathbf{x}}_i^T\bar{\mathbf{x}}_i)} = \frac{\text{tr}(\bar{\mathbf{X}}\mathbf{R}^T\bar{\mathbf{Y}}^T)}{\|\bar{\mathbf{X}}\|_F^2}\]

Substituting the optimal R and using properties of the trace:

\[s = \frac{\text{tr}(\mathbf{V}\mathbf{U}^T\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T)}{\|\bar{\mathbf{X}}\|_F^2} = \frac{\text{tr}(\mathbf{\Sigma})}{\|\bar{\mathbf{X}}\|_F^2}\]

Summary of Procrustes Alignment Algorithm

The Procrustes alignment algorithm provides a closed-form solution to find the optimal rigid transformation (with optional scaling) between two shapes with known correspondences:

Compute the centroids of both point clouds
Center both point clouds by subtracting their respective centroids
Compute the cross-covariance matrix of the centered point clouds
Perform SVD on the cross-covariance matrix
Compute the optimal rotation as R = VU^T (or UV^T depending on the formulation)
Check if det(R) < 0, and if so, negate the last column of V and recompute R
Compute the optimal scale factor (if scaling is allowed)
Compute the optimal translation as t = ȳ - sRx̄

Python Implementation Example

import numpy as np

def procrustes_align(X, Y, allow_scale=True):
    # X and Y are N x 3 arrays (each row is a point).
    X_mean = X.mean(axis=0)
    Y_mean = Y.mean(axis=0)
    Xc = X - X_mean
    Yc = Y - Y_mean

    # Compute cross-covariance matrix
    H = Xc.T @ Yc
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    # Correct for reflection if necessary
    if np.linalg.det(R) < 0:
        Vt[-1, :] *= -1
        R = Vt.T @ U.T

    if allow_scale:
        var_X = np.sum(Xc**2)
        s = np.sum(S) / var_X
    else:
        s = 1.0

    t = Y_mean - s * (X_mean @ R.T)
    return s, R, t

# Demonstration:
np.random.seed(42)
X = np.random.rand(10, 3)
# Create a random rotation
rand_mat = np.random.randn(3, 3)
U, _, Vt = np.linalg.svd(rand_mat)
R_true = U @ Vt
if np.linalg.det(R_true) < 0:
    R_true[:, -1] *= -1
s_true = 1.5
t_true = np.array([0.5, -0.2, 1.0])
Y = s_true * (X @ R_true.T) + t_true

s_est, R_est, t_est = procrustes_align(X, Y, allow_scale=True)
print("True Scale:", s_true, "Estimated Scale:", s_est)
print("Rotation difference (R_true - R_est):\n", R_true - R_est)
print("True Translation:", t_true, "Estimated Translation:", t_est)

When run, the differences should be close to numerical zero.

Practical Applications

Procrustes alignment has numerous applications in computer vision, medical imaging, and geometric morphometrics:

Statistical Shape Analysis: Aligns a collection of shapes to compute a mean shape and analyze shape variation
3D Model Registration: Establishes correspondences between 3D models
Motion Capture: Aligns captured markers to a skeletal template
Medical Imaging: Registers anatomical structures across different patients
Archaeological Reconstruction: Aligns fragmented artifacts

Interactive Visualization Ideas

An interactive tool for teaching and understanding Procrustes alignment might allow:

Dragging points in \(Y\) and dynamically re-computing the optimal transformation, visualizing how centroids shift, how the SVD-based rotation aligns the shapes, and how scale changes affect the overall alignment.
Animating the alignment process frame-by-frame for motion capture data.