Lecture 4.1: Iterative Closest Point

Introduction to Shape Alignment and Registration

In computer graphics, computer vision, and 3D modeling, we often need to align shapes, point clouds, or meshes to each other. The Iterative Closest Point (ICP) algorithm is a foundational method for solving this alignment problem, especially when the correspondences between points are unknown.

In this lecture, we’ll explore how to align shapes using transformations beyond simple rigid alignment, particularly focusing on non-rigid alignment where the shape might deform. We’ll first review the Procrustes algorithm (covered in the previous lecture) that provides optimal rigid alignment when correspondences are known, then move to the more general problem where correspondences are unknown and must be estimated iteratively.

The Registration Problem

Given two shapes represented as point sets:

Source shape: \(X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^3\)
Target shape: \(Y = \{y_1, y_2, \ldots, y_m\} \subset \mathbb{R}^3\)

We want to find a transformation \(f\) that maps points from \(X\) to best align with \(Y\).

For rigid alignment, this transformation is parameterized by:

A rotation matrix \(\mathbf{R} \in SO(3)\)
A translation vector \(\mathbf{t} \in \mathbb{R}^3\)
Optionally, a uniform scale factor \(s > 0\)

The fundamental problem in registration has two interrelated components:

Finding the correct correspondences between points in the source and target shapes
Computing the optimal transformation that aligns the source to the target

Review: Procrustes Analysis

If we already know the correspondences between points in both shapes (i.e., for each \(x_i\) in the source, we know the corresponding \(y_i\) in the target), then we can find the optimal rigid alignment using the Procrustes algorithm.

The objective function to minimize is:

\[E(\mathbf{R}, \mathbf{t}, s) = \sum_{i=1}^{n} \|s\mathbf{R}x_i + \mathbf{t} - y_i\|^2\]

where \(s\) is the scale factor, \(\mathbf{R}\) is the rotation matrix, and \(\mathbf{t}\) is the translation vector.

The solution is as follows:

Compute the centroids of both point sets:

\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i\]
Center both point sets by subtracting their respective centroids:

\[x_i' = x_i - \bar{x}, \quad y_i' = y_i - \bar{y}\]
Compute the cross-covariance matrix:

\[\mathbf{H} = \sum_{i=1}^{n} x_i' (y_i')^T\]
Compute the SVD of the cross-covariance matrix:

\[\mathbf{H} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\]
Determine the optimal rotation matrix:

\[\mathbf{R} = \mathbf{V}\mathbf{U}^T\]
If we need to ensure a proper rotation (determinant = +1), we check:
- If \(\det(\mathbf{R}) = 1\), the rotation is already proper
- If \(\det(\mathbf{R}) = -1\), we compute:
  
  \[\begin{split}\mathbf{R} = \mathbf{V} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \det(\mathbf{V}\mathbf{U}^T) \end{pmatrix} \mathbf{U}^T\end{split}\]
The optimal scale (if desired) is:

\[s = \frac{\text{trace}(\mathbf{\Sigma})}{\sum_{i=1}^{n} \|x_i'\|^2}\]
The optimal translation is:

\[\mathbf{t} = \bar{y} - s\mathbf{R}\bar{x}\]

This solution is closed-form and globally optimal when correspondences are known.

Problem: Unknown Correspondences

In practice, however, correspondences between shapes are typically unknown. This creates a chicken-and-egg problem:

If we knew the correct transformation, we could easily determine the correspondences
If we knew the correct correspondences, we could easily compute the optimal transformation using Procrustes

The fundamental problem is that we need to jointly optimize over both the transformation parameters and the point correspondences:

\[\min_{f, C} E(f, C) = \min_{f, C} \sum_{i} \|f(x_i) - y_{c(i)}\|^2\]

where \(C = \{c(i)\}\) represents the set of correspondences that map each point \(x_i\) in the source to a point \(y_{c(i)}\) in the target, and \(f\) is the transformation function.

The Iterative Closest Point (ICP) Algorithm

The key insight of the ICP algorithm, introduced independently by Besl & McKay (1992) and Chen & Medioni (1992), is to solve this problem by alternating between:

Estimating the correspondences for a fixed transformation
Optimizing the transformation for fixed correspondences

Basic ICP Algorithm

Initialize the transformation parameters:
- Typically set \(\mathbf{R}^0 = \mathbf{I}\) (identity matrix)
- \(\mathbf{t}^0 = \frac{\sum y_i}{N} - \frac{\sum x_i}{N}\) (align centroids)
- \(s^0 = 1\) (unit scale)
Iterate until convergence:
1. Compute correspondences: For each point \(y_i\) in the target shape, find the closest point in the transformed source shape:
  
  \[x_i^{j+1} = \arg\min_{x \in X} \|f^j(x) - y_i\|^2\]
2. Compute optimal transformation: Using these correspondences, solve for the optimal transformation parameters (using Procrustes):
  
  \[f^{j+1} = \arg\min_{f} \sum_{i} \|f(x_i^{j+1}) - y_i\|^2\]
3. Apply the transformation to update the source shape
4. Check for convergence: Terminate if the change in transformation parameters or alignment error is below a threshold

The algorithm monotonically decreases the alignment error and is guaranteed to converge to a local minimum. However, it may not find the global optimum, especially if the initial alignment is poor.

Computational Considerations

Closest Point Computation

Computing closest points naively is an \(O(n \times m)\) operation, which can be prohibitively expensive for large point sets. Efficient implementations use:

k-d trees: Spatial data structures that reduce the average time complexity of nearest neighbor search to \(O(\log m)\) per query
Random sampling: Instead of using all points, sample a subset for correspondence estimation
Distance threshold: Ignore correspondences beyond a maximum distance to handle partial overlaps

Convergence and Local Minima

ICP converges to a local minimum but may not find the global optimum. Strategies to avoid poor local minima include:

Better initialization (e.g., using centroid alignment or feature matching)
Multi-resolution approaches (start with coarse alignment, then refine)
Annealing strategies (gradually decrease the matching distance threshold)
Running multiple ICP instances from different initial positions

Point-to-Point vs. Point-to-Plane ICP

Point-to-Point ICP

The standard ICP algorithm minimizes the point-to-point distance between corresponding points. The error function is:

\[E(\mathbf{R}, \mathbf{t}, s) = \sum_{i} \|s\mathbf{R}x_i + \mathbf{t} - y_{c(i)}\|^2\]

This has a closed-form solution via Procrustes analysis, but can lead to slow convergence, especially on surfaces with high curvature.

Point-to-Plane ICP

Chen & Medioni (1992) proposed an alternative error metric that minimizes the distance from each source point to the tangent plane of its corresponding target point:

\[E(\mathbf{R}, \mathbf{t}, s) = \sum_{i} ((s\mathbf{R}x_i + \mathbf{t} - y_{c(i)}) \cdot \mathbf{n}_{c(i)})^2\]

where \(\mathbf{n}_{c(i)}\) is the surface normal at target point \(y_{c(i)}\).

This formulation allows points to “slide” along the tangent planes, leading to:

Faster convergence, especially on smooth surfaces
More natural alignment along surfaces
Better handling of sampling differences between the shapes

The point-to-plane error function is non-linear in the transformation parameters and doesn’t have a closed-form solution. It’s typically solved by linearizing the rotation (for small rotations) and using a single Gauss-Newton step per iteration.

Gradient-based ICP for Non-Rigid Registration

For non-rigid alignment, the transformation \(f\) goes beyond rigid (rotation, translation, scale) to include more complex deformations. The ICP framework can be generalized to non-rigid deformations by:

Using a more general parameterization of the transformation \(f\)
Applying gradient-based optimization instead of closed-form Procrustes

Gradient-based ICP Algorithm

Initialize transformation parameters
Iterate until convergence:
1. Compute correspondences for the current transformation
2. Compute the gradient of the error with respect to transformation parameters:
  
  \[g^{j+1} = \nabla E(f^j) = \nabla \sum_{i} \|f^j(x_i^{j+1}) - y_i\|^2\]
3. Update the transformation using gradient-based optimization:
  
  \[f^{j+1} = k_{step}(g^{0...j+1}, f^{0...j})\]
  
  For simple gradient descent, this would be:
  
  \[f^{j+1} = f^j - \alpha g^{j+1}\]
  
  Other optimization methods include Levenberg-Marquardt, BFGS, and dogleg.
4. Check for convergence

Advantages of Gradient-based ICP

More general framework that can incorporate different transformation models
Can include additional terms in the energy function (e.g., regularization)
Leverages existing optimization libraries for solving non-linear least squares problems

Computing Gradients

For complex transformation models, deriving and implementing the gradient can be tedious. Modern implementations often use:

Automatic differentiation: Libraries that automatically compute derivatives through the chain rule
Numerical differentiation: Approximating derivatives using finite differences

Typically, one can treat the code as a computational graph and let autodiff tools handle the gradient computation.

Improving ICP’s Robustness

Several strategies improve ICP’s robustness to outliers, noise, and partial overlaps:

Data Association Direction

Always find correspondences from the target to the source, not vice versa. This helps ensure that the data term is proper (explaining the observed data given the model parameters).

Robust Cost Functions

Replace the squared error with robust estimators that are less sensitive to outliers:

Huber loss: Quadratic for small errors, linear for large ones
Tukey biweight: Saturates for large errors, effectively ignoring outliers
L1 norm: Uses absolute error rather than squared error

Trimmed ICP

The Trimmed ICP algorithm (TrICP) sorts correspondences by distance and only uses the closest X% (where X is the estimated overlap percentage) for computing the transformation.

RANSAC-based Approaches

Random Sample Consensus (RANSAC) can be integrated with ICP by:

Randomly selecting minimal sets of correspondences
Computing candidate transformations
Evaluating how many other points agree with each candidate
Selecting the transformation with the most inliers

Additional Information

Using additional information beyond positions can improve matching:

Surface normals: Points with similar normals are more likely to correspond
Color/texture: In RGB-D data, color can guide correspondence
Local shape descriptors: Geometric features that describe local shape properties

Point-to-Surface Distance

A significant improvement on point-to-plane ICP is using a true point-to-surface distance. This:

Allows correspondences to slide along the entire surface, not just on planes
Provides smoother convergence
Often yields better results with fewer iterations

The key difference is that in point-to-plane, we’re locked to vertex correspondences that can jump discontinuously from iteration to iteration. With point-to-surface, we can find the true closest point on the surface (which might lie on a triangle face, not at a vertex).

ICP Variants and Extensions

Generalized ICP (GICP)

Segal et al. (2009) introduced Generalized ICP, which models each point with a local covariance matrix representing the uncertainty or “shape” of its neighborhood. The distance between points becomes a Mahalanobis distance:

\[d_{ij}^2 = (f(x_i) - y_j)^T (C_{x_i} + f(C_{y_j})f^T)^{-1} (f(x_i) - y_j)\]

where \(C_{x_i}\) and \(C_{y_j}\) are the covariance matrices. This unifies point-to-point and point-to-plane ICP in a probabilistic framework.

EM-ICP and Probabilistic Approaches

Expectation-Maximization ICP (EM-ICP) treats correspondences as latent variables in a probabilistic model. Instead of hard assignments, it computes for each point a probability distribution over possible matches. This soft matching is more robust to noise and outliers.

Coherent Point Drift (CPD)

CPD by Myronenko & Song (2010) models one point set as Gaussian Mixture Model centroids and the other as data points. It enforces motion coherence (points move together in a smooth way) and handles both rigid and non-rigid registration.

Multi-Scale Approaches

To avoid local minima and improve efficiency:

Start with heavily downsampled versions of both shapes
Perform ICP at the coarse level
Progressively increase resolution, using the previous level’s result as initialization

This is similar to multi-resolution approaches in image processing and helps escape local minima.

Applications of ICP

ICP and its variants are widely used in:

3D Scan Registration: Aligning multiple scans to create complete 3D models
SLAM (Simultaneous Localization and Mapping): Aligning consecutive scans for robot navigation
Medical Imaging: Registration of medical scans (CT, MRI) for analysis and surgery planning
Statistical Shape Analysis: Building statistical shape models by aligning populations of shapes
Motion Capture: Tracking articulated objects over time
Virtual and Augmented Reality: Aligning virtual content with the real world

In particular, ICP is the workhorse algorithm for building statistical human body models like SMPL, which align a template mesh to many 3D scans of people to learn a parameterized model of human shape and pose.

Implementing ICP

Efficient Python Implementation

For a Python implementation, key considerations include:

Use optimized libraries for nearest neighbor search:

from scipy.spatial import cKDTree

# Build k-d tree on target points (once)
tree = cKDTree(target_points)

# Query for each transformed source point
distances, indices = tree.query(transformed_source_points, k=1)

Use vectorized operations for transformations:

# Apply transform to all source points
transformed_points = np.dot(source_points, R.T) + t

Compute the covariance matrix efficiently:

# Center points
source_centered = source_points - source_centroid
target_centered = target_points - target_centroid

# Compute cross-covariance
H = source_centered.T @ target_centered

Use robust error functions:

# Huber loss for robust weighting
def huber_weights(residuals, delta=1.0):
    weights = np.ones_like(residuals)
    mask = np.abs(residuals) >= delta
    weights[mask] = delta / np.abs(residuals[mask])
    return weights

Practical Tips

Initialization: Always initialize by aligning centroids
Outlier Handling: Use a distance threshold to ignore outliers
Convergence Criteria: Use a combination of: - Maximum number of iterations - Minimum change in transformation parameters - Minimum change in error
Multi-Resolution: Start with downsampled point clouds, progressively refine
Point-to-Plane: When surface normals are available, prefer point-to-plane over point-to-point

Conclusion

The Iterative Closest Point algorithm is a versatile framework for shape alignment that:

Provides a solution to the chicken-and-egg problem of unknown correspondences
Can be extended to various transformation models (rigid, similarity, affine, non-rigid)
Has many variants to improve robustness, efficiency, and accuracy

Despite being proposed in the early 1990s, ICP remains a fundamental algorithm in 3D computer vision and graphics, with ongoing research and applications. Understanding its principles and limitations is essential for working with 3D data.

In the next lecture, we will explore the SMPL body model, which leverages the registration techniques we’ve discussed to create a statistical model of human shape and pose.