2) Enforcing the preservation of topology

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

# Mandatory imports...
import numpy as np
import torch
from torch import tensor
from torch.nn import Parameter, ParameterList
from torch.autograd import grad

# Custom modules:
from model import Model  # Optimization blackbox
from display import plot, train_and_display
from torch_complex import rot # Rotations in 2D

We've seen how to work with unlabeled segmentation masks encoded as measures $\alpha$ and $\beta$: the key idea here is to use a data fidelity term which is well-defined up to resampling.

In [2]:
def scal( α, f ) :
    "Scalar product between two vectors."
    return torch.dot( α.view(-1), f.view(-1) )

def sqdistances(x_i, y_j) :
    "Matrix of squared distances, C_ij = |x_i-y_j|^2."
    return ( (x_i.unsqueeze(1) - y_j.unsqueeze(0)) ** 2).sum(2)

def distances(x_i, y_j) :
    "Matrix of distances, C_ij = |x_i-y_j|^2."
    return (x_i.unsqueeze(1) - y_j.unsqueeze(0)).norm(p=2,dim=2)

def fidelity(α, β) :
    "Energy Distance between two sampled probability measures."
    α_i, x_i = α
    β_j, y_j = β
    K_xx = -distances(x_i, x_i)
    K_xy = -distances(x_i, y_j)
    K_yy = -distances(y_j, y_j)
    cost = .5*scal( α_i, K_xx @ α_i ) \
         -    scal( α_i, K_xy @ β_j ) \
         + .5*scal( β_j, K_yy @ β_j )
    return cost

Most often, we work with measures that have been rigidly aligned with each other:

In [3]:
class IsometricRegistration(Model) :
    "Find the optimal translation and rotation."
    def __init__(self, α) :
        "Defines the parameters of a rigid deformation of α."
        super().__init__()
        self.α, self.x = α[0].detach(), α[1].detach()
        self.θ = Parameter(tensor(  0.     )) # Angle
        self.τ = Parameter(tensor( [0.,0.] )) # Translation

    def __call__(self, t=1.) :
        # At the moment, PyTorch does not support complex numbers...
        x_t = rot(self.x, t*self.θ) + t*self.τ
        return self.α, x_t 
    
    def cost(self, target) :
        "Returns a cost to optimize."
        return fidelity( self(), target)
In [4]:
from sampling import load_measure

α_0 = load_measure("data/hippo_a.png")
β   = load_measure("data/hippo_b.png")
print("Number of atoms : {} and {}.".format(len(α_0[0]), len(β[0])))

# In practice, we often quotient out affine deformations:
isom = IsometricRegistration(α_0)
isom.fit(β)
α = isom()
α = α[0].detach().requires_grad_(), α[1].detach().requires_grad_()

# Let's display everyone:
plt.figure(figsize=(10,10))
plot(β,   "blue",   alpha=.7)
plot(α_0, "purple", alpha=.2)
plot(α,   "red",    alpha=.7)
plt.axis("equal")
plt.axis([0,1,0,1])
plt.show()
Number of atoms : 252 and 182.
It  1:0.028. It  2:0.009. It  3:0.008. It  4:0.008. It  5:0.008. 
It  6:0.008. It  7:0.008. It  8:0.008. It  9:0.007. It 10:0.007. 
It 11:0.006. It 12:0.006. It 13:0.006. It 14:0.006. It 15:0.007. 
It 16:0.006. 
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

A) Linear models

Let us now present some weakly parametrized models that allow us to study the variability between $\alpha$ and $\beta$ in a continuous way. In the previous notebook, we've seen that the "free particles" setting could be implemented as follows - with control variables $(p_i)\in\mathbb{R}^{N\times 2}$:

In [5]:
class L2Registration(Model) :
    "Find the optimal mass positions."
    def __init__(self, α) :
        "Defines the parameters of a free deformation of α."
        super().__init__()
        self.α, self.x = α[0].detach(), α[1].detach()
        self.p = Parameter(torch.zeros_like(self.x))

    def __call__(self, t=1.) :
        "Applies the model on the source point cloud."
        x_t =  self.x + t*self.p
        return self.α, x_t 
    
    def cost(self, target) :
        "Returns a cost to optimize."
        return fidelity( self(), target)
    
l2_reg = train_and_display(L2Registration, α, β)
It  1:0.006. It  2:0.005. It  3:0.001. It  4:0.001. It  5:0.000. 
It  6:0.000. It  7:0.000. It  8:0.000. It  9:0.000. It 10:0.000. 
It 11:0.000. It 12:0.000. It 13:0.000. It 14:0.000. It 15:0.000. 
It 16:0.000. It 17:0.000. It 18:0.000. It 19:0.000. It 20:0.000. 
It 21:0.000. It 22:0.000. It 23:0.000. It 24:0.000. It 25:0.000. 
It 26:0.000. It 27:0.000. It 28:0.000. It 29:0.000. It 30:0.000. 
It 31:0.000. It 32:0.000. It 33:0.000. It 34:0.000. It 35:0.000. 
It 36:0.000. 
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

Is this a desirable result?

Smoothing. Crucially, the "free particles" setting does not enforce any kind of smoothness on the deformation and can induce tearings in the registration. With a kernel norm as fidelity, we observe left-out particles that lag behind because of the fidelity's vanishing gradients. Going further, we may alleviate this problem with Wasserstein-like fidelities... But even then, we may observe mass splittings instead of, say, rotations when trying to register two star-shaped measures - try it with an "X" and a "+"!

To alleviate this problem, a simple idea is to smooth the displacement field. That is, to use a vector field

$$ \begin{align} v(x)~=~\sum_{i=1}^N k(x-x_i)\,p_i ~=~ \big(k\star \sum_{i=1}^N p_i\,\delta_{x_i}\big)(x) \end{align} $$

to move around our Dirac masses, with $k$ a blurring kernel function - e.g. a Gaussian.
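
As a minimal sketch of this smoothing step - assuming a Gaussian kernel of hypothetical bandwidth σ = .1, reusing the sqdistances helper defined above, and with gauss_K and v_field as illustrative names - we could write:

def gauss_K(x_i, y_j, σ = .1) :
    "Gaussian kernel matrix, K_ij = exp(-|x_i-y_j|²/σ²)."
    return (-sqdistances(x_i, y_j) / σ**2 ).exp()

def v_field(x, x_i, p_i) :
    "Smoothed displacement field, v(x) = (k ⋆ Σ_i p_i δ_{x_i})(x)."
    return gauss_K(x, x_i) @ p_i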

Regularization. Keep in mind that if the Fourier transform of $k$ is positive, any sampled vector field $v$ may be generated through such linear combinations. To enforce some kind of prior, we thus have to introduce an additional regularization term into our cost function.

From a theoretical point of view, using a positive kernel, the best penalty term is the dual kernel norm

$$ \begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k^2 ~=~ \big\langle \sum_{i=1}^N p_i\,\delta_{x_i}, k\star \sum_{i=1}^N p_i\,\delta_{x_i} \big\rangle ~=~\sum_{i,j=1}^N \langle p_i,p_j\rangle\,k(x_i-x_j) \end{align} $$

which can be rewritten as a Sobolev-like penalty on the vector field $v$: $$ \begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k^2 ~=~\iint_{\omega\in\mathbb{R}^2} \frac{|\widehat{v}(\omega)|^2}{\widehat{k}(\omega)}\,\text{d}\omega ~=~ \|v\|_{k}^{*2}. \end{align} $$
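
In code, this penalty is nothing but a kernel dot product between the momenta - a short sketch, reusing the (hypothetical) gauss_K helper above:

def kernel_norm2(x_i, p_i) :
    "Squared dual kernel norm, ⟨p, K_xx p⟩ = Σ_ij ⟨p_i,p_j⟩ k(x_i-x_j)."
    return scal( p_i, gauss_K(x_i, x_i) @ p_i )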

As we strive to optimize a cost that reads

$$ \begin{align} \text{Cost}(p) ~~=~~ \lambda_1\,\text{Fidelity}\big(\sum \alpha_i \delta_{x_i+v_i}, \beta\big) ~+~ \lambda_2\,\big\|\sum p_i\,\delta_{x_i}\big\|_k^2, \end{align} $$

the algorithm will find a compromise between accuracy and stretching.

Kernel dot product. In practice, we use a sampled interpolator

$$ \begin{align} \Phi^k_p~:~ x=(x_i)\in\mathbb{R}^{N\times2} ~\mapsto~ x~+~v~=~ x~+~ K_{xx} p, \end{align} $$

where $K_{xx}$ is the kernel matrix of the $x_i$'s and $p$ is the momentum field, encoded as an N-by-2 tensor.
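
A hedged sketch of this interpolator, once again with the Gaussian kernel above - the full registration module, with its fidelity and regularization terms, is the object of the exercise below:

def Φ(x, p) :
    "Sampled interpolator, Φ^k_p : x ↦ x + K_xx p."
    return x + gauss_K(x, x) @ p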

Exercise 1: Implement a (linear) kernel registration module.

In [6]:
run -i nt_solutions/measures_2/exo1
It  1:6.390. It  2:10.851. It  3:2.041. It  4:1.065. It  5:0.707. 
It  6:0.452. It  7:0.335. It  8:0.287. It  9:0.232. It 10:0.177. 
It 11:0.151. It 12:0.139. It 13:0.132. It 14:0.126. It 15:0.122. 
It 16:0.119. It 17:0.115. It 18:0.114. It 19:0.113. It 20:0.113. 
It 21:0.112. It 22:0.111. It 23:0.109. It 24:0.109. It 25:0.108. 
It 26:0.108. It 27:0.108. It 28:0.108. It 29:0.107. It 30:0.107. 
It 31:0.107. It 32:0.107. It 33:0.107. It 34:0.106. It 35:0.106. 
It 36:0.106. It 37:0.106. It 38:0.105. It 39:0.105. It 40:0.105. 
It 41:0.105. It 42:0.105. It 43:0.105. It 44:0.105. It 45:0.104. 
It 46:0.104. It 47:0.105. It 48:0.104. It 49:0.103. It 50:0.103. 
It 51:0.103. It 52:0.103. It 53:0.102. It 54:0.102. It 55:0.102. 
It 56:0.102. It 57:0.102. It 58:0.102. It 59:0.102. It 60:0.102. 
It 61:0.102. It 62:0.102. It 63:0.102. It 64:0.101. It 65:0.101. 
It 66:0.101. It 67:0.101. It 68:0.101. It 69:0.101. It 70:0.101. 
It 71:0.101. It 72:0.101. It 73:0.101. It 74:0.101. It 75:0.101. 
It 76:0.101. It 77:0.100. It 78:0.100. It 79:0.100. It 80:0.100. 
It 81:0.100. It 82:0.100. It 83:0.100. It 84:0.100. It 85:0.100. 
It 86:0.100. It 87:0.100. It 88:0.100. It 89:0.100. It 90:0.100. 
It 91:0.100. It 92:0.100. It 93:0.100. It 94:0.100. It 95:0.100. 
It 96:0.100. It 97:0.100. It 98:0.100. It 99:0.100. It100:0.100. 
It101:0.100. It102:0.100. It103:0.100. It104:0.100. 
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

Smooth linear deformations were studied extensively in the 1990s, especially with Bookstein's Thin Plate Spline kernel:

$$ \begin{align} \|v\|_{k}^{*2} ~~=~~ \iint_{\mathbb{R}^2} \|\partial^2_{xx} v(x)\|^2~ \text{d}x \end{align} $$

which allows us to retrieve affine deformations for free.
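
For reference, the corresponding radial kernel in 2D is $k(r)\propto r^2\log r$; a hedged sketch of the kernel matrix could read as follows (tps_K is an illustrative name, and the small offset avoids log(0) on the diagonal):

def tps_K(x_i, y_j) :
    "Thin Plate Spline kernel matrix, K_ij = r² log(r) with r = |x_i-y_j|."
    r2 = sqdistances(x_i, y_j)
    return .5 * r2 * (r2 + 1e-12).log()  # r² log(r) = ½ r² log(r²)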

B) A Riemannian model

Limits of the linear model. Assuming that the model $\Phi(\alpha)$ is close enough to $\beta$, we may use the matching momentum $p$ to characterize the deformation $\alpha\rightarrow\beta$. Its $k$-norm can be computed through

$$ \begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k ~~=~~ \sqrt{\langle p, K_{xx}p\rangle_{\mathbb{R}^{N\times2}}} \end{align} $$

and can be used as a "shape distance" $\text{d}_k(\alpha\rightarrow \beta)$ that penalizes tearings. Going further, we may compute the Fréchet mean of a population and perform kernel PCA on the momenta that link a mean shape to the observed samples.
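
As a hedged illustration (not part of the original cells), this "distance" and a linearized PCA on a stack of matching momenta could be sketched as follows - momenta is an assumed list of N-by-2 tensors attached to a common template x:

def d_k(x, p) :
    "Kernel 'distance' attached to a registration momentum p, supported by x."
    return kernel_norm2(x, p).sqrt()

# Given momenta p_1, ..., p_S that register a mean shape to S observed samples:
# P = torch.stack([ p_s.view(-1) for p_s in momenta ])  # (S, 2N) data matrix
# U, S_vals, V = torch.pca_lowrank(P, q=3)              # top three modes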

Path distance in a space of measures. Unfortunately though, such an approach has a major drawback: The "kernel distance" is not symmetric. As it only makes use of the kernel matrix $K_{xx}$ computed on the source point cloud, it induces a bias which may be detrimental to statistical studies.

An elegant solution to this problem is to understand the kernel cost

$$ \langle p, K_{xx} p\rangle ~=~ \langle v, K_{xx}^{-1} v\rangle $$

as a Riemannian, infinitesimal metric that measures the cost of small deformations. We may then look for trajectories

$$ \begin{align} \alpha(\cdot)~:~t\in[0,1] ~\mapsto~ \alpha(t)~=~\sum_{i=1}^N \alpha_i\,\delta_{x_i(t)} \end{align} $$

that start from $\alpha(0)=\alpha$ and minimize a cost

$$ \begin{align} \text{Cost}(\alpha(\cdot)) ~~=~~ \lambda_1\,\text{Fidelity}(\alpha(1),\beta) ~+~ \lambda_2\,\int_0^1 \underbrace{\big\langle \dot{x}(t), K_{x(t),x(t)}^{-1}\,\dot{x}(t)\big\rangle}_{\|\dot{x}(t)\|_{x(t)}^2}\,\text{d}t \end{align} $$

with a regularization matrix that evolves along the path.

Momentum field. In practice, inverting the smoothing matrix is an ill-conditioned problem. We may thus parameterize trajectories through the momentum field $(p_i(t))\in\mathbb{R}^{N\times 2}$ and write

$$ \begin{align} \text{Cost}(\alpha(\cdot)) ~~=~~ \lambda_1\,\text{Fidelity}(\alpha(1),\beta) ~+~ \lambda_2\,\int_0^1 \big\langle p(t), K_{x(t),x(t)}\,p(t)\big\rangle\,\text{d}t, \end{align} $$

keeping in mind that $x(0)$ is given by $\alpha$ and that $\dot{x}=v=K_{x,x}\,p$.
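
A hedged sketch of the discretized path energy, assuming 10 Euler timesteps and a hypothetical list p_list of ten N-by-2 momentum tensors (the full registration model is the object of the exercise below):

def path_energy(x_0, p_list, dt = .1) :
    "Discretized Riemannian length: Σ_t ⟨p(t), K_x(t) p(t)⟩ dt, with ẋ = K p."
    x, energy = x_0, 0.
    for p in p_list :
        K = gauss_K(x, x)
        energy = energy + dt * scal( p, K @ p )
        x = x + dt * (K @ p)   # Euler step: x(t+dt) = x(t) + dt·v(t)
    return x, energy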

Exercise 2: Implement a path registration method, sampling the interval $[0,1]$ with 10 timesteps.

In [7]:
run -i nt_solutions/measures_2/exo2
It  1:6.390. It  2:2.067. It  3:1.256. It  4:0.893. It  5:0.482. 
It  6:0.326. It  7:0.258. It  8:0.232. It  9:0.175. It 10:0.156. 
It 11:0.146. It 12:0.139. It 13:0.133. It 14:0.122. It 15:0.122. 
It 16:0.119. It 17:0.116. It 18:0.114. It 19:0.111. It 20:0.111. 
It 21:0.110. It 22:0.109. It 23:0.108. It 24:0.108. It 25:0.107. 
It 26:0.106. It 27:0.105. It 28:0.104. It 29:0.103. It 30:0.103. 
It 31:0.102. It 32:0.102. It 33:0.102. It 34:0.101. It 35:0.101. 
It 36:0.100. It 37:0.100. It 38:0.100. It 39:0.100. It 40:0.100. 
It 41:0.099. It 42:0.099. It 43:0.099. It 44:0.098. It 45:0.098. 
It 46:0.098. It 47:0.097. It 48:0.097. It 49:0.097. It 50:0.097. 
It 51:0.097. It 52:0.096. It 53:0.096. It 54:0.096. It 55:0.096. 
It 56:0.096. It 57:0.096. It 58:0.096. It 59:0.096. It 60:0.096. 
It 61:0.096. It 62:0.096. It 63:0.096. It 64:0.096. It 65:0.096. 
It 66:0.095. It 67:0.095. It 68:0.095. It 69:0.095. It 70:0.095. 
It 71:0.095. It 72:0.095. It 73:0.095. It 74:0.095. It 75:0.095. 
It 76:0.095. It 77:0.095. It 78:0.095. It 79:0.095. It 80:0.095. 
It 81:0.095. It 82:0.095. It 83:0.095. It 84:0.095. It 85:0.095. 
It 86:0.095. It 87:0.095. It 88:0.095. It 89:0.095. It 90:0.095. 
It 91:0.095. It 92:0.095. It 93:0.095. It 94:0.095. It 95:0.095. 
It 96:0.094. It 97:0.094. It 98:0.094. It 99:0.094. It100:0.094. 
It101:0.094. It102:0.094. It103:0.094. It104:0.094. It105:0.094. 
It106:0.094. It107:0.094. It108:0.094. It109:0.094. It110:0.094. 
It111:0.094. It112:0.094. It113:0.094. It114:0.094. It115:0.094. 
It116:0.094. It117:0.094. It118:0.094. It119:0.094. It120:0.094. 
It121:0.094. 
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

C) Shooting in a Riemannian manifold

Riemannian geodesics. You may have guessed it already: the cost above amounts to a shortest-path problem in the Riemannian manifold of point clouds $x\in\mathbb{R}^{N\times 2}$ endowed with the metric

$$ \begin{align} \langle v, g_x v\rangle~=~\langle v, K_{x,x}^{-1} v\rangle. \end{align} $$

Fortunately, length-minimizing paths in a Riemannian manifold satisfy a simple ODE: the geodesic equation which enforces "local straightness". In the position-velocity coordinates $(x,v=\dot{x})$, it can be written using the (in)famous Christoffel symbols. In the position-momentum coordinates $(q=x, p=K_{x,x}^{-1}\dot{x})$, it can be simply written as the symplectic gradient flow of the kinetic energy.

If we define the Hamiltonian as the kinetic energy function

$$ \begin{align} \text{H}~:~(q,p)\in\mathbb{R}^{N\times 2}\times\mathbb{R}^{N\times 2} ~\mapsto~ \text{H}(q,p)~=~\tfrac{1}{2}\langle p, K_{q,q}\,p\rangle ~=~\tfrac{1}{2}\|v\|_q^2, \end{align} $$

a curve $x(t)$ is geodesic if and only if it flows along the symplectic gradient (i.e. the gradient rotated by "-90°") of the Hamiltonian in the phase space $(q,p)$:

$$ \begin{align} \left\{ \begin{array}{c} \dot{q}_t~=~+\tfrac{\partial \text{H}}{\partial p}(q_t,p_t) \\ \dot{p}_t~=~-\tfrac{\partial \text{H}}{\partial q}(q_t,p_t) \end{array} \right. \end{align} $$

Exponential mapping. In practice, this equation means that we can generalize the $\text{Exp}$ mapping defined on the sphere to generic Riemannian manifolds: the routine below takes as input a point cloud $q$ and a momentum field $p$, and integrates up to time $t=1$ the unique solution $(q_t)$ of the geodesic equation such that

$$ \begin{align} q_0~=~ q ~~~\text{and}~~~ \dot{q}_0~=~v~=~K_{q,q}p. \end{align} $$

In [8]:
def K_xx(x_i, σ = .1) :
    return (-sqdistances(x_i,x_i)/σ**2 ).exp()

def H(q, p) :
    return .5 * scal( p, K_xx(q) @ p )

def Exp(q,p) :
    "Simplistic Euler solver for Hamilton's geodesic equation."
    for _ in range(10) :
        # N.B.: "dq" and "dp" denote the partial derivatives ∂H/∂q and ∂H/∂p.
        [dq, dp] = grad(H(q,p), [q,p], create_graph=True)
        q = q + .1* dp  # q̇ = +∂H/∂p
        p = p - .1* dq  # ṗ = -∂H/∂q
    return q
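
As a quick usage sketch (not one of the original cells), we may shoot the aligned source cloud along a small random momentum:

q_0 = α[1].detach().clone().requires_grad_()          # Source point cloud
p_0 = (.1 * torch.randn_like(q_0)).requires_grad_()   # Small random momentum
q_1 = Exp(q_0, p_0)                                   # Geodesic endpoint at t=1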

Geodesic shooting. Thanks to the exponential map that parameterizes geodesics in our shape manifold, we may optimize in the small vector space of momenta at time $t=0$ instead of using the full (redundant) space of trajectories.

In practice, the geodesic shooting algorithm (from optimal control theory) is thus all about minimizing a cost

$$ \begin{align} \text{Cost}(p_0) ~~=~~ \lambda_1\,\text{Fidelity}\big(\,\text{Exp}(q_0,p_0),\beta\,\big) ~+~ \lambda_2\,\langle p_0, K_{q_0,q_0}\,p_0\rangle, \end{align} $$

where $q_0$ encodes the source measure $\alpha$.
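
A hedged sketch of this cost, with illustrative weights λ_1 and λ_2 (shooting_cost is a hypothetical name; the actual registration module is the object of the exercise below):

def shooting_cost(q_0, p_0, α_i, target, λ_1 = 1., λ_2 = .01) :
    "Geodesic shooting cost: fidelity at time t=1 plus kinetic energy at t=0."
    q_1 = Exp(q_0, p_0)
    return λ_1 * fidelity( (α_i, q_1), target ) \
         + λ_2 * scal( p_0, K_xx(q_0) @ p_0 )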

Exercise 3: Implement a Riemannian shooting registration method. Why is it close to a Riemannian $\text{Log}$ function?

In [9]:
run -i nt_solutions/measures_2/exo3
It  1:6.390. It  2:6.278. It  3:11.601. It  4:4.328. It  5:3.648. 
It  6:2.605. It  7:2.739. It  8:1.508. It  9:1.071. It 10:0.647. 
It 11:0.445. It 12:0.371. It 13:0.303. It 14:0.240. It 15:0.204. 
It 16:0.176. It 17:0.178. It 18:0.167. It 19:0.160. It 20:0.154. 
It 21:0.148. It 22:0.143. It 23:0.137. It 24:0.134. It 25:0.129. 
It 26:0.129. It 27:0.126. It 28:0.122. It 29:0.120. It 30:0.117. 
It 31:0.116. It 32:0.115. It 33:0.114. It 34:0.113. It 35:0.112. 
It 36:0.111. It 37:0.110. It 38:0.110. It 39:0.109. It 40:0.108. 
It 41:0.107. It 42:0.106. It 43:0.105. It 44:0.105. It 45:0.104. 
It 46:0.103. It 47:0.103. It 48:0.102. It 49:0.101. It 50:0.100. 
It 51:0.100. It 52:0.100. It 53:0.099. It 54:0.099. It 55:0.099. 
It 56:0.099. It 57:0.098. It 58:0.098. It 59:0.098. It 60:0.097. 
It 61:0.097. It 62:0.097. It 63:0.096. It 64:0.096. It 65:0.096. 
It 66:0.096. It 67:0.096. It 68:0.096. It 69:0.096. It 70:0.096. 
It 71:0.096. It 72:0.095. It 73:0.095. It 74:0.095. It 75:0.095. 
It 76:0.095. It 77:0.095. It 78:0.095. It 79:0.095. It 80:0.095. 
It 81:0.095. It 82:0.095. It 83:0.095. It 84:0.095. It 85:0.095. 
It 86:0.095. It 87:0.095. It 88:0.095. It 89:0.095. It 90:0.095. 
It 91:0.095. It 92:0.095. It 93:0.095. It 94:0.095. It 95:0.095. 
It 96:0.094. It 97:0.094. It 98:0.094. It 99:0.094. It100:0.094. 
It101:0.094. It102:0.094. It103:0.094. It104:0.094. It105:0.094. 
It106:0.094. It107:0.094. It108:0.094. It109:0.094. It110:0.094. 
It111:0.094. It112:0.094. It113:0.094. It114:0.094. It115:0.094. 
It116:0.094. It117:0.094. It118:0.094. It119:0.094. It120:0.094. 
It121:0.094. It122:0.094. It123:0.094. It124:0.094. It125:0.094. 
It126:0.094. It127:0.094. It128:0.094. It129:0.094. It130:0.094. 
It131:0.094. It132:0.094. It133:0.094. It134:0.094. It135:0.094. 
It136:0.094. It137:0.094. It138:0.094. It139:0.094. It140:0.094. 
It141:0.094. It142:0.094. It143:0.094. It144:0.094. It145:0.094. 
It146:0.094. It147:0.094. It148:0.094. It149:0.094. It150:0.094. 
It151:0.094. It152:0.094. 
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

Exercise 4: Playing around with the kernel function and the densities, investigate the properties of this algorithm.

Large deformations and diffeomorphisms. In this notebook, we've seen that we could endow arbitrary shape spaces with tearing-averse Riemannian structures, be they flat (linear) or curved (the generic case). Crucially, Hamilton's geodesic equation lets us implement the $\text{Exp}$ operator in terms of the cometric

$$ \begin{align} K_{q,q}~=~g_q^{-1} \end{align} $$

as we generalize the statistical analysis that we performed in Kendall's shape space.

Studied in the last 15 years, the LDDMM framework is all about using a kernel matrix as a cometric on spaces of point clouds. Remarkably, the exponential mappings can then be extended into diffeomorphisms of the ambient space, whose smoothness is controlled by the kernel norm of the shooting momentum.
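
To make this concrete, here is a minimal, purely illustrative extension of the Euler solver above that carries extra ambient points $y$ along the flow of the geodesic vector field $v_t = k\star \sum_i p_i(t)\,\delta_{q_i(t)}$ (Exp_ambient is a hypothetical name, reusing the gauss_K helper sketched earlier):

def Exp_ambient(q, p, y) :
    "Flows extra ambient points y along the geodesic driven by (q, p)."
    for _ in range(10) :
        [dq, dp] = grad(H(q, p), [q, p], create_graph=True)
        y = y + .1 * gauss_K(y, q) @ p   # ẏ = v(y) = Σ_i k(y - q_i) p_i
        q = q + .1 * dp                  # q̇ = +∂H/∂p
        p = p - .1 * dq                  # ṗ = -∂H/∂q
    return q, y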

In practice. These methods only make use of a convolution dot product

$$ \begin{align} k\star\cdot~:~ p=\sum p_i\,\delta_{x_i}~\mapsto~ \big( k\star p: x\mapsto \sum k(x-x_i)\,p_i \big) \end{align} $$

Unsurprisingly, they can thus be accelerated using appropriate numerical schemes. Let us mention Flash for a Fourier domain implementation, or the KeOps library for fast convolutions on point clouds on the GPU: combined with clever dimensionality reductions, these toolboxes let us routinely work with 3D scans or meshes with ~100,000 vertices.
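
As a hedged illustration of the KeOps syntax (assuming pykeops is installed; this cell is not part of the original notebook), a Gaussian instance of this convolution could be written with symbolic LazyTensors that are never stored in memory:

from pykeops.torch import LazyTensor

def keops_conv(x, y, p, σ = .1) :
    "Gaussian convolution (k ⋆ Σ_j p_j δ_{y_j})(x_i), computed without materializing K."
    x_i  = LazyTensor( x[:, None, :] )                    # (M, 1, 2) symbolic tensor
    y_j  = LazyTensor( y[None, :, :] )                    # (1, N, 2) symbolic tensor
    K_ij = (- ((x_i - y_j) ** 2).sum(-1) / σ**2 ).exp()   # (M, N) symbolic kernel
    return K_ij @ p                                       # (M, 2) reduction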

Conclusion. Going further, the method presented here has two main advantages:

  • It enforces a diffeomorphic prior.
  • It endows the shape space with a clean Riemannian structure.

On the other hand, it also has two major drawbacks:

  • The translation-invariant kernel cometrics do not make much sense from a physiological point of view.
  • Even with a proper Riemannian shooting routine, using an L-BFGS descent every time we need to compute a Riemannian $\text{Log}$ is very impractical.

In the years to come, thanks to the development of PyTorch and TensorFlow, we thus expect to see major advances in the field. For instance, memoizing the $\text{Log}$ mapping in a CNN autoencoder is a promising approach for brain registration. All things considered, here are the two main points that you should keep in mind from these workshop sessions:

  • Working on a shape manifold is doable, and lets us generalize most standard statistical routines in a principled way.
  • Hamilton's exponential map lets you encode a Riemannian prior into your programs.

Which (co)metric should you choose? LDDMM's convolution cometrics are fully parallel, and are thus the cheapest available on point clouds and images; in a medical setting, learning relevant metrics will be a major research topic in the coming years.