# 2) Enforcing the preservation of topology¶

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

# Mandatory imports...
import numpy as np
import torch
from torch import tensor
from torch.nn import Parameter, ParameterList

# Custom modules:
from model import Model  # Optimization blackbox
from display import plot, train_and_display
from torch_complex import rot # Rotations in 2D


We've seen how to work with unlabeled segmentation masks encoded as measures $\alpha$ and $\beta$: the key idea here is to use a data fidelity term which is well-defined up to resampling.

In [2]:
def scal( Î±, f ) :
"Scalar product between two vectors."

def sqdistances(x_i, y_j) :
"Matrix of squared distances, C_ij = |x_i-y_j|^2."
return ( (x_i.unsqueeze(1) - y_j.unsqueeze(0)) ** 2).sum(2)

def distances(x_i, y_j) :
"Matrix of distances, C_ij = |x_i-y_j|^2."
return (x_i.unsqueeze(1) - y_j.unsqueeze(0)).norm(p=2,dim=2)

def fidelity(Î±, Î²) :
"Energy Distance between two sampled probability measures."
Î±_i, x_i = Î±
Î²_j, y_j = Î²
K_xx = -distances(x_i, x_i)
K_xy = -distances(x_i, y_j)
K_yy = -distances(y_j, y_j)
cost = .5*scal( Î±_i, K_xx @ Î±_i ) \
-    scal( Î±_i, K_xy @ Î²_j ) \
+ .5*scal( Î²_j, K_yy @ Î²_j )
return cost


Most often, we work with measures that have been rigidly aligned with each other:

In [3]:
class IsometricRegistration(Model) :
"Find the optimal translation and rotation."
def __init__(self, Î±) :
"Defines the parameters of a rigid deformation of Î±."
super(Model, self).__init__()
self.Î±, self.x = Î±[0].detach(), Î±[1].detach()
self.Î¸ = Parameter(tensor(  0.     )) # Angle
self.Ï„ = Parameter(tensor( [0.,0.] )) # Position

def __call__(self, t=1.) :
# At the moment, PyTorch does not support complex numbers...
x_t = rot(self.x, t*self.Î¸) + t*self.Ï„
return self.Î±, x_t

def cost(self, target) :
"Returns a cost to optimize."
return fidelity( self(), target)

In [4]:
from sampling import load_measure

print("Number of atoms : {} and {}.".format(len(Î±_0[0]), len(Î²[0])))

# In practice, we often quotient out affine deformations:
isom = IsometricRegistration(Î±_0)
isom.fit(Î²)
Î± = isom()

# Let's display everyone:
plt.figure(figsize=(10,10))
plot(Î²,   "blue",   alpha=.7)
plot(Î±_0, "purple", alpha=.2)
plot(Î±,   "red",    alpha=.7)
plt.axis("equal")
plt.axis([0,1,0,1])
plt.show()

Number of atoms : 252 and 182.
It  1:0.028. It  2:0.009. It  3:0.008. It  4:0.008. It  5:0.008.
It  6:0.008. It  7:0.008. It  8:0.008. It  9:0.007. It 10:0.007.
It 11:0.006. It 12:0.006. It 13:0.006. It 14:0.006. It 15:0.007.
It 16:0.006.
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'


## A) Linear models¶

Let us now present some weakly parametrized models that let us study the variability between $\alpha$ and $\beta$ in a continuous way. In the previous notebook, we've seen that the "free particles" setting could be implemented as follows - with control variables $(p_i)\in\mathbb{R}^{N\times 2}$:

In [5]:
class L2Registration(Model) :
"Find the optimal mass positions."
def __init__(self, Î±) :
"Defines the parameters of a free deformation of Î±."
super(Model, self).__init__()
self.Î±, self.x = Î±[0].detach(), Î±[1].detach()
self.p = Parameter(torch.zeros_like(self.x))

def __call__(self, t=1.) :
"Applies the model on the source point cloud."
x_t =  self.x + t*self.p
return self.Î±, x_t

def cost(self, target) :
"Returns a cost to optimize."
return fidelity( self(), target)

l2_reg = train_and_display(L2Registration, Î±, Î²)

It  1:0.006. It  2:0.005. It  3:0.001. It  4:0.001. It  5:0.000.
It  6:0.000. It  7:0.000. It  8:0.000. It  9:0.000. It 10:0.000.
It 11:0.000. It 12:0.000. It 13:0.000. It 14:0.000. It 15:0.000.
It 16:0.000. It 17:0.000. It 18:0.000. It 19:0.000. It 20:0.000.
It 21:0.000. It 22:0.000. It 23:0.000. It 24:0.000. It 25:0.000.
It 26:0.000. It 27:0.000. It 28:0.000. It 29:0.000. It 30:0.000.
It 31:0.000. It 32:0.000. It 33:0.000. It 34:0.000. It 35:0.000.
It 36:0.000.
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'


Is this a desirable result?

Smoothing. Crucially, the "free particles" setting does not enforce any kind of smoothness on the deformation, and can induce tearings in the registration. Using a kernel norm as a fidelity, we thus observe left-out particles that lag behind because of the fidelity's vanishing gradients. Going further, we may alleviate this problem with Wasserstein-like fidelities... But even then, we may then observe mass splittings instead of, say, rotations when trying to register two star-shaped measures - try it with an "X" and a "+"!

To alleviate this problem, a simple idea is to smooth the displacement field. That is, to use a vector field

\begin{align} v(x)~=~\sum_{i=1}^N k(x-x_i)\,p_i ~=~ \big(k\star \sum_{i=1}^N p_i\,\delta_{x_i}\big)(x) \end{align}

to move around our Dirac masses, with $k$ a blurring kernel function - e.g. a Gaussian.

Regularization. Keep in mind that if the Fourier transform of $k$ is positive, any sampled vector field $v$ may be generated through such linear combinations. To enforce some kind of prior, we thus have to introduce an additional regularization term to our cost function.

From a theoretical point of view, using a positive kernel, the best penalty term is the dual kernel norm

\begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k^2 ~=~ \big\langle \sum_{i=1}^N p_i\,\delta_{x_i}, k\star \sum_{i=1}^N p_i\,\delta_{x_i} \big\rangle ~=~\sum_{i,j=1}^N \langle p_i,p_j\rangle\,k(x_i-x_j) \end{align}

which can be rewritten as a Sobolev-like penalty on the vector field $v$: \begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k^2 ~=~\iint_{\omega\in\mathbb{R}^2} \frac{|\widehat{v}(\omega)|^2}{\widehat{k}(\omega)}\,\text{d}\omega ~=~ \|v\|_{k}^{*2}. \end{align}

As we strive to optimize a cost that reads

\begin{align} \text{Cost}(p) ~~=~~ \lambda_1\,\text{Fidelity}\big(\sum \alpha_i \delta_{x_i+v_i}, \beta\big) ~+~ \lambda_2\,\big\|\sum p_i\,\delta_{x_i}\big\|_k^2, \end{align}

the algorithm will find a compromise between accuracy and stretching.

Kernel dot product. In practice, we use a sampled interpolator

\begin{align} \Phi^k_p~:~ x=(x_i)\in\mathbb{R}^{N\times2} ~\mapsto~ x~+~v~=~ x~+~ K_{xx} p, \end{align}

where $K$ is the kernel matrix of the $x_i$'s and $p$ is the momentum field encoded as an N-by-2 tensor.

Exercise 1: Implement a (linear) kernel registration module.

In [6]:
run -i nt_solutions/measures_2/exo1

It  1:6.390. It  2:10.851. It  3:2.041. It  4:1.065. It  5:0.707.
It  6:0.452. It  7:0.335. It  8:0.287. It  9:0.232. It 10:0.177.
It 11:0.151. It 12:0.139. It 13:0.132. It 14:0.126. It 15:0.122.
It 16:0.119. It 17:0.115. It 18:0.114. It 19:0.113. It 20:0.113.
It 21:0.112. It 22:0.111. It 23:0.109. It 24:0.109. It 25:0.108.
It 26:0.108. It 27:0.108. It 28:0.108. It 29:0.107. It 30:0.107.
It 31:0.107. It 32:0.107. It 33:0.107. It 34:0.106. It 35:0.106.
It 36:0.106. It 37:0.106. It 38:0.105. It 39:0.105. It 40:0.105.
It 41:0.105. It 42:0.105. It 43:0.105. It 44:0.105. It 45:0.104.
It 46:0.104. It 47:0.105. It 48:0.104. It 49:0.103. It 50:0.103.
It 51:0.103. It 52:0.103. It 53:0.102. It 54:0.102. It 55:0.102.
It 56:0.102. It 57:0.102. It 58:0.102. It 59:0.102. It 60:0.102.
It 61:0.102. It 62:0.102. It 63:0.102. It 64:0.101. It 65:0.101.
It 66:0.101. It 67:0.101. It 68:0.101. It 69:0.101. It 70:0.101.
It 71:0.101. It 72:0.101. It 73:0.101. It 74:0.101. It 75:0.101.
It 76:0.101. It 77:0.100. It 78:0.100. It 79:0.100. It 80:0.100.
It 81:0.100. It 82:0.100. It 83:0.100. It 84:0.100. It 85:0.100.
It 86:0.100. It 87:0.100. It 88:0.100. It 89:0.100. It 90:0.100.
It 91:0.100. It 92:0.100. It 93:0.100. It 94:0.100. It 95:0.100.
It 96:0.100. It 97:0.100. It 98:0.100. It 99:0.100. It100:0.100.
It101:0.100. It102:0.100. It103:0.100. It104:0.100.
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

<Figure size 432x288 with 0 Axes>

Smooth linear deformations have been extensively studied in the 90's, especially with Bookstein's Thin Plate Spline kernel:

\begin{align} \|v\|_{k}^{*2} ~~=~~ \iint_{\mathbb{R}^2} \|\partial^2_{xx} v(x)\|^2~ \text{d}x \end{align}

## B) A Riemannian model¶

Limits of the linear model. Assuming that the model $\Phi(\alpha)$ is close enough to $\beta$, we may use the matching momentum $p$ to characterize the deformation $\alpha\rightarrow\beta$. Its $k$-norm can be computed through

\begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k ~~=~~ \sqrt{\langle p, K_{xx}p\rangle}_{\mathbb{R}^{N\times2}} \end{align}

and can be used as a "shape distance" $\text{d}_k(\alpha\rightarrow \beta)$ that penalizes tearings. Going further, we may compute the FrÃ©chet mean of a population and perform kernel PCA on the momenta that link a mean shape to the observed samples.

Path distance in a space of measures. Unfortunately though, such an approach has a major drawback: The "kernel distance" is not symmetric. As it only makes use of the kernel matrix $K_{xx}$ computed on the source point cloud, it induces a bias which may be detrimental to statistical studies.

An elegant solution to this problem is to understand the kernel cost

$$\langle p, K_{xx} p\rangle ~=~ \langle v, K_{xx}^{-1} v\rangle$$

as a Riemannian, infinitesimal metric that penalizes small deformations. We may then look for trajectories

\begin{align} \alpha(\cdot)~:~t\in[0,1] ~\mapsto~ \alpha(t)~=~\sum_{i=1}^N \alpha_i\,\delta_{x_i(t)} \end{align}

such that $\alpha(0)=\alpha$ and minimize a cost

\begin{align} \text{Cost}(\alpha(\cdot)) ~~=~~ \lambda_1\,\text{Fidelity}(\alpha(1),\beta) ~+~ \lambda_2\,\int_0^1 \underbrace{\big\langle \dot{x}(t), K_{x(t),x(t)}^{-1}\,\dot{x}(t)\big\rangle}_{\|\dot{x}(t)\|_{x(t)}^2}\,\text{d}t \end{align}

with a regularization matrix that evolves along the path.

Momentum field. In practice, inverting the smoothing matrix is an ill-posed problem. We may thus parameterize the problem through the momentum field $(p_i(t))\in\mathbb{R}^{N\times 2}$ and write

\begin{align} \text{Cost}(\alpha(\cdot)) ~~=~~ \lambda_1\,\text{Fidelity}(\alpha(1),\beta) ~+~ \lambda_2\,\int_0^1 \big\langle p(t), K_{x(t),x(t)}\,p(t)\big\rangle\,\text{d}t, \end{align}

keeping in mind that $x(0)$ is given by $\alpha$ and that $\dot{x}=v=K_{x,x}p$

Exercise 2: Implement a path registration method, sampling the interval $[0,1]$ with 10 timesteps.

In [7]:
run -i nt_solutions/measures_2/exo2

It  1:6.390. It  2:2.067. It  3:1.256. It  4:0.893. It  5:0.482.
It  6:0.326. It  7:0.258. It  8:0.232. It  9:0.175. It 10:0.156.
It 11:0.146. It 12:0.139. It 13:0.133. It 14:0.122. It 15:0.122.
It 16:0.119. It 17:0.116. It 18:0.114. It 19:0.111. It 20:0.111.
It 21:0.110. It 22:0.109. It 23:0.108. It 24:0.108. It 25:0.107.
It 26:0.106. It 27:0.105. It 28:0.104. It 29:0.103. It 30:0.103.
It 31:0.102. It 32:0.102. It 33:0.102. It 34:0.101. It 35:0.101.
It 36:0.100. It 37:0.100. It 38:0.100. It 39:0.100. It 40:0.100.
It 41:0.099. It 42:0.099. It 43:0.099. It 44:0.098. It 45:0.098.
It 46:0.098. It 47:0.097. It 48:0.097. It 49:0.097. It 50:0.097.
It 51:0.097. It 52:0.096. It 53:0.096. It 54:0.096. It 55:0.096.
It 56:0.096. It 57:0.096. It 58:0.096. It 59:0.096. It 60:0.096.
It 61:0.096. It 62:0.096. It 63:0.096. It 64:0.096. It 65:0.096.
It 66:0.095. It 67:0.095. It 68:0.095. It 69:0.095. It 70:0.095.
It 71:0.095. It 72:0.095. It 73:0.095. It 74:0.095. It 75:0.095.
It 76:0.095. It 77:0.095. It 78:0.095. It 79:0.095. It 80:0.095.
It 81:0.095. It 82:0.095. It 83:0.095. It 84:0.095. It 85:0.095.
It 86:0.095. It 87:0.095. It 88:0.095. It 89:0.095. It 90:0.095.
It 91:0.095. It 92:0.095. It 93:0.095. It 94:0.095. It 95:0.095.
It 96:0.094. It 97:0.094. It 98:0.094. It 99:0.094. It100:0.094.
It101:0.094. It102:0.094. It103:0.094. It104:0.094. It105:0.094.
It106:0.094. It107:0.094. It108:0.094. It109:0.094. It110:0.094.
It111:0.094. It112:0.094. It113:0.094. It114:0.094. It115:0.094.
It116:0.094. It117:0.094. It118:0.094. It119:0.094. It120:0.094.
It121:0.094.
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'