%matplotlib inline
import matplotlib.pyplot as plt
# Mandatory imports...
import numpy as np
import torch
from torch import tensor
from torch.nn import Parameter, ParameterList
from torch.autograd import grad
# Custom modules:
from model import Model # Optimization blackbox
from display import plot, train_and_display
from torch_complex import rot # Rotations in 2D
We've seen how to work with unlabeled segmentation masks encoded as measures $\alpha$ and $\beta$: the key idea here is to use a data fidelity term which is well-defined up to resampling.
def scal( α, f ) :
    "Scalar product between two vectors."
    return torch.dot( α.view(-1), f.view(-1) )

def sqdistances(x_i, y_j) :
    "Matrix of squared distances, C_ij = |x_i-y_j|^2."
    return ( (x_i.unsqueeze(1) - y_j.unsqueeze(0)) ** 2).sum(2)

def distances(x_i, y_j) :
    "Matrix of distances, C_ij = |x_i-y_j|."
    return (x_i.unsqueeze(1) - y_j.unsqueeze(0)).norm(p=2, dim=2)

def fidelity(α, β) :
    "Energy Distance between two sampled probability measures."
    α_i, x_i = α
    β_j, y_j = β
    K_xx = -distances(x_i, x_i)
    K_xy = -distances(x_i, y_j)
    K_yy = -distances(y_j, y_j)
    cost =   .5*scal( α_i, K_xx @ α_i ) \
           -    scal( α_i, K_xy @ β_j ) \
           + .5*scal( β_j, K_yy @ β_j )
    return cost
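As a quick sanity check, here is a toy example of our own (random weighted point clouds, not the notebook's data): measures are encoded as (weights, points) pairs, and the Energy Distance between a measure and itself should vanish.

# Toy sanity check with random measures (a_i, x_i) and (b_j, y_j):
x = torch.rand(5, 2) ; a = torch.ones(5) / 5   # 5 atoms, uniform weights
y = torch.rand(7, 2) ; b = torch.ones(7) / 7   # 7 atoms, uniform weights
print( fidelity( (a, x), (a, x) ).item() )     # ≈ 0 : a measure matches itself
print( fidelity( (a, x), (b, y) ).item() )     # > 0 : the supports differ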
Most often, we work with measures that have been rigidly aligned with each other:
class IsometricRegistration(Model) :
    "Find the optimal translation and rotation."
    def __init__(self, α) :
        "Defines the parameters of a rigid deformation of α."
        super(Model, self).__init__()
        self.α, self.x = α[0].detach(), α[1].detach()
        self.θ = Parameter(tensor( 0. ))      # Angle
        self.τ = Parameter(tensor( [0.,0.] )) # Position
    def __call__(self, t=1.) :
        "Applies the rigid deformation on the source point cloud."
        # At the moment, PyTorch does not support complex numbers...
        x_t = rot(self.x, t*self.θ) + t*self.τ
        return self.α, x_t
    def cost(self, target) :
        "Returns a cost to optimize."
        return fidelity( self(), target)
from sampling import load_measure
α_0 = load_measure("data/hippo_a.png")
β = load_measure("data/hippo_b.png")
print("Number of atoms : {} and {}.".format(len(α_0[0]), len(β[0])))
# In practice, we often quotient out affine deformations:
isom = IsometricRegistration(α_0)
isom.fit(β)
α = isom()
α = α[0].detach().requires_grad_(), α[1].detach().requires_grad_()
# Let's display everyone:
plt.figure(figsize=(10,10))
plot(β, "blue", alpha=.7)
plot(α_0, "purple", alpha=.2)
plot(α, "red", alpha=.7)
plt.axis("equal")
plt.axis([0,1,0,1])
plt.show()
Let us now present some weakly parametrized models that allow us to study the variability between $\alpha$ and $\beta$ in a continuous way. As we saw in the previous notebook, the "free particles" setting can be implemented as follows, with control variables $(p_i)\in\mathbb{R}^{N\times 2}$:
class L2Registration(Model) :
    "Find the optimal mass positions."
    def __init__(self, α) :
        "Defines the parameters of a free deformation of α."
        super(Model, self).__init__()
        self.α, self.x = α[0].detach(), α[1].detach()
        self.p = Parameter(torch.zeros_like(self.x))
    def __call__(self, t=1.) :
        "Applies the model on the source point cloud."
        x_t = self.x + t*self.p
        return self.α, x_t
    def cost(self, target) :
        "Returns a cost to optimize."
        return fidelity( self(), target)
l2_reg = train_and_display(L2Registration, α, β)
Is this a desirable result?
Smoothing. Crucially, the "free particles" setting does not enforce any kind of smoothness on the deformation, and can induce tearings in the registration. Using a kernel norm as a fidelity, we thus observe left-out particles that lag behind because of the fidelity's vanishing gradients. Going further, we may alleviate this problem with Wasserstein-like fidelities... But even then, we may observe mass splittings instead of, say, rotations when trying to register two star-shaped measures: try it with an "X" and a "+"!
A simple way to alleviate these artifacts is to smooth the displacement field, that is, to use a vector field
$$ \begin{align} v(x)~=~\sum_{i=1}^N k(x-x_i)\,p_i ~=~ \big(k\star \sum_{i=1}^N p_i\,\delta_{x_i}\big)(x) \end{align} $$
to move around our Dirac masses, with $k$ a blurring kernel function - e.g. a Gaussian.
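As a concrete illustration, here is a minimal sketch of such a smoothed displacement field with a Gaussian kernel; the helper `gaussian_kernel`, the bandwidth `σ` and the toy data below are our own choices, not part of the notebook's modules.

def gaussian_kernel(x_i, y_j, σ=.1) :
    "Gaussian kernel matrix, K_ij = exp(-|x_i-y_j|^2 / (2σ^2))."
    return ( - sqdistances(x_i, y_j) / (2 * σ**2) ).exp()

# Toy example: smooth a random momentum field p carried by N points x:
N = 100
x = torch.rand(N, 2)           # positions x_i in the unit square
p = torch.randn(N, 2) / N      # raw momenta p_i
K_xx = gaussian_kernel(x, x)   # (N,N) kernel matrix k(x_i - x_j)
v = K_xx @ p                   # (N,2) smoothed vector field v(x_i)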
Regularization. Keep in mind that if the Fourier transform of $k$ is positive, any sampled vector field $v$ may be generated through such linear combinations. To enforce some kind of prior, we thus have to introduce an additional regularization term to our cost function.
From a theoretical point of view, using a positive kernel, the best penalty term is the dual kernel norm
$$ \begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k^2 ~=~ \big\langle \sum_{i=1}^N p_i\,\delta_{x_i}, k\star \sum_{i=1}^N p_i\,\delta_{x_i} \big\rangle ~=~\sum_{i,j=1}^N \langle p_i,p_j\rangle\,k(x_i-x_j) \end{align} $$
which can be rewritten as a Sobolev-like penalty on the vector field $v$: $$ \begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k^2 ~=~\iint_{\omega\in\mathbb{R}^2} \frac{|\widehat{v}(\omega)|^2}{\widehat{k}(\omega)}\,\text{d}\omega ~=~ \|v\|_{k}^{*2}. \end{align} $$
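In code, this squared norm is just a kernel dot product; reusing the toy `x`, `p` and `K_xx` from the sketch above:

# Squared dual kernel norm, sum_ij <p_i, p_j> k(x_i - x_j) = <p, K_xx p>:
penalty = scal( p, K_xx @ p )
print( penalty.item() )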
As we strive to optimize a cost that reads
$$ \begin{align} \text{Cost}(p) ~~=~~ \lambda_1\,\text{Fidelity}\big(\sum \alpha_i \delta_{x_i+v_i}, \beta\big) ~+~ \lambda_2\,\big\|\sum p_i\,\delta_{x_i}\big\|_k^2, \end{align} $$
the algorithm will find a compromise between accuracy and stretching.
Kernel dot product. In practice, we use a sampled interpolator
$$ \begin{align} \Phi^k_p~:~ x=(x_i)\in\mathbb{R}^{N\times2} ~\mapsto~ x~+~v~=~ x~+~ K_{xx} p, \end{align} $$
where $K_{xx}$ is the kernel matrix of the $x_i$'s and $p$ is the momentum field, encoded as an N-by-2 tensor.
Exercise 1: Implement a (linear) kernel registration module.
%run -i nt_solutions/measures_2/exo1
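If the solution file is not at hand, here is one possible sketch of such a module; it is not the provided solution. It assumes the `gaussian_kernel` helper defined above, follows the structure of `L2Registration`, and uses an arbitrary regularization weight `λ_reg`.

class KernelRegistration(Model) :
    "Find the optimal momentum field p, with a smoothed displacement v = K_xx @ p."
    def __init__(self, α, λ_reg=1.) :
        "Defines the parameters of a kernel deformation of α."
        super(Model, self).__init__()
        self.α, self.x = α[0].detach(), α[1].detach()
        self.K_xx = gaussian_kernel(self.x, self.x)   # fixed kernel matrix on the source points
        self.p = Parameter(torch.zeros_like(self.x))  # momentum field
        self.λ_reg = λ_reg
    def __call__(self, t=1.) :
        "Applies the smoothed displacement field on the source point cloud."
        x_t = self.x + t * (self.K_xx @ self.p)
        return self.α, x_t
    def cost(self, target) :
        "Fidelity plus the dual kernel norm of the momentum field."
        return fidelity( self(), target ) \
             + self.λ_reg * scal( self.p, self.K_xx @ self.p )

kernel_reg = train_and_display(KernelRegistration, α, β)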
Smooth linear deformations were studied extensively in the 1990s, especially with Bookstein's Thin Plate Spline kernel:
$$ \begin{align} \|v\|_{k}^{*2} ~~=~~ \iint_{\mathbb{R}^2} \|\partial^2_{xx} v(x)\|^2~ \text{d}x \end{align} $$
which allows us to retrieve affine deformations for free.
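For reference, the radial basis function usually associated with this bending energy in 2D is $k(r) = r^2\log r$, with affine deformations lying in the null space of the penalty, hence the "for free". Here is a sketch of the corresponding kernel matrix; `tps_kernel` is our own helper, not one of the notebook's modules.

def tps_kernel(x_i, y_j) :
    "Thin Plate Spline kernel matrix in 2D, K_ij = r_ij^2 * log(r_ij), r_ij = |x_i - y_j|."
    r2 = sqdistances(x_i, y_j)
    # r^2 * log(r) = .5 * r^2 * log(r^2); clamp to avoid log(0) on the diagonal.
    return .5 * r2 * r2.clamp(min=1e-12).log()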
Limits of the linear model. Assuming that the model $\Phi(\alpha)$ is close enough to $\beta$, we may use the matching momentum $p$ to characterize the deformation $\alpha\rightarrow\beta$. Its $k$-norm can be computed through
$$ \begin{align} \big\|\sum_{i=1}^N p_i\,\delta_{x_i}\big\|_k ~~=~~ \sqrt{\langle p, K_{xx}\,p\rangle_{\mathbb{R}^{N\times2}}} \end{align} $$
and can be used as a "shape distance" $\text{d}_k(\alpha\rightarrow \beta)$ that penalizes tearings. Going further, we may compute the Fréchet mean of a population and perform kernel PCA on the momenta that link a mean shape to the observed samples.
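Assuming that `train_and_display` returns the fitted model (as the earlier cells suggest), this "distance" can be read off the momentum of the kernel registration sketch above:

# Hypothetical shape "distance" d_k(α -> β), read off the fitted momentum:
d_k = scal( kernel_reg.p, kernel_reg.K_xx @ kernel_reg.p ).sqrt()
print("d_k(α -> β) = {:.3f}".format( d_k.item() ))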
Path distance in a space of measures. Unfortunately, such an approach has a major drawback: the "kernel distance" is not symmetric. Since it only makes use of the kernel matrix $K_{xx}$ computed on the source point cloud, it induces a bias which may be detrimental to statistical studies.
An elegant solution to this problem is to understand the kernel cost
$$ \langle p, K_{xx} p\rangle ~=~ \langle v, K_{xx}^{-1} v\rangle $$
as an infinitesimal Riemannian metric that penalizes small deformations. We may then look for trajectories
$$ \begin{align} \alpha(\cdot)~:~t\in[0,1] ~\mapsto~ \alpha(t)~=~\sum_{i=1}^N \alpha_i\,\delta_{x_i(t)} \end{align} $$
that start from $\alpha(0)=\alpha$ and minimize a cost
$$ \begin{align} \text{Cost}(\alpha(\cdot)) ~~=~~ \lambda_1\,\text{Fidelity}(\alpha(1),\beta) ~+~ \lambda_2\,\int_0^1 \underbrace{\big\langle \dot{x}(t), K_{x(t),x(t)}^{-1}\,\dot{x}(t)\big\rangle}_{\|\dot{x}(t)\|_{x(t)}^2}\,\text{d}t \end{align} $$
with a regularization matrix that evolves along the path.
Momentum field. In practice, inverting the smoothing matrix is an ill-posed problem. We may thus parameterize the trajectory through the momentum field $(p_i(t))\in\mathbb{R}^{N\times 2}$ and write
$$ \begin{align} \text{Cost}(\alpha(\cdot)) ~~=~~ \lambda_1\,\text{Fidelity}(\alpha(1),\beta) ~+~ \lambda_2\,\int_0^1 \big\langle p(t), K_{x(t),x(t)}\,p(t)\big\rangle\,\text{d}t, \end{align} $$
keeping in mind that $x(0)$ is given by $\alpha$ and that $\dot{x}=v=K_{x,x}\,p$.
Exercise 2: Implement a path registration method, sampling the interval $[0,1]$ with 10 timesteps.
%run -i nt_solutions/measures_2/exo2
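Once again, here is one possible sketch rather than the provided solution: a discrete path driven by one momentum field per timestep, integrated with an explicit Euler scheme over T = 10 steps. The class name, the Gaussian kernel and the regularization weight are our own choices.

class PathRegistration(Model) :
    "Optimize a path x(t) driven by time-dependent momenta p(t), t = 0, ..., T-1."
    def __init__(self, α, T=10, λ_reg=1.) :
        "Defines one momentum field per timestep."
        super(Model, self).__init__()
        self.α, self.x = α[0].detach(), α[1].detach()
        self.T, self.λ_reg = T, λ_reg
        self.p = ParameterList([ Parameter(torch.zeros_like(self.x)) for _ in range(T) ])
    def shoot(self) :
        "Integrates dx/dt = K_{x,x} p with an Euler scheme; returns x(1) and the path energy."
        x, dt, energy = self.x, 1. / self.T, 0.
        for p_t in self.p :
            K_xx   = gaussian_kernel(x, x)                   # kernel matrix along the path
            energy = energy + dt * scal( p_t, K_xx @ p_t )   # accumulates the integral of <p, K p> dt
            x      = x + dt * (K_xx @ p_t)                   # x(t+dt) = x(t) + dt * v(t)
        return x, energy
    def __call__(self, t=1.) :
        "Returns the deformed measure α(1). (t is ignored here: we always shoot up to time 1.)"
        x_1, _ = self.shoot()
        return self.α, x_1
    def cost(self, target) :
        "Fidelity at time 1, plus the path energy."
        x_1, energy = self.shoot()
        return fidelity( (self.α, x_1), target ) + self.λ_reg * energy

path_reg = train_and_display(PathRegistration, α, β)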