3) On the space of landmarks

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import plotly
# run at the start of every notebook
plotly.offline.init_notebook_mode() 

# Mandatory imports...
import numpy as np
from numpy import random
import torch
from torch import tensor
from torch.nn import Parameter

# Custom modules:
from kendall_triangles import KendallTriangles # Fancy visualization
from model import Model  # Optimization blackbox
from display import plot_worms # Simple plotting routine for polygons

# Finally, emulate complex numbers with pairs of floats:
from torch_complex import rot,normalize,herm,angle,mod2,mod,comp 

A) The dataset

At the end of the previous notebook, we saw how to perform tangent PCA on the manifold of triangles up to similarities. Let us now generalize this analysis to higher-dimensional spaces of landmarks, i.e. labeled point clouds, up to similarities.

Our dataset: a small video of a C. elegans worm, from the C. elegans Behavioural Database.

In [2]:
from segmentation import segment_worms
segment_worms(day=5, N=20, screenshot=True) # see 'data/'
Detected 1 worms.

Using a barebones segmentation algorithm (gradient thresholding + morphological operations, provided by the scikit-image library), we've extracted the worm's shape across 200 frames as point clouds $A_k$ in $\mathbb{R}^{20\times 2}$.
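
For reference, here is a minimal sketch of such a pipeline. The file name, thresholding rule and structuring-element sizes are illustrative assumptions, not the exact settings used by segment_worms:

In [ ]:
from skimage import io, filters, morphology, measure

frame = io.imread("data/frame_000.png", as_gray=True) # hypothetical frame
edges = filters.sobel(frame)                   # gradient magnitude
mask  = edges > filters.threshold_otsu(edges)  # gradient thresholding
mask  = morphology.binary_closing(mask, morphology.disk(3))
mask  = morphology.remove_small_objects(mask, min_size=100)
labels = measure.label(mask)                   # one label per component
print("Detected {} worms.".format(labels.max()))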

In [12]:
A_k = np.loadtxt("data/day_5.csv", delimiter=",")
A_k = A_k.reshape(A_k.shape[0], A_k.shape[1]//2, 2)
A_k[:,:,1] = 264 - A_k[:,:,1] # flip the y axis to match the image convention ("axis image")
A_k = torch.tensor(A_k).float()
In [13]:
print(A_k.size())
torch.Size([200, 20, 2])
In [14]:
plt.figure(figsize=(10,10)) ; L = 264
plot_worms(A_k, "green");plt.axis([0,L,0,L])

plt.figure(figsize=(10,10))
plt.subplot(2,2,1);plot_worms(A_k[  0],"green");plt.axis([0,L,0,L])
plt.subplot(2,2,2);plot_worms(A_k[ 50],"green");plt.axis([0,L,0,L])
plt.subplot(2,2,3);plot_worms(A_k[150],"green");plt.axis([0,L,0,L])
plt.subplot(2,2,4);plot_worms(A_k[199],"green");plt.axis([0,L,0,L])
plt.show()

B) Kendall's formulas

When the number $N$ of points per shape exceeds three, the quotient shape space can no longer be displayed as simply as the sphere of triangles. However, most of Kendall's results still hold: the space of $N$-landmark shapes in the plane, up to similarities, is a quotient of the sphere of dimension $2N-3$ and can be endowed with a Riemannian arc distance:

In [6]:
def arc( A, B ) :
    "'Geodesic' distance between two triangles A and B."
    a, b = normalize(A), normalize(B)
    return mod( herm(a,b) ).acos()
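
As a quick sanity check on the data loaded above, and assuming the torch_complex routines broadcast over a single pair of shapes as they do over batches, the arc distance between two frames always lies in $[0,\pi/2]$:

In [ ]:
print( arc(A_k[0], A_k[50]) )  # a scalar in [0, pi/2]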

As in the previous notebooks, the $\text{Exp}$ and $\text{Log}$ mappings can be computed in closed form:

In [7]:
def log(a,b):
    """
    Assuming that a and b are both normalized, returns
    the Riemannian logarithm of b at point a.
    """
    # Optimally rotate b onto a
    bsa = herm(b, a)
    b   = rot(b, angle(bsa))
    
    # Compute the unit tangent vector at a, pointing towards b:
    c = b - (a*b).sum()*a        # component of b orthogonal to a
    c = c / (c**2).sum().sqrt()  # normalize it
    L = arc(a,b)                 # geodesic distance from a to b
    return L*c
    
def exp(a,v):
    "Shoots along the (nonzero) tangent vector v at location a."
    L = (v**2).sum().sqrt()  # Riemannian norm of v
    return a*torch.cos(L) + (v/L)*torch.sin(L)
    
def geodesic(a,b):
    """
    Returns the geodesic between two *normalized* polygons a and b,
    sampled at 11 timesteps, by shooting along the tangent vector log(a,b).
    """
    v = log(a,b)
    g_t = [ exp(a, t*v) \
            for t in torch.linspace(0,1,11) ]
    return torch.stack( g_t )
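
A quick consistency check (a sketch using two arbitrary frames): shooting from a along log(a,b) should land back on b, up to the optimal rotation that the arc distance quotients out:

In [ ]:
a, b = normalize(A_k[0]), normalize(A_k[100])
print( arc(exp(a, log(a,b)), b) )  # should be close to 0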

C) Your turn!

Now, please take advantage of these routines to perform a (simplistic) statistical study!

Exercise 1: Compute the Fréchet mean of the worm's shape across the video frames. What do you think about it?

In [8]:
class FrechetMean(Model) :
    "Find the Frechet mean of a population of polygons."
    def __init__(self, guess, p=2) :
        "Defines an initial guess and the Frechet exponent."
        super().__init__()
        self.x = Parameter(guess)
        self.p = p

    def __call__(self) :
        "Returns the current guess."
        return self.x
    
    def cost(self, targets) :
        "Returns the mean p-distance to the current guess."
        return ( arc(targets, self())**self.p ).mean()
    
# Initial guess: any non-degenerate polygon should be OK
A0 = A_k[0].float()

FMean   = FrechetMean( A0, p=2 ) 
FMean.fit(   A_k )

# For a clean display, let's normalize everybody
mean   = normalize( FMean()  )
a_k    = normalize( A_k      )

plt.figure()
#plot_worms( a_k[:10] , "green")
plot_worms( mean,      "red")
plt.axis("equal")
plt.show()
It  1:0.042. It  2:0.040. It  3:0.037. It  4:0.029. It  5:0.029. 
It  6:0.029. 
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'

Exercise 2: Perform tangent PCA in the shape space around the mean worm. How many modes do we need to represent our data with good accuracy?

In [9]:
# Compute the log-map of the A_k's around the mean
v_k = torch.stack( [ log(mean, a) for a in a_k ] )
# Flatten it into an N-by-40 matrix (20 points x 2 coordinates)...
v_k = v_k.view(len(v_k),-1)
v_k = v_k.detach().cpu().numpy()

# That we can feed to scikit-learn's PCA module
from sklearn.decomposition import PCA
pca = PCA()
V_k = pca.fit_transform(v_k)
print("Explained variance: ")
print(pca.explained_variance_ratio_)
Explained variance: 
[4.07093555e-01 2.44551972e-01 1.73827931e-01 1.06752954e-01
 2.20607650e-02 1.61963552e-02 1.51208695e-02 4.91368538e-03
 2.51938030e-03 1.68347021e-03 1.33187545e-03 9.86396335e-04
 6.19863218e-04 5.84625930e-04 4.85361059e-04 3.67251108e-04
 2.15434004e-04 1.35435257e-04 9.89477339e-05 8.69227224e-05
 6.28390844e-05 4.78058064e-05 4.22591183e-05 3.89567940e-05
 3.27314883e-05 2.94660385e-05 2.35977659e-05 2.14664869e-05
 1.93281940e-05 1.53300825e-05 1.15095172e-05 6.90579054e-06
 5.28795499e-06 4.07900097e-06 3.43788270e-06 1.93920437e-06
 2.44544879e-12 2.17696954e-12 2.25190638e-14 1.58601431e-14]
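
To make the answer quantitative, we can inspect the cumulative spectrum; the 95% threshold below is an arbitrary, illustrative choice:

In [ ]:
cum = np.cumsum(pca.explained_variance_ratio_)
print("Modes needed to reach 95% of the variance:", np.argmax(cum >= .95) + 1)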

Exercise 3: Display the main PCA modes of variation. How would you describe them?

In [10]:
K = 4
# We only need the first four principal components...
pca = PCA(n_components=K)
V_k = pca.fit_transform(v_k)
sig = np.sqrt(pca.explained_variance_)

# Let's retrieve the four main directions:
modes = []
for i in range(K) :
    vec = np.zeros(K)
    vec[i] = 1.
    mode   = tensor( sig[i]*pca.inverse_transform( vec ))
    mode   = 2 * mode.float().view(-1,2) # display at +-2 standard deviations
    modes.append( [ exp(mean, mode), exp(mean, -mode) ] )

# Display
plt.figure(figsize=(10,10))
for i, (p, m) in enumerate(modes) :
    plt.subplot(2,2,i+1)
    plot_worms(mean,"red")
    plot_worms(p,   "blue")
    plot_worms(m,   "blue")
    plt.axis([-.5,.5,-.5,.5])
plt.show()
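
As a complement, since the endpoints exp(mean, ±mode) are normalized shapes, we may re-use the geodesic() routine from section B to sweep continuously through the first mode. This is only a sketch, assuming that plot_worms accepts a batch of shapes as in section A:

In [ ]:
p1, m1 = modes[0]          # endpoints at +-2 standard deviations, mode 1
sweep  = geodesic(m1, p1)  # 11 shapes interpolating through the mean
plt.figure(figsize=(10,10))
plot_worms(sweep, "blue")
plot_worms(mean,  "red")
plt.axis("equal")
plt.show()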

Exercise 4: Display the population in the exponential chart centered around the mean. How would you describe the shape distribution with respect to the whole shape space?

In [11]:
plt.figure(figsize=(10,10))
plt.scatter(V_k[:,0], V_k[:,1], color="green", s = 25)
plt.scatter( [0.], [0.], color="red", s=125)
plt.scatter( [sig[0],-sig[0]], [0.,0.], color="blue", s=81)
plt.scatter( [0.,0.], [sig[1],-sig[1]], color="cyan", s=81)

# Two reference circles: shapes at arc distance pi/4 (halfway) and
# pi/2 (the maximal arc distance, i.e. the cut locus) from the mean.
midway = plt.Circle((0,0), np.pi/4, color='r', fill=False)
plt.gcf().gca().add_artist(midway)
cut_locus = plt.Circle((0,0), np.pi/2, color='r', fill=False)
plt.gcf().gca().add_artist(cut_locus)

plt.axis("equal")
plt.axis([-np.pi/2,np.pi/2,-np.pi/2,np.pi/2])
plt.title("Exponential chart at the mean location")
plt.show()

Conclusion. In today's session dedicated to the classical theory of shapes, we've detailed a baseline method to decompose the variability of an anatomical population. The algorithms above let us study separately the similarity vector (position, orientation and scale, encoded through a pair of complex numbers) and the manifold-valued shape residual.

In very simple settings (low variance, labeled data), a straightforward tangent analysis on the shape residual may be good enough. But in a medical setting, most anatomy problems are ill-posed. Designing methods that can enforce a strong topological prior will be the subject of tomorrow's session!