(Author: Jean Feydy - July 2022)

**Un-ordered point clouds** are everywhere in applied mathematics. We use them to describe:

- 2D or 3D shapes in computer vision, graphics and mechanical engineering.
- probability distributions in statistics and machine learning.
- distributions of mass in (astro)physics.

In order to measure reconstruction errors or perform statistics, researchers have defined many distance functions between point clouds in dimension $D$. We often call them "loss functions" in machine learning. You may already be familiar with:

- The Closest Point and Hausdorff distances, induced by
**nearest neighbor**projections. - The kernel, Sobolev and MMD distances, induced by
**convolutions**or N-body computations. - The Earth Mover's and Wasserstein distances, induced by
**optimal transport**.

There are good reasons to study these formulas.
But beware: even **if these quantities all satisfy the axioms of a distance (symmetry, positivity, triangle inequality...), they are not at all equivalent to each other**.
This notebook illustrates this counter-intuitive fact on a toy example in dimension $D = 2$.

If you would like to dive deeper in this topic, a good start may be to read Chapter 3 of my PhD thesis - Geometric data analysis, beyond convolutions. It contains detailed answers to the questions that are discussed in this tutorial.

You may also be interested by the GeomLoss and KeOps libraries for fast and scalable implementations of the quantities that we are going to define, both on CPU and GPU.

First, we import the standard Python library for array computations. These are fairly similar to e.g. Matlab and Julia.

In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

In [2]:

```
import torch
# Shall we use the GPU?
use_cuda = torch.cuda.is_available()
device = torch.device("cuda") if use_cuda else torch.device("cpu")
# PyTorch CPU/GPU Tensor to NumPy array conversion:
numpy = lambda x: x.detach().cpu().numpy()
```

To keep things simple and assess visually the performance of our methods, we work with point clouds $X$ and $Y$ that we sample on the unit square:

\begin{align} X &~=~ (x_1, \dots, x_N)~\in~\mathbb{R}^{N\times 2}, & Y &~=~ (y_1, \dots, y_N)~\in~\mathbb{R}^{N\times 2}. \end{align}In [3]:

```
from sampling import draw_samples
# Let's keep things fast by default. Feel free to increase!
N = 1000 # if use_cuda else 200
# X and Y are sampled from two densities, encoded as png images:
x_i = draw_samples("data/density_a.png", N=N, device=device)
y_j = draw_samples("data/density_b.png", N=N, device=device)
```

In [4]:

```
# Arbitrary color scheme for the x_i's:
colors_i = (10 * x_i[:, 0]).cos() * (10 * x_i[:, 1]).cos()
colors_i = numpy(colors_i)
# Just a plain blue color for the y_j's:
colors_j = [[0.55, 0.55, 0.95]]
```

Here is what our points cloud look like:

In [5]:

```
from sampling import draw_figure
draw_figure(
x_i=x_i,
colors_i=colors_i,
y_j=y_j,
colors_j=colors_j,
)
```

In order to highlight the properties of a "Loss" or "reconstruction error" between the point clouds $X$ and $Y$, we will track the behaviour of the points $x_i$ as we minimize the function:

\begin{align} \text{Loss}\big(x_1, \dots, x_N~;~ y_1, \dots, y_N\big) \end{align}by gradient descent with respect the the moving points $x_i$. We use a fixed step size ("learning rate"):

\begin{align} \Delta t ~=~ \text{0.05} \end{align}and compute the sequence of iterates:

\begin{align} x_i(t=0) ~&=~ x_i~,& x_i(t+\Delta t) ~=~ x_i(t)-\Delta t\cdot N \cdot \nabla_{x_i} \text{Loss}\big(x_1(t), \dots, x_N(t)~;~ y_1(t), \dots, y_N(t)\big)~. \end{align}In [6]:

```
def gradient_flow(x_i, y_j, loss, dt=0.05):
"""
Flows along the gradient of the cost function using a simple Euler scheme.
We rely on the PyTorch automatic differentiation engine to compute the
gradient of the loss function.
Parameters
----------
x_i : (N,2) torch tensor
samples of the moving source measure
y_j : (N,2) torch tensor
samples of the source target measure
cost : (α_i,x_i,β_j,y_j) -> torch float number,
real-valued function
dt : float, default = .05
time step, i.e. learning rate
"""
# Parameters for the gradient descent:
Nsteps = int(5 / dt) + 1
display_its = [int(t / dt) for t in [0, 0.25, 0.50, 1.0, 2.0, 5.0]]
# Make sure that we won't modify the input measures
x_i, y_j = x_i.clone(), y_j.clone()
# We're going to perform gradient descent on loss(x_i; y_j)
# wrt. the positions x_i of the points that make up the source:
x_i.requires_grad = True
plt.figure(figsize=(12, 8))
k = 1
for i in range(Nsteps): # Euler scheme ===============
# Compute cost and gradient
l_xy = loss(x_i, y_j)
[g] = torch.autograd.grad(l_xy, [x_i])
if i in display_its: # display
ax = plt.subplot(2, 3, k)
ax.set_title(f"t = {dt*i:1.2f}")
k = k + 1
draw_figure(
x_i=x_i,
colors_i=colors_i,
grad_i=g * N,
y_j=y_j,
colors_j=colors_j,
ax=ax,
)
# in-place modification of the tensor's values
x_i.data -= dt * N * g
```

We understand this evolution as an ideal, *model-free* registration or learning problem where a source distribution $X(t)$ is iteratively fitted towards a target (empirical) distribution $Y$.

The main purpose of this tutorial is to illustrate - and discuss! - what happens to the trajectories $x_i(t)$ when we choose a loss function over another.

As a first example, let us consider a squared Euclidean norm between our two point clouds - whose ordering has been defined at random:

\begin{align} \text{Loss}\big(x_1, \dots, x_N~;~ y_1, \dots, y_N\big) ~=~ \frac{1}{2N} \sum_{i=1}^N \|x_i - y_i\|^2~. \end{align}This corresponds to a "Hookian energy" where a small spring has been attached between the $N$ pairs of points $(x_i,y_i)$.

In [7]:

```
def Euclidean_loss(x_i, y_j):
"""
Simplistic squared Euclidean distance between sampled point clouds,
assuming a pairwise correspondence between x_i[k] and y_j[k].
"""
return 0.5 * ((x_i - y_j) ** 2).sum(1).mean()
gradient_flow(x_i, y_j, Euclidean_loss)
```

**Question 1:** What do you think about this trajectory? Is it well-defined? What do you think about the applicability of this first method?

In order to define trajectories that do not rely on a random coupling between the points $x_i$ and $y_j$, a first idea is to associate each point $x_i$ to its closest neighbor among the $y_j$'s. This corresponds to using the reconstruction error:

\begin{align} \text{Loss}\big(x_1, \dots, x_N~;~ y_1, \dots, y_N\big) ~=~ \frac{1}{2N} \sum_{i=1}^N \min_{j=1}^N \|x_i - y_j\|^2~. \end{align}Unlike the Euclidean norm that we discussed earlier, this function is well-defined between un-ordered point clouds. It is closely related to the Hausdorff distance and is a core component of the Iterative Closest Point method for rigid pose estimation. Let's see how it behaves when we apply no regularization to the movement of the points $x_i$:

In [8]:

```
def squared_distances(x_i, y_j):
"""Returns the (N,N) matrix of squared distances between the points x_i and y_j.
Note that this is a *very* naive Python code, written for pedagogical purposes.
Please see www.kernel-operations.io for a scalable and (much) faster implementation.
"""
diff_xy = x_i.view(N, 1, 2) - y_j.view(1, N, 2) # (N,N,D)
return (diff_xy**2).sum(dim=2) # (N,N)
def ICP_loss(x_i, y_j):
D_xy = squared_distances(x_i, y_j)
dists_i = D_xy.min(dim=1).values
return 0.5 * dists_i.mean()
gradient_flow(x_i, y_j, ICP_loss)
```

**Question 2:** What do you think about this trajectory?

We can also try to symmetrize the function above and work with:

In [9]:

```
def ICP_loss_symmetric(x_i, y_j):
D_xy = squared_distances(x_i, y_j) # (N,N)
dists_i = D_xy.min(dim=1).values # (N,)
dists_j = D_xy.min(dim=0).values # (N,)
return 0.25 * (dists_i.mean() + dists_j.mean())
gradient_flow(x_i, y_j, ICP_loss_symmetric)
```