Exploding Coffee » Kohonen Network

(This is one of my older posts, originally written on 2012-10-17.)

Kohonen networks, or self-organizing maps (SOMs), are a particular type of artificial neural network that is trained using unsupervised learning. Depending on the input and network configuration, they can be very useful for classifying and visualizing high-dimensional data.

They are also surprisingly simple to create. A basic algorithm, and the one I used to produce the images below, is:

  1. Give all the points in the network a random position in the input space. This position is usually known as the weight vector $\bf w$.
  2. Pick a random point $\bf x$ from the input.
  3. Find the node closest to that point, i.e. find the index $i=\underset{i}{\arg\min}\,||{\bf x} - {\bf w}_i||$. We call this node the winning node.
  4. Move the nodes of the network closer to $\bf x$ based on a step length $\eta(n)$ and neighborhood function $h_{ij}(n)$ (see below): ${\bf w}_j = {\bf w}_j + \eta(n) h_{ij}(n) ({\bf x} - {\bf w}_j)$ where $n$ is the current training iteration.
  5. Repeat from step 2 for a large number of iterations (the networks below were trained for 100,000).
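The steps above can be sketched in plain Python (the original code is MATLAB; the function name and parameter values here are illustrative, not the author's):

```python
import math
import random

def train_som_1d(sample, n_nodes=100, n_iter=5000,
                 eta0=0.1, tau1=5000, sigma0=10.0, tau2=1000):
    """One-dimensional SOM trained on 2-D input points (steps 1-5 above)."""
    # Step 1: random initial weight vectors in [-0.5, 0.5]^2.
    w = [[random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]
         for _ in range(n_nodes)]
    for n in range(n_iter):
        # Step 2: pick a random input point.
        x = sample()
        # Step 3: the winning node has the weight vector nearest to x.
        winner = min(range(n_nodes),
                     key=lambda i: (x[0] - w[i][0])**2 + (x[1] - w[i][1])**2)
        # Step 4: pull every node towards x; both the step length eta
        # and the Gaussian neighborhood width sigma decay over time.
        eta = eta0 * math.exp(-n / tau1)
        sigma = sigma0 * math.exp(-n / tau2)
        for j in range(n_nodes):
            h = math.exp(-(j - winner)**2 / (2 * sigma**2))
            w[j][0] += eta * h * (x[0] - w[j][0])
            w[j][1] += eta * h * (x[1] - w[j][1])
    return w
```

Trained on points sampled uniformly from a circle, the nodes end up spread out along that circle.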

The (or rather, a possible) step length function is

$$ \eta(n)=\eta_0 e^{-n/\tau_1} $$

where $\eta_0$ is the initial step length (0.1 is a good value) and $\tau_1$ is a constant controlling how quickly the step length decays. $\tau_1$ should be roughly as large as the number of training iterations; otherwise the step length becomes so small that the later iterations accomplish no actual training.
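A quick numeric check of that claim (Python; the constants are just examples):

```python
import math

def eta(n, eta0=0.1, tau1=100_000):
    """Step length after n iterations; tau1 matches the training length."""
    return eta0 * math.exp(-n / tau1)

# With tau1 equal to the number of iterations, the step length only
# drops to eta0/e by the end of training:
print(eta(0))        # 0.1
print(eta(100_000))  # ~0.0368 (= 0.1/e)

# With a much smaller tau1, it collapses long before training ends:
print(0.1 * math.exp(-100_000 / 1_000))  # ~3.7e-45, effectively zero
```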

The neighborhood function is very important for Kohonen networks, as it defines how neighboring nodes are affected when the winner is updated. The (or, again, a possible) neighborhood function is

$$ h_{ij}(n)=e^{-\frac{d_{ij}^2}{2\sigma^2(n)}} $$
$$ \sigma(n)=\sigma_0 e^{-n/\tau_2} $$

where $d_{ij}$ is the so-called lateral distance between nodes $i$ and $j$. This is not the distance between the weight vectors ${\bf w}_i$ and ${\bf w}_j$, but the distance between the node indices. For a one-dimensional network it is simply $d_{ij}=|i-j|$.
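For a one-dimensional network, the neighborhood function is small enough to sketch directly (Python; the node indices and sigma value below are arbitrary examples):

```python
import math

def h(i, j, sigma):
    """Gaussian neighborhood weight between nodes i and j.

    d_ij = |i - j| is the lateral (index) distance, not the distance
    between the weight vectors w_i and w_j.
    """
    d = abs(i - j)
    return math.exp(-d**2 / (2 * sigma**2))

# The winner updates itself at full strength; the influence falls off
# with index distance, and faster as sigma shrinks:
print(h(50, 50, 5.0))  # 1.0
print(h(50, 55, 5.0))  # ~0.61
print(h(50, 70, 5.0))  # ~0.0003
```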

The behavior created by the decaying step length and neighborhood size (also known as the radius of attention) is rather fractal in nature. In the beginning the step length is large and the neighborhood should cover most of the network, meaning the first few updates pull the entire network towards the input and start ordering it from its initial random state.

As the step length and neighborhood shrink, smaller and smaller features of the network are refined, and the network spreads out to cover the input domain.

Here is a one-dimensional Kohonen network of 1000 nodes (the colored line) trained on data sampled uniformly from a circle (shown in black):

Here is another network with the same parameters, but trained on data sampled from a different shape (again, shown in black):

Note how in both cases the network spreads out evenly (more or less) to fill the data space.

And here is the MATLAB code used to train those two networks:

function net = trainNet (sample_function)

% Setup: random initial weights in [-0.5, 0.5]^2.
net_size = 1000;
net = rand (net_size, 2) - 0.5;

% Neighborhood width sigma(n), decaying with constant tau_2.
sigma_0 = 100;
tau_2 = 2000;
sigma = @(n) sigma_0 * exp (-n/tau_2);

% Step length eta(n), decaying with constant tau_1.
eta_0 = 0.1;
tau_1 = 100000;
eta = @(n) eta_0 * exp (-n/tau_1);

training_length = 100000;

% Train.
for n = 1:training_length
    point = sample_function ();
    diff = [ point(1)-net(:,1) point(2)-net(:,2) ];

    % The winning node has the smallest squared distance to the point.
    score = sum (diff.^2, 2);
    [~, winner] = min (score);

    % Gaussian neighborhood over the lateral (index) distance.
    h = exp ( - ((1:net_size)' - winner).^2 / (2 * sigma(n)^2) );
    net = net + eta(n) * (h*[1 1]) .* diff;
end

end