This is a fast and simple-to-use SOM library. It utilizes online training (one data point at a time) rather than batch training. The implemented topologies are a simple 2D lattice and a torus.
To install this package with pip run:
pip install numbasom
To install this package with conda run:
conda install -c mnikola numbasom
To import the library you can safely use:
from numbasom import *
A Self-Organizing Map is often used to show the underlying structure in data. To show how to use the library, we will train it on 200 random 3-dimensional vectors (so we can render them as colors):
import numpy as np
data = np.random.random([200,3])
We initialize a map with 50 rows and 100 columns. The default topology is a 2D lattice. We can also train it on a torus by setting is_torus=True:
som = SOM(som_size=(50,100), is_torus=False)
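To illustrate what the torus option changes, here is a minimal NumPy sketch of the distance between two cells on the grid. The helper grid_distance is hypothetical and is not part of numbasom; it only assumes that a torus wraps both axes, so opposite edges of the map become neighbors.

```python
import numpy as np

def grid_distance(a, b, som_size, is_torus=False):
    """Euclidean distance between two cells given as (row, col) tuples.

    Hypothetical helper for illustration only, not numbasom's code.
    """
    diff = np.abs(np.array(a) - np.array(b)).astype(float)
    if is_torus:
        # On a torus each axis wraps around, so take the shorter way round.
        size = np.array(som_size, dtype=float)
        diff = np.minimum(diff, size - diff)
    return float(np.sqrt((diff ** 2).sum()))

# Opposite corners of a 50 x 100 map.
flat = grid_distance((0, 0), (49, 99), (50, 100), is_torus=False)
wrapped = grid_distance((0, 0), (49, 99), (50, 100), is_torus=True)
print(flat, wrapped)
```

On the plain lattice the two corners are as far apart as possible, while on the torus they are diagonal neighbors.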
We will adapt the lattice by iterating 15,000 times through our data points. If we set normalize=True, the data will be normalized before training.
lattice = som.train(data, num_iterations=15000)
lattice[5,3]
lattice[1::6,1]
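For readers who want to see what online training means in practice, the following is a rough NumPy sketch of a generic online SOM loop on a plain (non-torus) lattice. It is not numbasom's implementation: the function name train_online_sketch, the linear decay schedules and the Gaussian neighborhood are all assumptions chosen for brevity.

```python
import numpy as np

def train_online_sketch(data, som_size=(10, 10), num_iterations=1000, seed=0):
    """Generic online SOM training; an illustration, not numbasom's code."""
    rng = np.random.default_rng(seed)
    rows, cols = som_size
    lattice = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every cell, shape (rows, cols, 2).
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for t in range(num_iterations):
        frac = t / num_iterations
        lr = 0.5 * (1.0 - frac)                   # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # assumed linear decay
        # Online training: pick ONE data point per iteration.
        x = data[rng.integers(len(data))]
        # Best matching unit: the cell whose vector is closest to x.
        dists = np.linalg.norm(lattice - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighborhood around the BMU, measured on the grid.
        grid_d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull every cell toward x, weighted by its neighborhood value.
        lattice += lr * h[..., None] * (x - lattice)
    return lattice

toy_data = np.random.default_rng(1).random((200, 3))
small_lattice = train_online_sketch(toy_data, som_size=(10, 10), num_iterations=1000)
print(small_lattice.shape)
```

A batch variant would instead accumulate the updates from all data points before changing the lattice.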
The shape of the lattice should be (50, 100, 3):
lattice.shape
Since our lattice is made of 3-dimensional vectors, we can represent it as a lattice of colors.
import matplotlib.pyplot as plt
plt.imshow(lattice)
plt.show()
Since most data will not be 3-dimensional, we can use the u_matrix (unified distance matrix by Alfred Ultsch) to visualise the map and the clusters emerging on it.
um = u_matrix(lattice)
Each cell of the U-matrix holds just a single value, thus the shape is:
um.shape
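As a rough idea of what a unified distance matrix contains, the sketch below computes, for each cell, the mean distance to its direct neighbors on a plain lattice. The function u_matrix_sketch is hypothetical; the library's u_matrix may use a different neighborhood or handle the torus case differently.

```python
import numpy as np

def u_matrix_sketch(lattice):
    """Mean distance from each cell to its 4 direct neighbors.

    Illustration only; not numbasom's u_matrix.
    """
    rows, cols, _ = lattice.shape
    um = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                # Skip neighbors that fall outside the lattice.
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(lattice[r, c] - lattice[nr, nc]))
            um[r, c] = np.mean(dists)
    return um

# Toy lattice with two flat regions: left half all zeros, right half all ones.
toy = np.zeros((4, 4, 3))
toy[:, 2:, :] = 1.0
um_toy = u_matrix_sketch(toy)
print(um_toy)
```

Cells inside a flat region get a value of zero, while cells along the boundary between the two regions get larger values, which is what makes cluster borders visible.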
The library contains a function plot_u_matrix that can help visualise it.
plot_u_matrix(um, fig_size=(6.2,6.2))
To project data onto the lattice, use the project_on_lattice function.
Let's project a couple of predefined colors onto the trained lattice and see in which cells they end up:
colors = np.array([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.],[1.,1.,0.],[0.,1.,1.],[1.,0.,1.],[0.,0.,0.],[1.,1.,1.]])
color_labels = ['red', 'green', 'blue', 'yellow', 'cyan', 'purple','black', 'white']
projection = project_on_lattice(colors, lattice, additional_list=color_labels)
for p in projection:
    if projection[p]:
        print(p, projection[p][0])
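Conceptually, projecting a vector means finding the cell whose weight vector is nearest to it. The following NumPy sketch shows that idea with a hypothetical helper, best_matching_cells, on a tiny hand-made lattice. It returns a plain list of cells, which differs from the structure project_on_lattice returns.

```python
import numpy as np

def best_matching_cells(vectors, lattice):
    """Return the (row, col) of the nearest lattice cell for each vector.

    Illustration only; not numbasom's project_on_lattice.
    """
    rows, cols, dim = lattice.shape
    flat = lattice.reshape(-1, dim)
    # Pairwise distances, shape (num_vectors, rows * cols).
    d = np.linalg.norm(vectors[:, None, :] - flat[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    return [tuple(int(i) for i in np.unravel_index(k, (rows, cols))) for k in idx]

# A 2 x 2 toy lattice of colors.
toy_lattice = np.array([[[1., 0., 0.], [0., 1., 0.]],
                        [[0., 0., 1.], [1., 1., 1.]]])
queries = np.array([[0.9, 0.1, 0.0],   # a reddish vector
                    [0.0, 0.0, 0.8]])  # a bluish vector
cells = best_matching_cells(queries, toy_lattice)
print(cells)
```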
To find every cell's closest vector in the provided data, use the lattice_closest_vectors function.
We can again use the colors example:
closest = lattice_closest_vectors(colors, lattice, additional_list=color_labels)
We can now ask which value in color_labels each lattice cell is closest to:
closest[(1,1)]
closest[(40,80)]
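This is the reverse of projection: instead of finding the nearest cell for each data vector, we find the nearest data vector for each cell. A minimal sketch of that lookup, using a hypothetical helper closest_labels and a hand-made 2 x 2 lattice, could look like this. The dictionary keyed by (row, col) only mirrors the indexing used above and is an assumption about the general shape of the result.

```python
import numpy as np

def closest_labels(vectors, lattice, labels):
    """Map every (row, col) cell to the label of its nearest data vector.

    Illustration only; not numbasom's lattice_closest_vectors.
    """
    rows, cols, dim = lattice.shape
    flat = lattice.reshape(-1, dim)
    # Pairwise distances, shape (rows * cols, num_vectors).
    d = np.linalg.norm(flat[:, None, :] - vectors[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    return {(r, c): labels[idx[r * cols + c]]
            for r in range(rows) for c in range(cols)}

toy_lattice = np.array([[[0.9, 0.1, 0.0], [0.8, 0.0, 0.3]],
                        [[0.1, 0.0, 0.7], [0.0, 0.2, 1.0]]])
toy_colors = np.array([[1., 0., 0.], [0., 0., 1.]])
toy_closest = closest_labels(toy_colors, toy_lattice, ['red', 'blue'])
print(toy_closest[(0, 0)], toy_closest[(1, 1)])
```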
We can find the closest vectors without supplying an additional list. Then we get the association between the lattice and the data vectors that we can display as colors.
closest_vec = lattice_closest_vectors(colors, lattice)
We take the values of closest_vec and reshape them into a NumPy array, values.
values = np.array(list(closest_vec.values())).reshape(50,100,-1)
We can now visualise the projection of our 8 hard-coded colors onto the lattice:
plt.imshow(values)
plt.show()
To see how strongly each vector activates the cells of the lattice, we can use the function lattice_activations:
activations = lattice_activations(colors, lattice)
Now we can show how the vector blue, [0.,0.,1.], activates the lattice:
plt.imshow(activations[2])
plt.show()
If we wish to scale the higher values up and the lower values down, we can use the argument exponent when computing the activations:
activations = lattice_activations(colors, lattice, exponent=8)
plt.imshow(activations[2])
plt.show()
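The exact formula behind lattice_activations is not given here, so the sketch below is only one plausible way to picture the effect of an exponent. It assumes that an activation is a distance rescaled to the range 0 to 1 (1 for the closest cell, 0 for the farthest) and then raised to the given power. The function activations_sketch is hypothetical.

```python
import numpy as np

def activations_sketch(vectors, lattice, exponent=1):
    """Assumed activation: rescaled closeness raised to a power.

    Illustration only; numbasom's formula may differ. Requires that the
    lattice cells are not all identical, otherwise the scaling divides by 0.
    """
    # Distance from every vector to every cell, shape (num_vectors, rows, cols).
    d = np.linalg.norm(lattice[None, :, :, :] - vectors[:, None, None, :], axis=3)
    d_min = d.min(axis=(1, 2), keepdims=True)
    d_max = d.max(axis=(1, 2), keepdims=True)
    act = (d_max - d) / (d_max - d_min)   # 1 at the closest cell, 0 at the farthest
    return act ** exponent

rng = np.random.default_rng(0)
toy_lattice = rng.random((5, 8, 3))
toy_colors = np.array([[1., 0., 0.], [0., 0., 1.]])
soft = activations_sketch(toy_colors, toy_lattice)
sharp = activations_sketch(toy_colors, toy_lattice, exponent=8)
print(soft[1].round(2))
print(sharp[1].round(2))
```

Under this assumption, a larger exponent leaves the best matching cell at 1 and pushes all other cells toward 0, so the peak stands out more clearly in the plot.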