Somoclu is a C++ tool for training self-organizing maps on large data sets using a high-performance cluster. It builds on MPI for distributing the workload across the nodes of the cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. The code is released under GNU GPLv3 licence.
Sponsorship:
This work was supported by the European Commission Seventh Framework Programme under Grant Agreement Number FP7-601138 PERICLES and by the AWS in Education Machine Learning Grant award