Skip to content

davidboers/kmeans

Repository files navigation

k-means clustering algorithm

See wiki links:

How to use

There are two main ways of executing the algorithm. The difference is in how the original centroids are established.

I would recommend using kMeans, as shown below. The kMeans function randomizes the points to create the original centroids. Every time the function is called, a different set of centroids will be initialized.

import KMeans.Algorithm

-- kMeans :: Point a => Int -> [a] -> IO [Cluster a]

main :: IO ()
main =
 do clusters <- kMeans k points
    putStrLn $ displayClusters clusters
  where
    k = 2 -- How many clusters to create
    points = [(5, 3), (9, 2), (2, 5), (4, 1), (7, 3), (10, 4), (7, 2), (1, 2)]

But the library also contains kMeansStatic, which allows you to specify a seed for the randomized centroids. This way, you can get the same output from the algorithm every time you call it, which can be helpful for testing.

-- kMeansStatic :: Point a => Int -> Int -> [a] -> [Cluster a]

main :: IO ()
main =
    kMeansStatic seed k points >>=
        (putStrLn . displayClusters)
  where
    seed = 6 -- You can try different seeds to get specific outcomes
    k = 2
    points = -- ...

Point instances

The library provides the following instances of Point:

  • Point (Double, Double), intended for coordinates. The distance between two points is the hypotenuse, as determined according to the Pythagorean theorem.
  • Point Int, intended mostly to facilitate other instances involving numerical values.
  • Point Char, again intended to facilitate another instance, in this case Point Text.
  • Point Text, allows clustering of text. Distance is determined according to the Levenshtein distance.
  • Point [a], allows clustering of lists of anything that with an instance of Point.

If the library doesn't contain an instance of the Point class to your needs, feel free to create one.

Scaling

The library also provides a module to permit multidimensional scaling (MDS) using the SMACOF (Scaling by Majorizing a Complicated Function) method. Specifically, the user can create a set of coordinates on a 2d Cartesian plane, to visualize a set of points, even if the Point type doesn't lend itself to visual presentation.

import KMeans.Scaling

numbers :: [Int]
numbers = [ 10, 16, 20, 17 ]

main = print $ plotPoints numbers
-- [ (-1.87, -5.40)
-- , (0.84, 0.11)
-- , (0.23, 4.16)
-- , (0.80, 1.13)
-- ] 

The above coordinates are plotted below:

About

k-means clustering algorithm in Haskell.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published