k-means clustering algorithm

See the relevant Wikipedia articles:

k-Means generally: https://en.wikipedia.org/wiki/K-means_clustering#Software_implementations
Elbow method: https://en.wikipedia.org/wiki/Elbow_method_(clustering)
Silhouette score: https://en.wikipedia.org/wiki/Silhouette_(clustering)

How to use

There are two main ways to run the algorithm. They differ in how the initial centroids are chosen.

I recommend using kMeans, as shown below. The kMeans function randomizes the points to pick the initial centroids, so each call starts from a different set of centroids and may produce different clusters.

import KMeans.Algorithm

-- kMeans :: Point a => Int -> [a] -> IO [Cluster a]

main :: IO ()
main = do
    clusters <- kMeans k points
    putStrLn $ displayClusters clusters
  where
    k = 2 -- How many clusters to create
    -- The annotation selects the Point (Double, Double) instance
    points :: [(Double, Double)]
    points = [(5, 3), (9, 2), (2, 5), (4, 1), (7, 3), (10, 4), (7, 2), (1, 2)]

The library also contains kMeansStatic, which lets you specify a seed for the randomized centroids. The same seed, k, and points give the same output on every call, which is helpful for testing.

-- kMeansStatic :: Point a => Int -> Int -> [a] -> [Cluster a]

main :: IO ()
main =
    -- kMeansStatic is pure (no IO in its type), so its result
    -- is passed straight to displayClusters
    putStrLn $ displayClusters $ kMeansStatic seed k points
  where
    seed = 6 -- You can try different seeds to get specific outcomes
    k = 2
    points = -- ...
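
Both functions differ only in how the starting centroids are picked. What follows that choice in standard k-means is usually Lloyd's iteration: assign every point to its nearest centroid, move each centroid to the mean of its group, and repeat until nothing changes. The sketch below illustrates that loop for 2D points using only base. It is not the library's code: the names P, sqDist, centroid, assign, update and lloyd are made up for this example, and the initial centroids are passed in explicitly instead of being randomized.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical stand-in for a 2D point; not the library's Point class.
type P = (Double, Double)

-- Squared Euclidean distance (enough for nearest-centroid comparisons).
sqDist :: P -> P -> Double
sqDist (x1, y1) (x2, y2) = (x1 - x2) ^ (2 :: Int) + (y1 - y2) ^ (2 :: Int)

-- Arithmetic mean of a non-empty list of points.
centroid :: [P] -> P
centroid ps = (sum (map fst ps) / n, sum (map snd ps) / n)
  where
    n = fromIntegral (length ps)

-- Assignment step: one group per centroid, holding the points nearest to it.
-- A centroid that attracts no points gets an empty group.
assign :: [P] -> [P] -> [[P]]
assign cs ps = [[p | p <- ps, nearest p == i] | i <- [0 .. length cs - 1]]
  where
    nearest p =
      snd (minimumBy (comparing fst) [(sqDist p c, i) | (c, i) <- zip cs [0 :: Int ..]])

-- Update step: move each centroid to the mean of its group,
-- keeping the old centroid when the group is empty.
update :: [P] -> [[P]] -> [P]
update = zipWith (\c g -> if null g then c else centroid g)

-- Repeat both steps until the centroids stop moving, then return the groups.
lloyd :: [P] -> [P] -> [[P]]
lloyd cs ps
  | cs' == cs = groups
  | otherwise = lloyd cs' ps
  where
    groups = assign cs ps
    cs' = update cs groups
```

To try it, load the file in GHCi and call lloyd with two of the example points as starting centroids, e.g. `lloyd [(5, 3), (9, 2)] points`. Different starting centroids can lead to different final groups, which is why the seed matters for reproducibility.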

`Point` instances

The library provides the following instances of Point:

Point (Double, Double), intended for coordinates. The distance between two points is the Euclidean distance, i.e. the hypotenuse given by the Pythagorean theorem.
Point Int, intended mostly to support other instances involving numerical values.
Point Char, likewise intended to support another instance, in this case Point Text.
Point Text, which allows clustering of text. Distance is the Levenshtein distance.
Point [a], which allows clustering of lists of any type that has an instance of Point.
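
For reference, the two named distance measures can be sketched as follows. This is an illustration of the textbook definitions, not the library's implementation: euclidean and levenshtein are hypothetical names, and String is used in place of Text so the example needs only base.

```haskell
import Data.List (foldl')

-- Euclidean distance between two coordinates (Pythagorean theorem).
euclidean :: (Double, Double) -> (Double, Double) -> Double
euclidean (x1, y1) (x2, y2) =
  sqrt ((x1 - x2) ^ (2 :: Int) + (y1 - y2) ^ (2 :: Int))

-- Levenshtein distance: the minimum number of single-character
-- insertions, deletions and substitutions that turn s into t.
-- Computed row by row over the classic dynamic-programming table.
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl' nextRow [0 .. length t] s)
  where
    -- prev is the table row for the prefix of s seen so far;
    -- the result is the row after also consuming character c.
    nextRow prev@(p : ps) c = scanl step (p + 1) (zip3 t prev ps)
      where
        step left (tc, diag, up) =
          minimum [left + 1, up + 1, diag + fromEnum (tc /= c)]
    nextRow [] _ = [] -- unreachable: rows always have at least one entry
```

In GHCi, `euclidean (0, 0) (3, 4)` evaluates to 5.0 and `levenshtein "kitten" "sitting"` to 3.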

If the library doesn't provide a Point instance that suits your needs, feel free to create one.

Scaling

The library also provides a module for multidimensional scaling (MDS) using the SMACOF (Scaling by Majorizing a Complicated Function) method. It lets you generate a set of coordinates on a 2D Cartesian plane to visualize a set of points, even if the Point type doesn't lend itself to visual presentation.

import KMeans.Scaling

numbers :: [Int]
numbers = [ 10, 16, 20, 17 ]

main :: IO ()
main = print $ plotPoints numbers
-- [ (-1.87, -5.40)
-- , (0.84, 0.11)
-- , (0.23, 4.16)
-- , (0.80, 1.13)
-- ]

The above coordinates are plotted in example_scale.png in the repository.
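
To give a feel for what SMACOF does, here is a minimal sketch of its core loop with unit weights: measure the stress (how badly the 2D distances match the target dissimilarities), then apply the Guttman transform, which never increases that stress. This is a generic textbook version and not the code in KMeans.Scaling. All names (P2, dist, stress, guttman, smacof, dissimilarities) are hypothetical, and using the absolute difference as the dissimilarity between Ints is an assumption made for the example.

```haskell
-- Hypothetical 2D layout point; not a library type.
type P2 = (Double, Double)

-- Euclidean distance in the plane.
dist :: P2 -> P2 -> Double
dist (x1, y1) (x2, y2) = sqrt ((x1 - x2) ^ (2 :: Int) + (y1 - y2) ^ (2 :: Int))

-- Assumed dissimilarity for Ints: absolute difference, as a full matrix.
dissimilarities :: [Int] -> [[Double]]
dissimilarities ns = [[fromIntegral (abs (a - b)) | b <- ns] | a <- ns]

-- Raw stress: squared mismatch between target dissimilarities and
-- layout distances, summed over all pairs i < j.
stress :: [[Double]] -> [P2] -> Double
stress delta xs =
  sum
    [ (d - dist p q) ^ (2 :: Int)
    | (i, row, p) <- zip3 [0 :: Int ..] delta xs
    , (j, d, q) <- zip3 [0 :: Int ..] row xs
    , i < j
    ]

-- One Guttman transform with unit weights. For each point i:
--   x_i' = (1/n) * sum over j of (delta_ij / d_ij) * (x_i - x_j),
-- skipping pairs whose current distance d_ij is zero (including j == i).
guttman :: [[Double]] -> [P2] -> [P2]
guttman delta xs = zipWith move delta xs
  where
    n = fromIntegral (length xs)
    move row p@(px, py) = (sum (map fst terms) / n, sum (map snd terms) / n)
      where
        terms =
          [ (r * (px - qx), r * (py - qy))
          | (dl, q@(qx, qy)) <- zip row xs
          , let d = dist p q
          , d > 0
          , let r = dl / d
          ]

-- Apply a fixed number of Guttman transforms to a starting layout.
smacof :: Int -> [[Double]] -> [P2] -> [P2]
smacof steps delta start = iterate (guttman delta) start !! steps
```

For example, `smacof 50 (dissimilarities [10, 16, 20, 17]) [(0, 0), (1, 0), (0, 1), (1, 1)]` refines an arbitrary starting square into a layout whose stress is no higher than where it began. The resulting coordinates depend on the starting layout, so they need not match the output of plotPoints above.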
