See wiki links:
- k-Means generally: https://en.wikipedia.org/wiki/K-means_clustering#Software_implementations
- Elbow method: https://en.wikipedia.org/wiki/Elbow_method_(clustering)
- Silhouette score: https://en.wikipedia.org/wiki/Silhouette_(clustering)
There are two main ways of executing the algorithm. The difference is in how the initial centroids are established.
I would recommend using kMeans, as shown below. The kMeans function picks the initial centroids at random from the points, so every call starts from a different set of centroids.
import KMeans.Algorithm
-- kMeans :: Point a => Int -> [a] -> IO [Cluster a]
main :: IO ()
main =
  do clusters <- kMeans k points
     putStrLn $ displayClusters clusters
  where
    k = 2 -- How many clusters to create
    -- Annotated so the numeric literals resolve to the Point (Double, Double) instance
    points :: [(Double, Double)]
    points = [(5, 3), (9, 2), (2, 5), (4, 1), (7, 3), (10, 4), (7, 2), (1, 2)]
But the library also contains kMeansStatic, which allows you to specify a seed for the randomized centroids. This way, you get the same output from the algorithm every time you call it with the same seed, which can be helpful for testing.
-- kMeansStatic :: Point a => Int -> Int -> [a] -> [Cluster a]
main :: IO ()
main =
  -- kMeansStatic returns a plain list (no IO), so it is applied directly rather than bound with >>=
  putStrLn . displayClusters $ kMeansStatic seed k points
  where
    seed = 6 -- You can try different seeds to get specific outcomes
    k = 2
    points = -- ...
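To make the role of the seed concrete, here is a minimal sketch of k-means (Lloyd's iteration) over (Double, Double) points with seeded initial centroids. It is not the library's implementation: nextSeed, shuffleWith, lloyd and kMeansSketch are hypothetical names, the generator is a simple linear congruential step chosen only for illustration, and clusters are returned as plain lists rather than as Cluster values.

```haskell
import Data.List (minimumBy, nub, sortOn)
import Data.Ord (comparing)

type P = (Double, Double)

-- Illustrative linear congruential step (an assumption, not the library's generator).
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Deterministic shuffle: tag each element with a pseudo-random key, then sort by key.
shuffleWith :: Int -> [a] -> [a]
shuffleWith seed xs = map snd (sortOn fst (zip keys xs))
  where
    keys = drop 1 (iterate nextSeed seed)

-- Squared Euclidean distance; enough for comparing which centroid is closer.
sqDist :: P -> P -> Double
sqDist (x1, y1) (x2, y2) = dx * dx + dy * dy
  where
    dx = x1 - x2
    dy = y1 - y2

-- The centroid closest to a point (only called with a non-empty centroid list).
nearest :: [P] -> P -> P
nearest cs p = minimumBy (comparing (sqDist p)) cs

-- Group the points by nearest centroid, dropping centroids that attract no points.
clustersFor :: [P] -> [P] -> [[P]]
clustersFor pts cs =
  filter (not . null) [[p | p <- pts, nearest cs p == c] | c <- nub cs]

-- Arithmetic mean of a non-empty group of points.
mean :: [P] -> P
mean ps = (sum (map fst ps) / n, sum (map snd ps) / n)
  where
    n = fromIntegral (length ps)

-- Repeat assign-then-recentre until the centroids stop moving or the budget runs out.
lloyd :: Int -> [P] -> [P] -> [P]
lloyd 0 _ cs = cs
lloyd n pts cs
  | cs' == cs = cs
  | otherwise = lloyd (n - 1) pts cs'
  where
    cs' = map mean (clustersFor pts cs)

-- Seed first, then k, mirroring the argument order of kMeansStatic.
kMeansSketch :: Int -> Int -> [P] -> [[P]]
kMeansSketch seed k pts = clustersFor pts final
  where
    final = lloyd 100 pts (take k (shuffleWith seed pts))
```

Because everything here is a pure function of the seed, calling kMeansSketch 6 2 points twice yields the same clusters, while a different seed may start from different centroids and settle on a different grouping.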
The library provides the following instances of Point:
- Point (Double, Double), intended for coordinates. The distance between two points is the hypotenuse, as determined according to the Pythagorean theorem.
- Point Int, intended mostly to facilitate other instances involving numerical values.
- Point Char, again intended to facilitate another instance, in this case Point Text.
- Point Text, allows clustering of text. Distance is determined according to the Levenshtein distance.
- Point [a], allows clustering of lists of anything with an instance of Point.
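The two distance measures named above can be sketched as standalone functions. This is only an illustration of the maths, not the library's code: euclidean and levenshtein are names I made up, and the text version works on String rather than Text to keep the example free of extra dependencies.

```haskell
-- Straight-line distance between two coordinates (Pythagorean theorem).
euclidean :: (Double, Double) -> (Double, Double) -> Double
euclidean (x1, y1) (x2, y2) = sqrt (dx * dx + dy * dy)
  where
    dx = x1 - x2
    dy = y1 - y2

-- Levenshtein distance: the minimum number of single-character insertions,
-- deletions and substitutions needed to turn s into t.
-- Computed one row of the dynamic-programming table at a time.
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl nextRow [0 .. length t] s)
  where
    -- prev holds the distances from the prefix of s seen so far to every prefix of t.
    nextRow [] _ = []
    nextRow prev@(p : ps) c = scanl compute (p + 1) (zip3 t prev ps)
      where
        compute left (tc, diag, up) =
          minimum [up + 1, left + 1, diag + (if tc == c then 0 else 1)]
```

For instance, euclidean (0, 0) (3, 4) is 5.0, and levenshtein "kitten" "sitting" is 3.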
If the library doesn't contain an instance of the Point class that suits your needs, feel free to create one.
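As a rough idea of what a custom instance involves, here is a sketch for an RGB colour type. The methods of the real Point class are not listed in this README, so the class below is a hypothetical stand-in (Measurable, with a single distanceTo method) and is not the library's API. The real class may well require more, such as a way to compute the centre of a cluster, so check the module itself before writing an instance.

```haskell
-- Hypothetical stand-in for the library's Point class (assumed shape, not the real one).
class Measurable a where
  distanceTo :: a -> a -> Double

-- A type the library has no instance for: an RGB colour.
data Colour = Colour Int Int Int
  deriving (Eq, Show)

-- Treat colours as points in 3D space and use straight-line distance.
instance Measurable Colour where
  distanceTo (Colour r1 g1 b1) (Colour r2 g2 b2) =
    sqrt (fromIntegral (sq (r1 - r2) + sq (g1 - g2) + sq (b1 - b2)))
    where
      sq x = x * x
```

The point of the sketch is the pattern: pick a notion of distance that makes sense for your type, then express it in the instance.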
The library also provides a module to permit multidimensional scaling (MDS) using the SMACOF (Scaling by Majorizing a Complicated Function) method. Specifically, the user can create a set of coordinates on a 2D Cartesian plane to visualize a set of points, even if the Point type doesn't lend itself to visual presentation.
import KMeans.Scaling
numbers :: [Int]
numbers = [ 10, 16, 20, 17 ]
main = print $ plotPoints numbers
-- [ (-1.87, -5.40)
-- , (0.84, 0.11)
-- , (0.23, 4.16)
-- , (0.80, 1.13)
-- ]
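One way to sanity-check an embedding like this is to compute its raw stress, the quantity SMACOF tries to minimize: the sum over all pairs of the squared difference between the distance on the plane and the original distance. The sketch below is not part of the library. rawStress and pairs are hypothetical names, and it assumes the distance between two Int values is their absolute difference, which this README does not state.

```haskell
import Data.List (tails)

type Coord = (Double, Double)

-- All unordered pairs of a list, in a fixed order.
pairs :: [a] -> [(a, a)]
pairs xs = [(x, y) | (x : rest) <- tails xs, y <- rest]

-- Straight-line distance on the plane.
euclid :: Coord -> Coord -> Double
euclid (x1, y1) (x2, y2) = sqrt (dx * dx + dy * dy)
  where
    dx = x1 - x2
    dy = y1 - y2

-- Raw stress: sum over pairs of (embedded distance - original distance)^2.
-- Original distances are assumed to be absolute differences of the values.
rawStress :: [Double] -> [Coord] -> Double
rawStress values coords =
  sum [(d - delta) * (d - delta) | (d, delta) <- zip embedded original]
  where
    original = [abs (a - b) | (a, b) <- pairs values]
    embedded = [euclid p q | (p, q) <- pairs coords]

-- The numbers and (rounded) coordinates from the example above.
exampleStress :: Double
exampleStress =
  rawStress
    [10, 16, 20, 17]
    [(-1.87, -5.40), (0.84, 0.11), (0.23, 4.16), (0.80, 1.13)]
```

Under that assumption, exampleStress comes out small (well under 1), which suggests the 2D coordinates preserve the pairwise distances between the original numbers closely.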
The above coordinates are plotted below: