class Ai4r::Clusterers::CentroidLinkage

Implementation of an Agglomerative Hierarchical clusterer with centroid linkage algorithm, aka unweighted pair group method centroid (UPGMC) (Everitt et al., 2001 ; Jain and Dubes, 1988 ; Sokal and Michener, 1958 ) Hierarchical clusterer create one cluster per element, and then progressively merge clusters, until the required number of clusters is reached. The distance between clusters is the squared euclidean distance between their centroids.

D(cx, (ci U cj)) = | mx - mij |^2
D(cx, (ci U cj)) =  (ni/(ni+nj))*D(cx, ci) + 
                    (nj/(ni+nj))*D(cx, cj) - 
                    (ni*nj/(ni+nj)^2)*D(ci, cj)

Public Instance Methods

build(data_set, number_of_clusters) click to toggle source

Build a new clusterer, using data examples found in data_set. Items will be clustered in “number_of_clusters” different clusters.

Calls superclass method
# File lib/ai4r/clusterers/centroid_linkage.rb, line 41
def build(data_set, number_of_clusters)
  super
end
eval(data_item) click to toggle source

This algorithms does not allow classification of new data items once it has been built. Rebuild the cluster including you data element.

# File lib/ai4r/clusterers/centroid_linkage.rb, line 47
def eval(data_item)
  Raise "Eval of new data is not supported by this algorithm."
end

Protected Instance Methods

linkage_distance(cx, ci, cj) click to toggle source

return distance between cluster cx and cluster (ci U cj), using centroid linkage

# File lib/ai4r/clusterers/centroid_linkage.rb, line 55
def linkage_distance(cx, ci, cj)
  ni = @index_clusters[ci].length
  nj = @index_clusters[cj].length
  ( ni * read_distance_matrix(cx, ci) +
    nj * read_distance_matrix(cx, cj) -
   1.0 * ni * nj * read_distance_matrix(ci, cj) / (ni+nj)) / (ni+nj)
end