T - type of the points to cluster
public class FuzzyKMeansClusterer<T extends Clusterable> extends Clusterer<T>
The Fuzzy K-Means algorithm is a variation of the classical K-Means algorithm, with the major difference that a single data point is not uniquely assigned to a single cluster. Instead, each point $i$ has a set of weights $u_{ij}$ which indicate its degree of membership in cluster $j$.
The algorithm then tries to minimize the objective function:
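$$J = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, d_{ij}^{2}$$
with $d_{ij}$ being the distance between data point $i$ and the cluster center $j$, $N$ the number of data points, $C$ the number of clusters, and $m$ the fuzziness factor.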
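The objective function can be evaluated directly from a membership matrix and a matrix of point-to-center distances. The following is a minimal sketch, not the library's implementation; the class `FuzzyObjective` and its array-based inputs are hypothetical names chosen for illustration.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class FuzzyObjective {

    /**
     * Computes the sum over all points i and clusters j of u[i][j]^m * d[i][j]^2.
     *
     * @param u membership matrix, u[i][j] = membership of point i in cluster j
     * @param d distance matrix, d[i][j] = distance from point i to center j
     * @param m fuzziness factor
     */
    static double value(double[][] u, double[][] d, double m) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            for (int j = 0; j < u[i].length; j++) {
                sum += Math.pow(u[i][j], m) * d[i][j] * d[i][j];
            }
        }
        return sum;
    }
}
```

A point that belongs fully to a nearby center contributes little to the sum, while memberships spread over distant centers increase it.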
The algorithm requires two parameters:
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
The fuzzy variant of the K-Means algorithm is more robust with regard to the selection of the initial cluster centers.
Modifier and Type | Field and Description |
---|---|
private java.util.List<CentroidCluster<T>> | clusters: The list of clusters resulting from the last call to cluster(Collection). |
private static double | DEFAULT_EPSILON: The default value for the convergence criterion. |
private double | epsilon: The convergence criterion. |
private double | fuzziness: The fuzziness factor. |
private int | k: The number of clusters. |
private int | maxIterations: The maximum number of iterations. |
private double[][] | membershipMatrix: The membership matrix. |
private java.util.List<T> | points: The list of points used in the last call to cluster(Collection). |
private RandomGenerator | random: Random generator for choosing initial centers. |
Constructor and Description |
---|
FuzzyKMeansClusterer(int k, double fuzziness): Creates a new instance of a FuzzyKMeansClusterer. |
FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure): Creates a new instance of a FuzzyKMeansClusterer. |
FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, RandomGenerator random): Creates a new instance of a FuzzyKMeansClusterer. |
Modifier and Type | Method and Description |
---|---|
private double | calculateMaxMembershipChange(double[][] matrix): Calculates the maximum element-by-element change of the membership matrix for the current iteration. |
java.util.List<CentroidCluster<T>> | cluster(java.util.Collection<T> dataPoints): Performs Fuzzy K-Means cluster analysis. |
java.util.List<CentroidCluster<T>> | getClusters(): Returns the list of clusters resulting from the last call to cluster(Collection). |
java.util.List<T> | getDataPoints(): Returns an unmodifiable list of the data points used in the last call to cluster(Collection). |
double | getEpsilon(): Returns the convergence criterion used by this instance. |
double | getFuzziness(): Returns the fuzziness factor used by this instance. |
int | getK(): Returns the number of clusters this instance will use. |
int | getMaxIterations(): Returns the maximum number of iterations this instance will use. |
RealMatrix | getMembershipMatrix(): Returns the n x k membership matrix, where n is the number of data points and k the number of clusters. |
double | getObjectiveFunctionValue(): Gets the value of the objective function. |
RandomGenerator | getRandomGenerator(): Returns the random generator this instance will use. |
private void | initializeMembershipMatrix(): Initializes the membership matrix with random values. |
private void | saveMembershipMatrix(double[][] matrix): Copies the membership matrix into the provided matrix. |
private void | updateClusterCenters(): Updates the cluster centers. |
private void | updateMembershipMatrix(): Updates the membership matrix and assigns the points to the cluster with the highest membership. |
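The summary of calculateMaxMembershipChange, together with the epsilon convergence criterion, suggests a stopping test on how much the membership matrix moved between two iterations. The following is a minimal sketch under that assumption; the class `MembershipChange` and the `converged` rule are hypothetical and are not taken from the library source.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class MembershipChange {

    /** Largest absolute element-by-element difference between two equally sized matrices. */
    static double maxChange(double[][] previous, double[][] current) {
        double max = 0.0;
        for (int i = 0; i < current.length; i++) {
            for (int j = 0; j < current[i].length; j++) {
                max = Math.max(max, Math.abs(current[i][j] - previous[i][j]));
            }
        }
        return max;
    }

    /** Assumed stopping rule: no membership value changed by more than epsilon. */
    static boolean converged(double[][] previous, double[][] current, double epsilon) {
        return maxChange(previous, current) <= epsilon;
    }
}
```

In an iterative loop, a test like this would typically be combined with the maxIterations limit so that the loop ends on whichever condition is met first.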
Methods inherited from class Clusterer: distance, getDistanceMeasure
private static final double DEFAULT_EPSILON
private final int k
private final int maxIterations
private final double fuzziness
private final double epsilon
private final RandomGenerator random
private double[][] membershipMatrix
private java.util.List<T> points
The list of points used in the last call to cluster(Collection).
private java.util.List<CentroidCluster<T>> clusters
The list of clusters resulting from the last call to cluster(Collection).
public FuzzyKMeansClusterer(int k, double fuzziness) throws NumberIsTooSmallException
Creates a new instance of a FuzzyKMeansClusterer. The Euclidean distance will be used as the default distance measure.
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
NumberIsTooSmallException - if fuzziness <= 1.0
public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure) throws NumberIsTooSmallException
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
NumberIsTooSmallException - if fuzziness <= 1.0
public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, RandomGenerator random) throws NumberIsTooSmallException
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
epsilon - the convergence criterion (default is 1e-3)
random - random generator to use for choosing initial centers
NumberIsTooSmallException - if fuzziness <= 1.0
public int getK()
public double getFuzziness()
public int getMaxIterations()
public double getEpsilon()
public RandomGenerator getRandomGenerator()
public RealMatrix getMembershipMatrix()
Returns the n x k membership matrix, where n is the number of data points and k the number of clusters. The element $U_{i,j}$ represents the membership value for data point i to cluster j.
MathIllegalStateException - if cluster(Collection) has not been called before
public java.util.List<T> getDataPoints()
Returns an unmodifiable list of the data points used in the last call to cluster(Collection), or null if cluster(Collection) has not been called before.
public java.util.List<CentroidCluster<T>> getClusters()
Returns the list of clusters resulting from the last call to cluster(Collection), or null if cluster(Collection) has not been called before.
public double getObjectiveFunctionValue()
MathIllegalStateException - if cluster(Collection) has not been called before
public java.util.List<CentroidCluster<T>> cluster(java.util.Collection<T> dataPoints) throws MathIllegalArgumentException
Specified by: cluster in class Clusterer<T extends Clusterable>
dataPoints - the points to cluster
MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
private void updateClusterCenters()
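No further detail is given for updateClusterCenters(). A common fuzzy K-Means center update, assumed here rather than taken from the library source, sets each center to the mean of all points weighted by $u_{ij}^{m}$. The following is a minimal sketch; the class `CenterUpdate` and its array-based inputs are hypothetical.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class CenterUpdate {

    /**
     * Returns one center per cluster, each the mean of all points weighted by u[i][j]^m.
     * Assumes every cluster has a positive total weight, so no division by zero occurs.
     *
     * @param points points[i] is the coordinate array of point i
     * @param u membership matrix, u[i][j] = membership of point i in cluster j
     * @param m fuzziness factor
     */
    static double[][] centers(double[][] points, double[][] u, double m) {
        int k = u[0].length;
        int dim = points[0].length;
        double[][] centers = new double[k][dim];
        for (int j = 0; j < k; j++) {
            double weightSum = 0.0;
            for (int i = 0; i < points.length; i++) {
                double w = Math.pow(u[i][j], m);
                weightSum += w;
                for (int c = 0; c < dim; c++) {
                    centers[j][c] += w * points[i][c];
                }
            }
            for (int c = 0; c < dim; c++) {
                centers[j][c] /= weightSum;
            }
        }
        return centers;
    }
}
```

With crisp memberships (each row containing a single 1) this reduces to the ordinary K-Means centroid.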
private void updateMembershipMatrix()
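The method summary says updateMembershipMatrix() updates the memberships and assigns each point to the cluster with the highest membership. The following is a minimal sketch for a single point, assuming the textbook fuzzy K-Means rule $u_{ij} = 1 / \sum_{c} (d_{ij}/d_{ic})^{2/(m-1)}$; the class `MembershipUpdate` is hypothetical, and the handling of zero distances is a simplification chosen for this example.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class MembershipUpdate {

    /**
     * Computes the memberships of one point from its distances to all centers.
     *
     * @param dist dist[j] = distance from the point to center j
     * @param m fuzziness factor, must be > 1.0
     */
    static double[] memberships(double[] dist, double m) {
        int k = dist.length;
        double exponent = 2.0 / (m - 1.0);
        double[] u = new double[k];
        for (int j = 0; j < k; j++) {
            if (dist[j] == 0.0) {
                // Simplification: a point lying exactly on center j belongs fully to it.
                u[j] = 1.0;
                continue;
            }
            double sum = 0.0;
            for (int c = 0; c < k; c++) {
                // If dist[c] is 0 the ratio is +Infinity, so u[j] ends up as 0.
                sum += Math.pow(dist[j] / dist[c], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    /** Index of the cluster with the highest membership (first one wins on ties). */
    static int argMax(double[] u) {
        int best = 0;
        for (int j = 1; j < u.length; j++) {
            if (u[j] > u[best]) {
                best = j;
            }
        }
        return best;
    }
}
```

When all distances are positive, the memberships of a point sum to 1, and a larger fuzziness factor pushes them toward equal shares.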
private void initializeMembershipMatrix()
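The method summary states only that initializeMembershipMatrix() fills the membership matrix with random values. The following is a minimal sketch that additionally normalizes each row to sum to 1, which is an assumption based on the usual fuzzy K-Means constraint; the class `MembershipInit` is hypothetical and uses `java.util.Random` in place of the library's RandomGenerator.

```java
import java.util.Random;

/** Illustrative helper (hypothetical, not part of the library). */
final class MembershipInit {

    /**
     * Creates an n x k matrix of random memberships whose rows each sum to 1.
     *
     * @param n number of data points
     * @param k number of clusters
     * @param random source of randomness
     */
    static double[][] randomMemberships(int n, int k, Random random) {
        double[][] u = new double[n][k];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < k; j++) {
                // 1 - nextDouble() lies in (0, 1], so the row sum is never zero.
                u[i][j] = 1.0 - random.nextDouble();
                rowSum += u[i][j];
            }
            for (int j = 0; j < k; j++) {
                u[i][j] /= rowSum;
            }
        }
        return u;
    }
}
```

Passing a seeded generator makes the initial memberships, and therefore the clustering result, reproducible.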
private double calculateMaxMembershipChange(double[][] matrix)
matrix - the membership matrix of the previous iteration
private void saveMembershipMatrix(double[][] matrix)
matrix - the place to store the membership matrix
Copyright (c) 2003-2014 Apache Software Foundation