
Jump to:
Freely distributable documents and computer programs
Le Progiciel R / The R Package
CCA Bibliography 86-96
Lexique anglais-français d'écologie numérique et de statistique


|
|
K-Means least-squares partitioning method
Updated July 27th, 2001
Pierre Legendre
July 2001
Département de Sciences Biologiques
Université de Montréal
K-means is a least-squares partitioning method allowing users to divide a collection of objects into K groups. The theory is presented in all textbooks of numerical classification methods, including section 8.8 of Legendre and Legendre (1998). The program implements a simple alternating least-squares
algorithm, the same one as used in the SAS FASTCLUS procedure. The algorithm iterates between two simple steps:
- Compute cluster centroids and use them as new cluster seeds.
- Assign each object to the nearest seed.
Among the 20 or so algorithms that have been described for K-means partitioning, this one is known to be the fastest; it also has good convergence properties.
In the course of the iterations, the program tries to minimise the sum, over all groups, of the squared within-group residuals, which are the distances of the objects to the respective group centroids. Convergence is reached when the objective function (i.e., the residual sum-of-squares) cannot be lowered
any more. The groups obtained are such that they are geometrically as compact as possible around their respective centroids.
Since K-means partitioning is one of the so-called “NP-hard problems”, no algorithm can guarantee that the absolute minimum of the objective function has been reached; any given run of K-means partitioning may end up in what is called a “local minimum” of the objective function. A strategy, described in the User's guide, helps users find a value of the objective function which is likely to represent the overall minimum.
New in July 2001 is the ability to specify an initial list of centroids to start the K-Means partition with.
Program availability
- Macintosh version
- Compiled versions of the program for any Macintosh computer (fat binary)
- Program documentation, in Adobe Acrobat format
- 32-bit DOS version
(The executable file is a Win32 "console" executable, not a DOS executable. Therefore it cannot run under plain DOS, nor in a DOS window under Windows 3.x, only in Windows 95/98 or Windows NT consoles)
- Compiled version of the program for Win32 compatible computers
- Program documentation, in Adobe Acrobat format
References:
Legendre, P. & Legendre, L. 1998. Numerical Ecology, 2nd English edition. Elsevier Science BV, Amsterdam. xv + 853 pages.
Last updated on Sunday, August 01, 2010 by Philippe Casgrain
Created on Monday, February 14, 2000
|