K-tree: Large Scale Document Clustering

01/06/2010
by Christopher M. de Vries, et al.

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and clustering quality with CLUTO on document collections. K-tree has a low time complexity, which makes it suitable for large document collections. Its tree structure allows for efficient disk-based implementations when space requirements exceed main memory.
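
To make the idea of a hierarchical k-means approximation concrete, here is a minimal sketch of a K-tree-style insertion routine. The abstract does not spell out the algorithm, so everything below is an assumption based on the usual description of K-tree as a height-balanced, B+-tree-like structure: the tree order, the 2-means split of overfull nodes, and the size-weighted centroid updates. The names `KTree`, `Node`, `two_means` and `leaf_depths` are hypothetical, and the sketch uses dense Python lists rather than the sparse representations the paper addresses.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors, weights):
    """Weighted mean of a list of vectors."""
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]

def two_means(keys, counts, rng, iters=10):
    """Split keys into two groups with a few k-means (k=2) iterations."""
    c0 = keys[rng.randrange(len(keys))]
    centers = [c0, max(keys, key=lambda k: dist2(k, c0))]
    assign = None
    for _ in range(iters):
        new = [0 if dist2(k, centers[0]) <= dist2(k, centers[1]) else 1 for k in keys]
        if len(set(new)) < 2:  # degenerate case (e.g. duplicates): alternate
            new = [i % 2 for i in range(len(keys))]
        if new == assign:
            break
        assign = new
        for g in (0, 1):
            idx = [i for i, a in enumerate(assign) if a == g]
            centers[g] = mean([keys[i] for i in idx], [counts[i] for i in idx])
    return assign

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []      # leaf: data vectors; internal: child centroids
        self.children = []  # internal nodes only, parallel to keys
        self.counts = []    # number of data vectors represented by each key

    def size(self):
        return sum(self.counts)

    def centroid(self):
        return mean(self.keys, self.counts)

class KTree:
    def __init__(self, order, seed=0):
        self.order = order  # assumed maximum number of keys per node
        self.root = Node(leaf=True)
        self.rng = random.Random(seed)

    def insert(self, vec):
        split = self._insert(self.root, list(vec))
        if split:  # the root overflowed: grow the tree by one level
            self.root = Node(leaf=False)
            for child in split:
                self.root.keys.append(child.centroid())
                self.root.children.append(child)
                self.root.counts.append(child.size())

    def _insert(self, node, vec):
        if node.leaf:
            node.keys.append(vec)
            node.counts.append(1)
        else:
            # Follow the nearest centroid down the tree.
            i = min(range(len(node.keys)), key=lambda j: dist2(vec, node.keys[j]))
            split = self._insert(node.children[i], vec)
            parts = split if split else (node.children[i],)
            node.children[i:i + 1] = list(parts)
            node.keys[i:i + 1] = [p.centroid() for p in parts]
            node.counts[i:i + 1] = [p.size() for p in parts]
        return self._split(node) if len(node.keys) > self.order else None

    def _split(self, node):
        """Divide an overfull node into two nodes using 2-means."""
        assign = two_means(node.keys, node.counts, self.rng)
        halves = (Node(node.leaf), Node(node.leaf))
        for i, g in enumerate(assign):
            halves[g].keys.append(node.keys[i])
            halves[g].counts.append(node.counts[i])
            if not node.leaf:
                halves[g].children.append(node.children[i])
        return halves

def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur (one value if balanced)."""
    if node.leaf:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))
```

In this sketch each insertion visits one node per level and compares the vector against at most `order` centroids there, which is the intuition behind the low time complexity claimed above. Because every node is a small, self-contained block of vectors, nodes could in principle be stored as disk pages, though the sketch keeps everything in memory.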
