Abstract:
Clustering is a well-known data mining tool with applications in
important areas such as document recognition, intrusion detection,
biometrics, and web mining. It has also been used successfully for
efficient classification. This talk examines various approaches to the
efficient clustering of large datasets. Here, a large dataset means one
that does not fit into the main memory of a machine, so the data is
stored on a secondary storage medium and transferred into main memory
as needed. A dataset is large either because the number of patterns is
large or because the number of features is large. Because every scan
must read the data from secondary storage, the number of dataset scans
determines whether clustering such a dataset is feasible. The talk
presents different schemes for clustering large datasets that require
only a small number of dataset scans.
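
As a rough illustration of what a scheme with a small number of scans
can look like, the sketch below uses the leader algorithm, a well-known
single-scan clustering method. The abstract does not name the schemes
covered in the talk, so the choice of algorithm here is an assumption,
and the function name, the distance threshold, and the use of Euclidean
distance are all hypothetical. Patterns are consumed one at a time from
an iterable, so only the current pattern and the list of leaders need
to be in main memory.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def leader_clustering(
    patterns: Iterable[Sequence[float]],
    threshold: float,
) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Cluster patterns in a single pass (illustrative sketch only).

    Each pattern joins the nearest existing leader if that leader is
    within `threshold` (Euclidean distance); otherwise the pattern
    becomes a new leader. Returns the leaders and the number of
    patterns assigned to each one.
    """
    leaders: List[Tuple[float, ...]] = []
    counts: List[int] = []

    for x in patterns:  # one scan: each pattern is read exactly once
        best_idx, best_dist = -1, math.inf
        for i, leader in enumerate(leaders):
            d = math.dist(x, leader)
            if d < best_dist:
                best_idx, best_dist = i, d

        if best_dist <= threshold:
            counts[best_idx] += 1
        else:
            # No leader is close enough, so start a new cluster.
            leaders.append(tuple(x))
            counts.append(1)

    return leaders, counts
```

In practice the iterable could be a generator that reads patterns from
a file on disk line by line. Note that the result of this sketch depends
on the order in which patterns arrive and on the chosen threshold.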