Speaker: Dr. Narasimha Murty

Title of the Talk: Clustering Large Datasets

Abstract: Clustering is a well-known data mining tool. It has applications in important areas like document recognition, intrusion detection, biometrics, and web mining, and it has been used successfully to make classification more efficient. This talk looks at various approaches for efficient clustering of large datasets. Here, a large dataset means one that does not fit into the main memory of a machine, so the data is stored on a secondary storage medium and transferred into main memory as needed. A dataset can be large because the number of patterns is large, because the number of features is large, or both. Since every pass over the data requires reading it from secondary storage, the number of dataset scans determines whether clustering such a dataset is feasible. The talk will present different schemes that cluster large datasets using only a small number of dataset scans.
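
To make the idea of a small number of dataset scans concrete, here is a minimal sketch of one well-known single-scan scheme, leader clustering. The abstract does not name the schemes the talk covers, so this choice is an assumption made purely for illustration. The names `leader_clustering`, `stream_patterns`, and `threshold` are hypothetical, and the in-memory list inside `stream_patterns` stands in for patterns read sequentially from secondary storage.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two patterns of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_clustering(
    patterns: Iterable[Point], threshold: float
) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Cluster patterns in one sequential pass.

    Each pattern joins the first leader within `threshold`; otherwise it
    becomes a new leader. Only the leaders and their cluster sizes are kept
    in memory, so memory grows with the number of clusters, not with the
    number of patterns.
    """
    leaders: List[Tuple[float, ...]] = []
    counts: List[int] = []
    for p in patterns:
        for idx, leader in enumerate(leaders):
            if euclidean(p, leader) <= threshold:
                counts[idx] += 1
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(tuple(p))
            counts.append(1)
    return leaders, counts


def stream_patterns() -> Iterator[Point]:
    """Hypothetical stand-in for reading patterns one at a time from disk."""
    yield from [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]


leaders, counts = leader_clustering(stream_patterns(), threshold=1.0)
```

Because the function consumes an iterator, the dataset is read exactly once and never has to reside in main memory as a whole. The result depends on the order in which patterns arrive and on the chosen threshold, which is the usual trade-off for limiting the clustering to a single scan.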