Clustering Algorithms: An Investigation of K-mean and DBSCAN on Different Datasets

Authors

  • Arooj Zahra Department of Computer Science, Bahauddin Zakariya University, Multan, 60000, Pakistan
  • Nabeel Asghar Department of Computer Science, Bahauddin Zakariya University, Multan, 60000, Pakistan

Keywords:

Unsupervised machine learning; Clustering algorithms; DBSCAN; K-Means; Classifiers

Abstract

Machine learning is the branch of artificial intelligence that studies techniques allowing systems to learn autonomously and deliver outcomes based on past experience without being explicitly programmed. Its two major categories are supervised and unsupervised learning. Our research focuses on unsupervised learning with unlabeled data. Clustering is an unsupervised learning method that groups unlabeled data items by similarity. Several studies have compared clustering algorithms in terms of complexity, performance, and the impact of the number of clusters on performance. To our knowledge, no study has compared clustering methods across small and large datasets. We therefore conducted a detailed study evaluating the DBSCAN and K-Means algorithms on small and large datasets. We collected 17 heterogeneous, publicly available datasets from online machine learning repositories such as the UCI repository, KEEL, and Kaggle. The datasets are divided into small and large categories based on the number of instances in each dataset. Different preprocessing techniques are applied to improve the quality of the datasets. The class field is removed from each preprocessed dataset, which is then fed into the two clustering algorithms. The clustered data are then analyzed with three classifiers, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB), and the accuracy of each classifier is used to assess the clustering algorithms' performance. The final analysis found that the K-Means algorithm performs better on large datasets, whereas the DBSCAN clustering technique is more efficient on small datasets.
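
The evaluation pipeline described in the abstract (drop the class field, cluster, then score classifiers on the resulting cluster labels) might look roughly like the following Python sketch using scikit-learn. It is an illustration only, not the authors' implementation: the Iris dataset, the standardization step, the 70/30 split, and every hyperparameter (`n_clusters`, `eps`, `min_samples`, `n_neighbors`, the RBF kernel) are assumptions, since the abstract does not specify them.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: Iris stands in for one small dataset; the class field is discarded.
X, _ = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # assumed preprocessing step

# Assumption: hyperparameters are illustrative, not taken from the paper.
clusterers = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.6, min_samples=5),
}
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "NB": GaussianNB(),
}

results = {}
for c_name, clusterer in clusterers.items():
    # Cluster ids become the target labels; DBSCAN marks noise points as -1.
    labels = clusterer.fit_predict(X)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.3, random_state=0
    )
    for clf_name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        # Accuracy of the classifier at reproducing the cluster assignment.
        results[(c_name, clf_name)] = clf.score(X_te, y_te)

for (c_name, clf_name), acc in results.items():
    print(f"{c_name:8s} + {clf_name:3s}: accuracy = {acc:.3f}")
```

In this sketch, accuracy measures how well each classifier can recover the cluster assignments on held-out points, which is one plausible reading of the evaluation described above. Repeating the loop over datasets grouped by instance count would give the small versus large comparison.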

Published

2023-08-16

How to Cite

Zahra, A., & Asghar, N. (2023). Clustering Algorithms: An Investigation of K-mean and DBSCAN on Different Datasets. Machines and Algorithms, 2(2), 137–164. Retrieved from https://knovell.org/MnA/index.php/ojs/article/view/47

Section

Articles