Abstract:
Clustering partitions a population of N data points into K subgroups so that data points in the same group are more similar to each other than to those in other groups. The fundamental goal of clustering is to divide data into meaningful groups based on similarity, which helps reveal and explore the internal structure of the data. Clustering methods can be applied to detect abnormal behavior, segment customers by their buying patterns, and reduce large datasets to a smaller number of related categories. This study uses cosine similarity with the K-means clustering method to cluster the News 20 dataset. The performance of the proposed system is evaluated using homogeneity, completeness, V-measure, adjusted Rand index, and silhouette coefficient. The experimental findings show that the proposed method achieves better performance in clustering the News 20 dataset.
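As an illustration of the general idea only, the following minimal sketch shows one common way to combine cosine similarity with K-means (often called spherical K-means): vectors are scaled to unit length, each point is assigned to the centroid with the highest cosine similarity, and centroids are recomputed as normalized means. This is not the study's implementation. The function names, the farthest-first seeding, and the toy term-count vectors are all assumptions made for the example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """For unit-length vectors, cosine similarity is the dot product."""
    return sum(a * b for a, b in zip(u, v))


def cosine_kmeans(points, k, max_iter=50):
    """Cluster points into k groups using cosine similarity (spherical K-means)."""
    unit = [normalize(p) for p in points]
    # Deterministic farthest-first seeding: an assumption chosen for
    # reproducibility, not something taken from the study.
    centroids = [unit[0]]
    while len(centroids) < k:
        centroids.append(
            min(unit, key=lambda p: max(cosine(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: choose the most similar centroid for each point.
        new_labels = [
            max(range(k), key=lambda j: cosine(p, centroids[j])) for p in unit
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the normalized mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(unit, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


# Hypothetical term-count vectors: the first three documents share one
# vocabulary, the last three share another.
docs = [
    [3, 1, 0, 0],
    [6, 2, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 4, 5],
    [0, 1, 1, 2],
]
labels = cosine_kmeans(docs, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice, documents would first be converted to vectors (for example with TF-IDF weighting), and the evaluation metrics named above are available in scikit-learn as `homogeneity_score`, `completeness_score`, `v_measure_score`, `adjusted_rand_score`, and `silhouette_score` in `sklearn.metrics`.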