Selecting Appropriate Clustering Methods for Materials Science Applications of Machine Learning

Amanda J. Parker, Amanda S. Barnard*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

44 Citations (Scopus)

Abstract

Based on a general definition of a cluster and of the quality of a clustering result, a new method is presented for evaluating existing clustering algorithms or for undertaking clustering directly. The method can predict the number and type of clusters and outliers present in a data set, regardless of the complexity of the distribution of points. This algorithm, referred to as iterative label spreading, recognizes the characteristics expected of a successful clustering result before any clustering algorithm is applied, providing a type of hyper-parameter optimization for clustering. The efficacy of the algorithm, and of the assessment of clustering results, is confirmed using large benchmark two-dimensional synthetic data sets and small multidimensional data describing a set of silver nanoparticles. The method is shown to be ideal for studying noisy data with high dimensionality and high variance, typical of data captured in materials science and nanoscience.
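The abstract names iterative label spreading without spelling out its mechanics, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a spreading rule in which, at each iteration, the unlabelled point closest to any already-labelled point receives the label, and that minimum distance (called `r_min` here) is recorded. Large jumps in `r_min` then suggest that the spreading has crossed a gap into a new cluster. The function names, the `factor` parameter and the median-based threshold are hypothetical choices made for illustration.

```python
import numpy as np


def iterative_label_spreading(points, start=0):
    """Assumed rule: label points one at a time, always taking the unlabelled
    point closest to any labelled point, and record that minimum distance."""
    n = len(points)
    labelled = np.zeros(n, dtype=bool)
    labelled[start] = True
    # Distance from every point to its nearest labelled point so far.
    nearest = np.linalg.norm(points - points[start], axis=1)
    order, r_min = [start], []
    for _ in range(n - 1):
        # Mask labelled points with infinity so they are never chosen again.
        candidates = np.where(labelled, np.inf, nearest)
        idx = int(np.argmin(candidates))
        r_min.append(float(candidates[idx]))
        order.append(idx)
        labelled[idx] = True
        # Refresh nearest-labelled distances using the newly labelled point.
        nearest = np.minimum(nearest, np.linalg.norm(points - points[idx], axis=1))
    return order, np.array(r_min)


def count_clusters(r_min, factor=5.0):
    """Hypothetical rule: a step in r_min larger than `factor` times the
    median step is taken to mark entry into a new cluster."""
    threshold = factor * np.median(r_min)
    return 1 + int(np.sum(r_min > threshold))


# Two groups of ten points on a line, separated by a wide gap.
pts = np.array([[0.1 * i, 0.0] for i in range(10)]
               + [[10.0 + 0.1 * i, 0.0] for i in range(10)])
order, r = iterative_label_spreading(pts)
print(count_clusters(r))
```

For this toy input, `r_min` stays near 0.1 inside each group and spikes once, to about 9.1, when the spreading jumps the gap, so the threshold rule reports two clusters. Each iteration updates distances to all points, so the sketch costs roughly O(n²) time and is intended for small data sets such as the nanoparticle example rather than large benchmarks.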

Original language: English
Article number: 1900145
Journal: Advanced Theory and Simulations
Volume: 2
Issue number: 12
Publication status: Published, 1 Dec 2019
Externally published: Yes
