TY - JOUR
T1 - Selecting Appropriate Clustering Methods for Materials Science Applications of Machine Learning
AU - Parker, Amanda J.
AU - Barnard, Amanda S.
N1 - Publisher Copyright: © 2019 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
PY - 2019/12/1
Y1 - 2019/12/1
AB - Based on a general definition of a cluster and of the quality of a clustering result, a new method is presented for evaluating existing clustering algorithms or for undertaking clustering directly. The method is capable of predicting the number and type of clusters and outliers present in a data set, regardless of the complexity of the distribution of points. This algorithm, referred to as iterative label spreading, can recognize the characteristics expected of a successful clustering result before any clustering algorithm is applied, providing a type of hyper-parameter optimization for clustering. The efficacy of the algorithm and the assessment of clustering results are both confirmed using large benchmark two-dimensional synthetic data sets and a small multidimensional data set describing silver nanoparticles. It is shown that the method is ideal for studying noisy data with high dimensionality and high variance, typical of data captured in materials and nanoscience.
KW - machine learning
KW - materials classification
KW - materials clustering
KW - materials design
KW - nanoparticles
UR - http://www.scopus.com/inward/record.url?scp=85076593051&partnerID=8YFLogxK
U2 - 10.1002/adts.201900145
DO - 10.1002/adts.201900145
M3 - Article
SN - 2513-0390
VL - 2
JO - Advanced Theory and Simulations
JF - Advanced Theory and Simulations
IS - 12
M1 - 1900145
ER -