TY - JOUR
T1 - Structure-Free Mendeleev Encodings of Material Compounds for Machine Learning
AU - Zhuang, Zixin
AU - Barnard, Amanda S.
N1 - Publisher Copyright: © 2023 American Chemical Society
PY - 2023
Y1 - 2023
N2 - Machine learning is a powerful tool to predict the properties of materials for a variety of applications. However, generating data sets of carefully characterized materials can be time-consuming and costly, particularly when numerous candidate materials are later found to be irrelevant. The problem could be alleviated if machine learning could be used with minimal information to provide guidance at an early stage, before significant investment has been made. Since structural characterization is one of the most expensive parts of the process, this study explores structure-free encoding of materials using Mendeleev encoding, a method that does not require information such as lattice constants, lattice positions, or bonding networks. We evaluate Mendeleev encoding using three data sets of continuous, complex material compounds used for battery applications, with four different unsupervised learning methods, inclusive of six algorithms and four evaluation metrics, together with visualizations of the results. Our results show that Mendeleev encoding is more accurate, stable, and reliable than alternative structure-free encoding, allowing both principal component analysis and archetypal analysis to capture more of the variance during dimensionality reduction and consistently provide superior clustering results. Mendeleev encoding is a simple and scientifically intuitive way of representing material data that is both human- and machine-readable and is applicable to any machine-learning task trained on tabular data.
AB - Machine learning is a powerful tool to predict the properties of materials for a variety of applications. However, generating data sets of carefully characterized materials can be time-consuming and costly, particularly when numerous candidate materials are later found to be irrelevant. The problem could be alleviated if machine learning could be used with minimal information to provide guidance at an early stage, before significant investment has been made. Since structural characterization is one of the most expensive parts of the process, this study explores structure-free encoding of materials using Mendeleev encoding, a method that does not require information such as lattice constants, lattice positions, or bonding networks. We evaluate Mendeleev encoding using three data sets of continuous, complex material compounds used for battery applications, with four different unsupervised learning methods, inclusive of six algorithms and four evaluation metrics, together with visualizations of the results. Our results show that Mendeleev encoding is more accurate, stable, and reliable than alternative structure-free encoding, allowing both principal component analysis and archetypal analysis to capture more of the variance during dimensionality reduction and consistently provide superior clustering results. Mendeleev encoding is a simple and scientifically intuitive way of representing material data that is both human- and machine-readable and is applicable to any machine-learning task trained on tabular data.
UR - http://www.scopus.com/inward/record.url?scp=85176134306&partnerID=8YFLogxK
U2 - 10.1021/acs.chemmater.3c02134
DO - 10.1021/acs.chemmater.3c02134
M3 - Article
SN - 0897-4756
VL - 35
SP - 9325
EP - 9338
JO - Chemistry of Materials
JF - Chemistry of Materials
IS - 21
ER -