Structure-Free Mendeleev Encodings of Material Compounds for Machine Learning

Zixin Zhuang, Amanda S. Barnard*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Machine learning is a powerful tool to predict the properties of materials for a variety of applications. However, generating data sets of carefully characterized materials can be time-consuming and costly, particularly when numerous candidate materials are later found to be irrelevant. The problem could be alleviated if machine learning could be used with minimal information to provide guidance at an early stage, before significant investment has been made. Since structural characterization is one of the most expensive parts of the process, this study explores structure-free encoding of materials using Mendeleev encoding, a method that does not require information such as lattice constants, lattice positions, or bonding networks. We evaluate Mendeleev encoding using three data sets of continuous, complex material compounds used for battery applications, with four different unsupervised learning methods, comprising six algorithms and four evaluation metrics, along with visualizations of the results. Our results show that Mendeleev encoding is more accurate, stable, and reliable than alternative structure-free encoding, allowing both principal component analysis and archetypal analysis to capture more of the variance during dimensionality reduction and consistently provide superior clustering results. Mendeleev encoding is a simple and scientifically intuitive way of representing material data that is both human- and machine-readable and is applicable to any machine-learning task trained on tabular data.
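
As a rough illustration of what a structure-free, composition-only encoding might look like, the sketch below turns a chemical formula into a fixed-length vector indexed by element. The abstract does not spell out the encoding, so this layout is an assumption rather than the authors' implementation; the truncated `ELEMENTS` list and the `encode_formula` helper are hypothetical names introduced only for this example.

```python
import re

# Hypothetical, truncated element ordering (by atomic number).
# A full encoding would presumably span the whole periodic table.
ELEMENTS = ["H", "Li", "C", "O", "F", "Na", "P", "S", "Mn", "Fe", "Co", "Ni"]

# One element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def encode_formula(formula):
    """Encode a flat formula (no parentheses) as a vector of element amounts.

    Only the composition is used: no lattice constants, atomic positions,
    or bonding information are required.
    """
    vector = [0.0] * len(ELEMENTS)
    for symbol, amount in _TOKEN.findall(formula):
        if symbol not in ELEMENTS:
            raise ValueError(f"element {symbol!r} is not in the element list")
        # A missing amount means a single atom of that element.
        vector[ELEMENTS.index(symbol)] += float(amount) if amount else 1.0
    return vector


# Each compound becomes one row of a table that any tabular method can consume.
table = {name: encode_formula(name) for name in ["LiFePO4", "LiCoO2", "NaMnO2"]}
```

Rows built this way form an ordinary tabular data set, which is the kind of input the dimensionality-reduction and clustering methods mentioned in the abstract operate on.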

Original language: English
Pages (from-to): 9325-9338
Number of pages: 14
Journal: Chemistry of Materials
Volume: 35
Issue number: 21
Publication status: Published - 2023
