Federated data processing and learning for collaboration in the physical sciences

W. Huang, A. S. Barnard*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    2 Citations (Scopus)

    Abstract

    Property analysis and prediction are challenging tasks in fields such as chemistry, nanotechnology and materials science, and they often suffer from a lack of data. Federated learning (FL) is a machine learning (ML) framework that encourages privacy-preserving collaborations between data owners, and it potentially avoids the need to combine data that may contain proprietary information. Combining information from different data sets within the same domain can also produce ML models with more general insight and reduce the impact of the selection bias inherent in small, individual studies. In this paper we propose using horizontal FL to mitigate these data limitation issues and explore the opportunity for data-driven collaboration under these constraints. We also propose FedRed, a new dimensionality reduction method for FL that allows faster convergence and accounts for differences between individual data sets. The FL pipeline has been tested on a collection of eight different data sets of metallic nanoparticles. While there are expected losses compared with a combined data set that does not preserve the privacy of the collaborators, we obtained extremely good results compared with local training on individual data sets. We conclude that FL is an effective and efficient method for the physical science domain that could greatly reduce the negative effects of insufficient data.
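To make the idea of horizontal FL concrete, here is a minimal sketch of generic federated averaging (FedAvg), assuming each collaborator holds samples with the same features. It is not the authors' pipeline and does not implement FedRed, whose details are not given here. The linear model, the synthetic data and every name in the code (`local_update`, `fedavg`, `make_client`) are hypothetical illustrations. In each round, every client runs a few steps of gradient descent on its own data, starting from the current global weights, and shares only its updated weights and its sample count; the server then takes a sample-weighted average.

```python
import random
from typing import List, Tuple

# Hypothetical client data set: feature rows X and targets y (same features on every client).
Dataset = Tuple[List[List[float]], List[float]]


def predict(w: List[float], x: List[float]) -> float:
    # Linear model; the last weight is the bias term (zip stops before it).
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def local_update(w: List[float], data: Dataset, lr: float = 0.05,
                 epochs: int = 20) -> List[float]:
    """Full-batch gradient descent on mean squared error, run on one client only."""
    X, y = data
    w = list(w)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += 2.0 * err * xij / n
            grad[-1] += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def fedavg(clients: List[Dataset], n_features: int, rounds: int = 50,
           lr: float = 0.05, local_epochs: int = 20) -> List[float]:
    """Server loop: only weights and sample counts leave each client, never raw data."""
    w_global = [0.0] * (n_features + 1)
    total = sum(len(X) for X, _ in clients)
    for _ in range(rounds):
        updates = [(local_update(w_global, c, lr, local_epochs), len(c[0]))
                   for c in clients]
        # Sample-weighted average of the client models.
        w_global = [sum(w[j] * n for w, n in updates) / total
                    for j in range(n_features + 1)]
    return w_global


def make_client(rng: random.Random, n: int, noise: float = 0.05) -> Dataset:
    # Synthetic stand-in for one collaborator's data: y = 2*x0 - x1 + 0.5 + noise.
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [2.0 * a - 1.0 * b + 0.5 + rng.gauss(0, noise) for a, b in X]
    return X, y


rng = random.Random(0)
clients = [make_client(rng, n) for n in (20, 50, 80)]  # three unequal data owners
w = fedavg(clients, n_features=2)
print([round(v, 2) for v in w])
```

Weighting by sample count lets larger data sets contribute proportionally, as they would in a pooled fit, while the raw data stays with its owner. Swapping `fedavg` for a single `local_update` call on one small client gives the "local training" baseline that the abstract compares against.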

    Original language: English
    Article number: 045023
    Pages (from-to): 1-12
    Number of pages: 12
    Journal: Machine Learning: Science and Technology
    Volume: 3
    Issue number: 4
    DOIs
    Publication status: Published, 1 Dec 2022

