A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication

Yufan Xia, Marco De La Pierre, Amanda S. Barnard, Giuseppe Maria Junior Barca

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    3 Citations (Scopus)

    Abstract

    The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific computing. Single-thread GEMM implementations are well optimised with techniques such as blocking and autotuning. However, due to the complexity of modern multi-core shared-memory systems, it is challenging to determine the number of threads that minimises the multi-thread GEMM runtime. We present a proof-of-concept approach to building an Architecture and Data-Structure Aware Linear Algebra (ADSALA) software library that uses machine learning to optimise the runtime performance of BLAS routines. More specifically, our method uses a machine learning model on the fly to automatically select the optimal number of threads for a given GEMM task, based on the collected training data. Test results on two different HPC node architectures, one based on a two-socket Intel Cascade Lake and the other on a two-socket AMD Zen 3, revealed a 25 to 40 per cent speedup compared to traditional GEMM implementations in BLAS for GEMM tasks whose memory usage is within 100 MB.
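
    The thread-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the ADSALA implementation: the abstract does not say which model or features the authors use, so the nearest-neighbour predictor, the log-scaled features, the toy cost model standing in for measured timings, and every name below (toy_runtime, RuntimeModel, select_threads) are assumptions made for illustration only.

```python
import math

# Candidate thread counts to choose from (an assumed, illustrative set).
THREAD_COUNTS = [1, 2, 4, 8, 16, 32]


def toy_runtime(m, k, n, threads):
    """Hypothetical cost model standing in for real GEMM timings.

    A real system would time the BLAS routine instead; the constants
    here are invented purely so the sketch has data to learn from.
    """
    flops = 2.0 * m * k * n                 # multiply-adds in an m x k by k x n product
    compute = flops / (threads * 1e9)       # idealised parallel compute time
    overhead = 5e-5 * threads               # assumed per-thread synchronisation cost
    return compute + overhead


def features(m, k, n, threads):
    """Log-scale the inputs so that shapes spanning orders of magnitude compare sensibly."""
    return (math.log2(m), math.log2(k), math.log2(n), math.log2(threads))


class RuntimeModel:
    """1-nearest-neighbour runtime predictor (a stand-in for any regression model)."""

    def __init__(self, samples):
        # samples: iterable of ((m, k, n, threads), runtime_in_seconds)
        self.samples = [(features(*x), y) for x, y in samples]

    def predict(self, m, k, n, threads):
        query = features(m, k, n, threads)
        nearest = min(self.samples, key=lambda s: math.dist(s[0], query))
        return nearest[1]


def collect_training_data(sizes, thread_counts):
    """Gather (configuration, runtime) pairs for square GEMMs of the given sizes."""
    return [((s, s, s, t), toy_runtime(s, s, s, t))
            for s in sizes for t in thread_counts]


def select_threads(model, m, k, n, thread_counts=THREAD_COUNTS):
    """Pick the thread count with the lowest predicted runtime for this GEMM shape."""
    return min(thread_counts, key=lambda t: model.predict(m, k, n, t))


if __name__ == "__main__":
    model = RuntimeModel(collect_training_data([64, 256, 1024, 2048], THREAD_COUNTS))
    for shape in [(64, 64, 64), (100, 100, 100), (2048, 2048, 2048)]:
        print(shape, "->", select_threads(model, *shape), "threads")
```

    Under this toy cost model, small products favour few threads because the assumed synchronisation overhead dominates, while large products favour many. In practice the selected value would then be passed to the BLAS library's own thread-control mechanism before the GEMM call.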

    Original language: English
    Title of host publication: Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 524-534
    Number of pages: 11
    ISBN (Electronic): 9798350337662
    Publication status: Published - 2023
    Event: 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 - St. Petersburg, United States
    Duration: 15 May 2023 to 19 May 2023

