The analysis and optimization of collective communications on a Beowulf cluster

Wi Bing Tan, Peter Strazdins

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

    7 Citations (Scopus)

    Abstract

    This paper gives a performance analysis of the all-gather, all-reduce and reduce-scatter collective communication operations on a Beowulf cluster. The cluster has a contention-free, switch-based network with multiple network interface cards per node, which allows message transmissions to overlap under certain circumstances. As well as considering traditional algorithms developed previously for parallel computers with vendor-specific networks, we also examine simpler algorithms made up of repeated sub-operations, such as broadcasts. We find that the kind of network on the Beowulf cluster requires a somewhat different performance modelling of the algorithms, and that some simple simulation tools had to be developed in order to fully understand the performance of some of the algorithms. Our results indicate that the LAM MPI implementations of these operations may be significantly improved, and that algorithms involving data exchange and potential contention perform well on the cluster. Furthermore, they indicate that algorithms permitting message overlap are slightly favoured, with a new and simple algorithm modestly outperforming the best traditional algorithms in the case of reduce-scatter. Apart from the degree of overlap, which proved difficult to estimate, our performance models fitted the results closely and, together with the simulation tools, permit a detailed understanding of the performance of communication patterns on the cluster.
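    The abstract contrasts traditional collective algorithms with simpler ones built from repeated sub-operations such as broadcasts. The Python sketch below is illustrative only and is not the authors' code or their performance model. It assumes a ring all-gather as a representative "traditional" algorithm, binomial-tree broadcasts as the repeated sub-operation, and the standard latency-bandwidth (Hockney) cost t = alpha + beta * m per message. The function names and the alpha/beta values are hypothetical. The model deliberately ignores the overlap of transmissions across multiple network interface cards, which the abstract identifies as significant on this cluster.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[int]]]  # result[rank][block] = value held by rank


def allgather_ring(blocks: List[int]) -> Matrix:
    """Hypothetical ring all-gather: p - 1 steps; in step s, rank i forwards
    block (i - s) mod p, received in the previous step, to rank (i + 1) mod p."""
    p = len(blocks)
    result: Matrix = [[None] * p for _ in range(p)]
    for i in range(p):
        result[i][i] = blocks[i]            # each rank starts with its own block
    for step in range(p - 1):
        for src in range(p):
            idx = (src - step) % p          # block src obtained in the previous step
            dst = (src + 1) % p
            result[dst][idx] = result[src][idx]
    return result


def binomial_broadcast(p: int, root: int, value: int,
                       buf: List[Optional[int]]) -> int:
    """Binomial-tree broadcast from `root`; fills buf[rank] for every rank and
    returns the number of communication rounds, ceil(log2 p)."""
    buf[root] = value
    have, rounds = 1, 0                     # ranks root .. root+have-1 (mod p) hold value
    while have < p:
        for rel in range(have):             # each holder sends to the rank `have` away
            if rel + have < p:
                buf[(root + rel + have) % p] = value
        have *= 2
        rounds += 1
    return rounds


def allgather_by_broadcasts(blocks: List[int]) -> Matrix:
    """All-gather built from p successive broadcasts, one rooted at each rank."""
    p = len(blocks)
    result: Matrix = [[None] * p for _ in range(p)]
    for root in range(p):
        column: List[Optional[int]] = [None] * p
        binomial_broadcast(p, root, blocks[root], column)
        for rank in range(p):
            result[rank][root] = column[rank]
    return result


def hockney(alpha: float, beta: float, m: float) -> float:
    """Assumed time for one point-to-point message of m bytes."""
    return alpha + beta * m


def cost_ring(p: int, m: float, alpha: float, beta: float) -> float:
    """Ring all-gather: p - 1 sequential message steps."""
    return (p - 1) * hockney(alpha, beta, m)


def cost_broadcasts(p: int, m: float, alpha: float, beta: float) -> float:
    """p back-to-back binomial broadcasts, assuming no overlap between them."""
    return p * math.ceil(math.log2(p)) * hockney(alpha, beta, m)


if __name__ == "__main__":
    blocks = [10, 20, 30, 40, 50]
    full = [blocks] * len(blocks)
    assert allgather_ring(blocks) == full
    assert allgather_by_broadcasts(blocks) == full
    alpha, beta = 50e-6, 1e-8               # hypothetical: 50 us latency, 100 MB/s
    for p in (2, 4, 8, 16):
        print(p, cost_ring(p, 1024, alpha, beta),
              cost_broadcasts(p, 1024, alpha, beta))
```

    Both versions deliver every block to every rank. Under this naive model, however, the broadcast-based version costs p * ceil(log2 p) message times against p - 1 for the ring. Any advantage that a simple algorithm shows on the cluster must therefore come from effects this model omits, such as overlapping transmissions. This is consistent with the abstract's point that this network requires a somewhat different performance model.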

    Original language: English
    Title of host publication: Proceedings - 9th International Conference on Parallel and Distributed Systems, ICPADS 2002
    Publisher: IEEE Computer Society
    Pages: 659-666
    Number of pages: 8
    ISBN (Electronic): 0769517609
    Publication status: Published - 2002
    Event: 9th International Conference on Parallel and Distributed Systems, ICPADS 2002 - Taiwan, China
    Duration: 17 Dec 2002 to 20 Dec 2002

    Publication series

    Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
    Volume: 2002-January
    ISSN (Print): 1521-9097

    Conference

    Conference: 9th International Conference on Parallel and Distributed Systems, ICPADS 2002
    Country/Territory: China
    City: Taiwan
    Period: 17/12/02 to 20/12/02
