Efficient evaluation of scheduling metrics using emulation: A case study in the effect of artefacts

Claudio Barberato, Peter E. Strazdins, Eric McCreath, Muhammad Atif

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    1 Citation (Scopus)

    Abstract

    Scheduling algorithms have a significant impact on the optimal utilization of HPC facilities. Waiting time, response time, slowdown and weighted slowdown are classical metrics used to compare the performance of different scheduling algorithms. This paper investigates the effects of four artefacts, namely non-determinism, shuffling, time shrinking and sampling, on these metrics. We present a scheduling framework based on emulation, that is, using a real scheduler (Slurm) with a sleep program able to take into account periods of suspension. The framework is able to emulate a 50K core cluster using 10 virtualized nodes, with the scheduler running on an isolated node. We find that the non-determinism in repeatedly running a workload has a small but discernible effect on these metrics, and that shuffling job order in a workload increases this effect by a factor of 5-10. Experiments with shuffled workloads indicate that the average difference in performance between the Backfill and Suspend-Resume strategies is within this variation. We also propose methodologies for time shrinking and sampling to decrease the duration of emulations, while aiming to keep these metrics invariant (or linearly variant) with respect to the original workload. We find that time shrinking to a factor of up to 90% can have a similar effect on the metrics as non-determinism. For sampling, our methodology preserved the distribution of job sizes to a large extent, but had a variation in the metrics somewhat greater than for shuffling. Finally, we use our framework to study Slurm's scheduling performance in depth, and discover a deficiency in the Suspend-Resume implementation.
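
    The abstract names the four metrics but does not define them. As a rough illustration only, the sketch below uses the textbook definitions (waiting time = start minus submit, response time = completion minus submit, slowdown = response time divided by run time) and assumes that weighted slowdown means slowdown weighted by job area (cores times run time); the paper's own definitions may differ. The `Job` record, `mean_metrics` and `shrink` are hypothetical names introduced here, and `shrink` models an idealized time shrinking in which every timestamp scales by the same factor, which is not necessarily the authors' methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; all times are in seconds."""
    submit: float  # time the job entered the queue
    start: float   # time the job first started running
    end: float     # time the job completed (includes any suspension)
    run: float     # time actually spent executing
    cores: int     # number of cores requested

    @property
    def wait(self) -> float:
        return self.start - self.submit

    @property
    def response(self) -> float:
        return self.end - self.submit

    @property
    def slowdown(self) -> float:
        return self.response / self.run


def mean_metrics(jobs):
    """Average the four metrics over a non-empty list of jobs."""
    n = len(jobs)
    # Assumption: weighted slowdown weights each job by cores * run time.
    area = sum(j.cores * j.run for j in jobs)
    return {
        "wait": sum(j.wait for j in jobs) / n,
        "response": sum(j.response for j in jobs) / n,
        "slowdown": sum(j.slowdown for j in jobs) / n,
        "weighted_slowdown": sum(j.slowdown * j.cores * j.run for j in jobs) / area,
    }


def shrink(jobs, factor):
    """Idealized time shrinking: scale every time by the same factor."""
    return [
        Job(j.submit * factor, j.start * factor, j.end * factor,
            j.run * factor, j.cores)
        for j in jobs
    ]


if __name__ == "__main__":
    # Two made-up jobs, not data from the paper.
    jobs = [Job(0, 0, 100, 100, 4), Job(10, 100, 150, 50, 2)]
    print(mean_metrics(jobs))
    print(mean_metrics(shrink(jobs, 0.5)))
```

    Under this idealized scaling, the time-valued metrics (waiting and response time) shrink linearly with the factor while the ratio-valued metrics (slowdown and weighted slowdown) are unchanged, which is one way to read "invariant (or linearly variant)". A real scheduler will not rescale its decisions exactly, so measured metrics can deviate from this ideal.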

    Original language: English
    Title of host publication: 47th International Conference on Parallel Processing, ICPP 2018
    Subtitle of host publication: Workshop Proceedings
    Publisher: Association for Computing Machinery (ACM)
    ISBN (Print): 9781450365239
    DOIs
    Publication status: Published - 13 Aug 2018
    Event: 47th International Conference on Parallel Processing, ICPP 2018 - Eugene, United States
    Duration: 13 Aug 2018 to 16 Aug 2018

    Publication series

    Name: ACM International Conference Proceeding Series

    Conference

    Conference: 47th International Conference on Parallel Processing, ICPP 2018
    Country/Territory: United States
    City: Eugene
    Period: 13/08/18 to 16/08/18
