Seed selection for successful fuzzing

Adrian Herrera, Hendra Gunadi, Shane Magrath, Michael Norrish, Mathias Payer, Antony L. Hosking

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    46 Citations (Scopus)

    Abstract

    Mutation-based greybox fuzzing, unquestionably the most widely used fuzzing technique, relies on a set of non-crashing seed inputs (a corpus) to bootstrap the bug-finding process. When evaluating a fuzzer, common approaches for constructing this corpus include: (i) using an empty file; (ii) using a single seed representative of the target's input format; or (iii) collecting a large number of seeds (e.g., by crawling the Internet). Little thought is given to how this seed choice affects the fuzzing process, and there is no consensus on which approach is best (or even if a best approach exists). To address this gap in knowledge, we systematically investigate and evaluate how seed selection affects a fuzzer's ability to find bugs in real-world software. This includes a systematic review of seed selection practices used in both evaluation and deployment contexts, and a large-scale empirical evaluation (over 33 CPU-years) of six seed selection approaches. These six seed selection approaches include three corpus minimization techniques (which select the smallest subset of seeds that trigger the same range of instrumentation data points as a full corpus). Our results demonstrate that fuzzing outcomes vary significantly depending on the initial seeds used to bootstrap the fuzzer, with minimized corpora outperforming singleton, empty, and large (in the order of thousands of files) seed sets. Consequently, we encourage seed selection to be foremost in mind when evaluating/deploying fuzzers, and recommend that (a) seed choice be carefully considered and explicitly documented, and (b) never to evaluate fuzzers with only a single seed.
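
The abstract describes corpus minimization as selecting the smallest subset of seeds that triggers the same instrumentation data points as the full corpus. The sketch below illustrates that idea with a greedy set-cover heuristic. It is an assumption made for illustration, not necessarily any of the three techniques the paper evaluates, and a greedy pass is not guaranteed to find the true smallest subset. The function name `greedy_minimize`, the seed file names, and the integer point IDs are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_minimize(coverage: Dict[str, Set[int]]) -> List[str]:
    """Pick seeds until their combined coverage equals the full corpus coverage.

    Greedy set cover: an approximation, not guaranteed to be the smallest subset.
    """
    # Every instrumentation point triggered by at least one seed in the corpus.
    remaining: Set[int] = set().union(*coverage.values()) if coverage else set()
    selected: List[str] = []
    while remaining:
        # Choose the seed covering the most still-uncovered points;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & remaining))
        selected.append(best)
        remaining -= coverage[best]
    return selected


# Hypothetical corpus: seed name -> IDs of the instrumentation points it triggers.
corpus = {
    "a.png": {1, 2, 3},
    "b.png": {3, 4},
    "c.png": {1, 2},
    "d.png": {4, 5},
}
minimized = greedy_minimize(corpus)
```

In this toy corpus, two of the four seeds already cover every point the full corpus covers, so the other two add nothing from the instrumentation's point of view.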

    Original language: English
    Title of host publication: ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
    Editors: Cristian Cadar, Xiangyu Zhang
    Publisher: Association for Computing Machinery, Inc
    Pages: 230-243
    Number of pages: 14
    ISBN (Electronic): 9781450384599
    Publication status: Published - 11 Jul 2021
    Event: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021 - Virtual, Online, Denmark
    Duration: 11 Jul 2021 to 17 Jul 2021

    Publication series

    Name: ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

    Conference

    Conference: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021
    Country/Territory: Denmark
    City: Virtual, Online
    Period: 11/07/21 to 17/07/21
