Seed selection for successful fuzzing

Adrian Herrera, Hendra Gunadi, Shane Magrath, Michael Norrish, Mathias Payer, Antony L. Hosking

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    61 Citations (Scopus)

    Abstract

    Mutation-based greybox fuzzing, unquestionably the most widely-used fuzzing technique, relies on a set of non-crashing seed inputs (a corpus) to bootstrap the bug-finding process. When evaluating a fuzzer, common approaches for constructing this corpus include: (i) using an empty file; (ii) using a single seed representative of the target's input format; or (iii) collecting a large number of seeds (e.g., by crawling the Internet). Little thought is given to how this seed choice affects the fuzzing process, and there is no consensus on which approach is best (or even if a best approach exists).

    To address this gap in knowledge, we systematically investigate and evaluate how seed selection affects a fuzzer's ability to find bugs in real-world software. This includes a systematic review of seed selection practices used in both evaluation and deployment contexts, and a large-scale empirical evaluation (over 33 CPU-years) of six seed selection approaches. These six seed selection approaches include three corpus minimization techniques (which select the smallest subset of seeds that trigger the same range of instrumentation data points as a full corpus). 
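
    The idea behind corpus minimization described above can be illustrated with a minimal sketch. It is not one of the three techniques evaluated in the paper. It assumes each seed's coverage is already available as a set of integer instrumentation point IDs, and it uses a greedy set-cover heuristic, which yields a small covering subset but not necessarily the smallest one. The function name `minimize_corpus` and the seed names are hypothetical.

```python
from typing import Dict, List, Set


def minimize_corpus(coverage: Dict[str, Set[int]]) -> List[str]:
    """Greedy set-cover heuristic: return a subset of seeds that together
    hit every instrumentation point hit by the full corpus."""
    # Every point the full corpus reaches must stay covered.
    remaining: Set[int] = set().union(*coverage.values())
    selected: List[str] = []
    while remaining:
        # Pick the seed that covers the most still-uncovered points.
        # Iterating over sorted names makes ties resolve deterministically.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & remaining))
        selected.append(best)
        remaining -= coverage[best]
    return selected


# Hypothetical coverage data: seed name -> IDs of instrumentation points hit.
example = {
    "a.png": {1, 2, 3},
    "b.png": {3, 4},
    "c.png": {1, 2},
    "d.png": {4, 5},
}
minimal = minimize_corpus(example)
print(minimal)  # ['a.png', 'd.png']
```

    Here two of the four seeds reach the same five instrumentation points as the full corpus, so the other two add nothing by this coverage measure.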

    Our results demonstrate that fuzzing outcomes vary significantly depending on the initial seeds used to bootstrap the fuzzer, with minimized corpora outperforming singleton, empty, and large (on the order of thousands of files) seed sets. Consequently, we encourage seed selection to be foremost in mind when evaluating/deploying fuzzers, and recommend that (a) seed choice be carefully considered and explicitly documented, and (b) fuzzers never be evaluated with only a single seed.

    Original language: English
    Title of host publication: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
    Subtitle of host publication: ISSTA 2021
    Editors: Cristian Cadar, Xiangyu Zhang
    Publisher: Association for Computing Machinery, Inc
    Pages: 230-243
    ISBN (Electronic): 9781450384599
    Publication status: Published - 11 Jul 2021
    Event: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, Denmark
    Duration: 11 Jul 2021 to 17 Jul 2021
    https://conf.researchr.org/home/issta-2021

    Conference

    Conference: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021
    Abbreviated title: ISSTA 2021
    Country/Territory: Denmark
    Period: 11/07/21 to 17/07/21
    Other: The ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) is the leading research symposium on software testing and analysis, bringing together academics, industrial researchers, and practitioners to exchange new ideas, problems, and experience on how to analyze and test software systems.
    2021 will mark the 30th edition of ISSTA.

    ISSTA 2021 and co-located events were originally planned to take place in Aarhus, Denmark, but due to the COVID-19 situation we have decided to switch to a virtual format.