Abstract
Mutation-based greybox fuzzing (unquestionably the most widely-used fuzzing technique) relies on a set of non-crashing seed inputs (a corpus) to bootstrap the bug-finding process. When evaluating a fuzzer, common approaches for constructing this corpus include: (i) using an empty file; (ii) using a single seed representative of the target's input format; or (iii) collecting a large number of seeds (e.g., by crawling the Internet). Little thought is given to how this seed choice affects the fuzzing process, and there is no consensus on which approach is best (or even whether a best approach exists).
To address this gap in knowledge, we systematically investigate and evaluate how seed selection affects a fuzzer's ability to find bugs in real-world software. This includes a systematic review of seed selection practices used in both evaluation and deployment contexts, and a large-scale empirical evaluation (over 33 CPU-years) of six seed selection approaches. Three of these six approaches are corpus minimization techniques, which select the smallest subset of seeds that triggers the same range of instrumentation data points as the full corpus.
Our results demonstrate that fuzzing outcomes vary significantly depending on the initial seeds used to bootstrap the fuzzer, with minimized corpora outperforming singleton, empty, and large (in the order of thousands of files) seed sets. Consequently, we encourage seed selection to be foremost in mind when evaluating/deploying fuzzers, and recommend that (a) seed choice be carefully considered and explicitly documented, and (b) never to evaluate fuzzers with only a single seed.
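The corpus minimization techniques mentioned above can be viewed as instances of a weighted set-cover heuristic: keep only enough seeds to preserve the coverage of the whole corpus. The following is a minimal illustrative sketch of a greedy variant, not the actual algorithms evaluated in the paper; the seed names and coverage sets are hypothetical.

```python
def minimize_corpus(coverage):
    """Greedy corpus minimization sketch.

    coverage: dict mapping a seed name to the set of instrumentation
    points (e.g., edges or basic blocks) that seed triggers.
    Returns a small subset of seeds covering the union of all points.
    """
    # All instrumentation points reached by the full corpus.
    remaining = set().union(*coverage.values()) if coverage else set()
    selected = []
    while remaining:
        # Greedily pick the seed that covers the most uncovered points.
        best = max(coverage, key=lambda s: len(coverage[s] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break
        selected.append(best)
        remaining -= gained
    return selected

# Hypothetical example: seed "b.png" is redundant given "a.png".
corpus = {
    "a.png": {1, 2, 3},
    "b.png": {2, 3},
    "c.png": {3, 4, 5},
}
print(minimize_corpus(corpus))  # ['a.png', 'c.png'] covers all of {1..5}
```

Production tools (e.g., `afl-cmin` in AFL) apply the same basic idea at scale, typically weighting seeds by file size or execution time so that smaller, faster seeds are preferred.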
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis |
Subtitle of host publication | ISSTA 2021 |
Editors | Cristian Cadar, Xiangyu Zhang |
Publisher | Association for Computing Machinery, Inc |
Pages | 230-243 |
ISBN (Electronic) | 9781450384599 |
DOIs | |
Publication status | Published - 11 Jul 2021 |
Event | 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, Denmark. Duration: 11 Jul 2021 → 17 Jul 2021. https://conf.researchr.org/home/issta-2021
Conference
Conference | 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021 |
---|---|
Abbreviated title | ISSTA 2021 |
Country/Territory | Denmark |
Period | 11/07/21 → 17/07/21 |
Other | The ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) is the leading research symposium on software testing and analysis, bringing together academics, industrial researchers, and practitioners to exchange new ideas, problems, and experience on how to analyze and test software systems. 2021 will mark the 30th edition of ISSTA. ISSTA 2021 and co-located events were originally planned to take place in Aarhus, Denmark, but due to the COVID-19 situation we have decided to switch to a virtual format. |
Internet address | https://conf.researchr.org/home/issta-2021