So recently, I had to work with libfuzzer a bunch for a project, and I noticed that whilst there are many resources for writing libfuzzer test harnesses, there are few that talk about the various options libfuzzer offers once your fuzzer is compiled. So I thought I’d talk about what happens after you write your tests, and how to interpret the results. If you’re looking for resources on how to write libfuzzer tests, I recommend Apple’s tutorial.

So let’s look at some of the areas surrounding libfuzzer usage:

Table of Contents

- Setup
- Building Fuzzers
- Running Fuzzers
- Adding Corpora
- Future Directions

Setup

On Mac, I used brew to install LLVM 13. The default clang installed by the Xcode Command Line Tools does not ship libfuzzer, and so does not work for what we need here. Alternatively, you could build LLVM 13 (or any LLVM, really), as I did on Linux; this takes quite a bit of RAM, time, and finagling, but can be worth it for certain workflows. In fact, my initial rig that I used to get started was Void Linux with an XBPS-installed llvm, running on an i686 qemu system image. I then moved to a docker container running on my host Mac, and then finally just bit the bullet and figured out how to build the software cleanly on Mac, which was non-trivial; the developers of the software clearly hadn’t intended for it to be used anywhere but Linux, and I ended up spending a bit of time getting their libraries installed in the correct places.
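If you go the Homebrew route, something like the following should work (Homebrew’s llvm formula is keg-only, so the binaries live under the Homebrew prefix rather than on your PATH; exact paths and versions will vary):

brew install llvm
"$(brew --prefix llvm)/bin/clang" --version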

Building Fuzzers

The software I was testing consisted of 3(ish) libraries that were woven together in myriad ways; much of the time I spent attempting to build the fuzzer actually went to getting include files, library objects, and the like into the correct spots. I eventually settled on this monstrosity of a command line:

/usr/local/Cellar/llvm/13.0.1_1/bin/clang -g -fsanitize=address,fuzzer fuzz-test.c -I include -I ./src/somefile -L ./somelibrarypath -l somelibrary0 -l somelibrary1 -I.. -o fuzz-test

Definitely not the most pleasant, but not the worst either. One thing I quickly found was that there were a number of tests I could write that would cause an issue in the address sanitizer or one of its functions, so I settled on a script to build things and let me update the build quickly:

#!/bin/sh
#@(#) simple build script for fuzzing

# debug info, plus the include/library soup from the command line above
OPTS="-g"
INCS="-I include -I ./src/somepath -I .."
LIBS="-L ./somelibrarypath -l somelibrary0 -l somelibrary1"
SANS="-fsanitize=address,fuzzer"
CLANG=/usr/local/Cellar/llvm/13.0.1_1/bin/clang

if [ "$#" -eq 2 ]
then
    # "-no-address" drops ASan and links only the libFuzzer runtime
    if [ "$1" = "-no-address" ]
    then
        SANS="-fsanitize=fuzzer"
    fi
    src=$2
elif [ "$#" -eq 1 ]
then
    src=$1
else
    echo "have some other number of args..." >&2
    exit 1
fi

# name the output binary after the source file, minus its extension
out=${src%.*}
$CLANG $OPTS $SANS $INCS $LIBS "$src" -o "$out"

Now I could build fuzzers without needing to think about what to invoke, when, and how.
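For example, assuming the script is saved as build.sh (the name here is just for illustration):

./build.sh fuzz-test.c               # ASan + libFuzzer instrumentation
./build.sh -no-address fuzz-test.c   # libFuzzer instrumentation only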

Running Fuzzers

This is where other tutorials tend to leave off. One of the first things I noticed running my newly built fuzzer was that it produced a warning about max_len being unset, and thus it would not generate inputs larger than 4096 bytes. The libfuzzer docs make reference to a -max_len argument to the fuzzer, so I began to tinker with that. This led me to other options; -help=1 prints the usage guide:

Usage:

To run fuzzing pass 0 or more directories.
./fuzzer [-flag1=val1 [-flag2=val2 ...] ] [dir1 [dir2 ...] ]

To run individual tests without fuzzing pass 1 or more files:
./fuzzer [-flag1=val1 [-flag2=val2 ...] ] file1 [file2 ...]

Flags: (strictly in form -flag=value)
 verbosity                              1   Verbosity level.
 seed                                   0   Random seed. If 0, seed is generated.
 runs                                   -1  Number of individual test runs (-1 for infinite runs).
 max_len                                0   Maximum length of the test input. If 0, libFuzzer tries to guess a good value based on the corpus and reports it.
 len_control                            100 Try generating small inputs first, then try larger inputs over time.  Specifies the rate at which the length limit is increased (smaller == faster).  If 0, immediately try inputs with size up to max_len. Default value is 0, if LLVMFuzzerCustomMutator is used.
 seed_inputs                            0   A comma-separated list of input files to use as an additional seed corpus. Alternatively, an "@" followed by the name of a file containing the comma-separated list.
 keep_seed                              0   If 1, keep seed inputs in the corpus even if they do not produce new coverage. When used with |reduce_inputs==1|, the seed inputs will never be reduced. This option can be useful when seeds are not properly formed for the fuzz target but still have useful snippets.
 cross_over                             1   If 1, cross over inputs.
 cross_over_uniform_dist                0   Experimental. If 1, use a uniform probability distribution when choosing inputs to cross over with. Some of the inputs in the corpus may never get chosen for mutation depending on the input mutation scheduling policy. With this flag, all inputs, regardless of the input mutation scheduling policy, can be chosen as an input to cross over with. This can be particularly useful with |keep_seed==1|; all the initial seed inputs, even though they do not increase coverage because they are not properly formed, will still be chosen as an input to cross over with.
 mutate_depth                           5   Apply this number of consecutive mutations to each input.
 reduce_depth                           0   Experimental/internal. Reduce depth if mutations lose unique features
 shuffle                                1   Shuffle inputs at startup
 prefer_small                           1   If 1, always prefer smaller inputs during the corpus shuffle.
 timeout                                1200    Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort.
 error_exitcode                         77  When libFuzzer itself reports a bug this exit code will be used.
 timeout_exitcode                       70  When libFuzzer reports a timeout this exit code will be used.
 max_total_time                         0   If positive, indicates the maximal total time in seconds to run the fuzzer.
 help                                   0   Print help.
 fork                                   0   Experimental mode where fuzzing happens in a subprocess
 ignore_timeouts                        1   Ignore timeouts in fork mode
 ignore_ooms                            1   Ignore OOMs in fork mode
 ignore_crashes                         0   Ignore crashes in fork mode
 merge                                  0   If 1, the 2-nd, 3-rd, etc corpora will be merged into the 1-st corpus. Only interesting units will be taken. This flag can be used to minimize a corpus.
 stop_file                              0   Stop fuzzing ASAP if this file exists
 merge_control_file                     0   Specify a control file used for the merge process. If a merge process gets killed it tries to leave this file in a state suitable for resuming the merge. By default a temporary file will be used. The same file can be used for multistep merge process.
 minimize_crash                         0   If 1, minimizes the provided crash input. Use with -runs=N or -max_total_time=N to limit the number attempts. Use with -exact_artifact_path to specify the output. Combine with ASAN_OPTIONS=dedup_token_length=3 (or similar) to ensure that the minimized input triggers the same crash.
 cleanse_crash                          0   If 1, tries to cleanse the provided crash input to make it contain fewer original bytes. Use with -exact_artifact_path to specify the output.
 mutation_graph_file                    0   Saves a graph (in DOT format) to mutation_graph_file. The graph contains a vertex for each input that has unique coverage; directed edges are provided between parents and children where the child has unique coverage, and are recorded with the type of mutation that caused the child.
 use_counters                           1   Use coverage counters
 use_memmem                             1   Use hints from intercepting memmem, strstr, etc
 use_value_profile                      0   Experimental. Use value profile to guide fuzzing.
 use_cmp                                1   Use CMP traces to guide mutations
 shrink                                 0   Experimental. Try to shrink corpus inputs.
 reduce_inputs                          1   Try to reduce the size of inputs while preserving their full feature sets
 jobs                                   0   Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log.
 workers                                0   Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used.
 reload                                 1   Reload the main corpus every <N> seconds to get new units discovered by other processes. If 0, disabled
 report_slow_units                      10  Report slowest units if they run for more than this number of seconds.
 only_ascii                             0   If 1, generate only ASCII (isprint+isspace) inputs.
 dict                                   0   Experimental. Use the dictionary file.
 artifact_prefix                        0   Write fuzzing artifacts (crash, timeout, or slow inputs) as $(artifact_prefix)file
 exact_artifact_path                    0   Write the single artifact on failure (crash, timeout) as $(exact_artifact_path). This overrides -artifact_prefix and will not use checksum in the file name. Do not use the same path for several parallel processes.
 print_pcs                              0   If 1, print out newly covered PCs.
 print_funcs                            2   If >=1, print out at most this number of newly covered functions.
 print_final_stats                      0   If 1, print statistics at exit.
 print_corpus_stats                     0   If 1, print statistics on corpus elements at exit.
 print_coverage                         0   If 1, print coverage information as text at exit.
 print_full_coverage                    0   If 1, print full coverage information (all branches) as text at exit.
 dump_coverage                          0   Deprecated.
 handle_segv                            1   If 1, try to intercept SIGSEGV.
 handle_bus                             1   If 1, try to intercept SIGBUS.
 handle_abrt                            1   If 1, try to intercept SIGABRT.
 handle_ill                             1   If 1, try to intercept SIGILL.
 handle_fpe                             1   If 1, try to intercept SIGFPE.
 handle_int                             1   If 1, try to intercept SIGINT.
 handle_term                            1   If 1, try to intercept SIGTERM.
 handle_xfsz                            1   If 1, try to intercept SIGXFSZ.
 handle_usr1                            1   If 1, try to intercept SIGUSR1.
 handle_usr2                            1   If 1, try to intercept SIGUSR2.
 handle_winexcept                       1   If 1, try to intercept uncaught Windows Visual C++ Exceptions.
 close_fd_mask                          0   If 1, close stdout at startup; if 2, close stderr; if 3, close both. Be careful, this will also close e.g. stderr of asan.
 detect_leaks                           1   If 1, and if LeakSanitizer is enabled try to detect memory leaks during fuzzing (i.e. not only at shut down).
 purge_allocator_interval               1   Purge allocator caches and quarantines every <N> seconds. When rss_limit_mb is specified (>0), purging starts when RSS exceeds 50% of rss_limit_mb. Pass purge_allocator_interval=-1 to disable this functionality.
 trace_malloc                           0   If >= 1 will print all mallocs/frees. If >= 2 will also print stack traces.
 rss_limit_mb                           2048    If non-zero, the fuzzer will exit upon reaching this limit of RSS memory usage.
 malloc_limit_mb                        0   If non-zero, the fuzzer will exit if the target tries to allocate this number of Mb with one malloc call. If zero (default) same limit as rss_limit_mb is applied.
 exit_on_src_pos                        0   Exit if a newly found PC originates from the given source location. Example: -exit_on_src_pos=foo.cc:123. Used primarily for testing libFuzzer itself.
 exit_on_item                           0   Exit if an item with a given sha1 sum was added to the corpus. Used primarily for testing libFuzzer itself.
 ignore_remaining_args                  0   If 1, ignore all arguments passed after this one. Useful for fuzzers that need to do their own argument parsing.
 focus_function                         0   Experimental. Fuzzing will focus on inputs that trigger calls to this function. If -focus_function=auto and -data_flow_trace is used, libFuzzer will choose the focus functions automatically. Disables -entropic when specified.
 entropic                               1   Enables entropic power schedule.
 entropic_feature_frequency_threshold   255 Experimental. If entropic is enabled, all features which are observed less often than the specified value are considered as rare.
 entropic_number_of_rarest_features     100 Experimental. If entropic is enabled, we keep track of the frequencies only for the Top-X least abundant features (union features that are considered as rare).
 entropic_scale_per_exec_time           0   Experimental. If 1, the Entropic power schedule gets scaled based on the input execution time. Inputs with lower execution time get scheduled more (up to 30x). Note that, if 1, fuzzer stops from being deterministic even if a non-zero random seed is given.
 analyze_dict                           0   Experimental
 use_clang_coverage                     0   Deprecated; don't use
 data_flow_trace                        0   Experimental: use the data flow trace
 collect_data_flow                      0   Experimental: collect the data flow trace
 create_missing_dirs                    0   Automatically attempt to create directories for arguments that would normally expect them to already exist (i.e. artifact_prefix, exact_artifact_path, features_dir, corpus)

Flags starting with '--' will be ignored and will be passed verbatim to subprocesses.

There are a number of interesting flags here:

When fuzzing, I would often set -max_len to some value close to the maximum resident set size (RSS) limit; by default, libfuzzer caps RSS at 2048MB, so I would set the maximum length to around 2G. This produced interesting analysis, wherein I could see that the amount of data generated by the fuzzer was 1-2 orders of magnitude smaller than the memory consumed by the system, which was useful in tuning the inputs to generate out-of-memory (OOM) errors.
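A representative invocation might look like this (the binary name, corpus directory, and values are illustrative; -rss_limit_mb is shown at its default purely for clarity):

./fuzz-test -max_len=2000000000 -rss_limit_mb=2048 CORPUS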

The -runs= parameter was trickier to intuit; the documentation states:

`-runs`
Number of individual test runs, `-1` (the default) to run indefinitely.

Which is somewhat cryptic. At first blush, I took this to mean the individual runs of LLVMFuzzerTestOneInput or the like, which was decidedly incorrect: setting this number too low resulted in the fuzzer running out of runs during early setup. My assumption was that I only wanted 100-200 "runs," and thus set -runs=200, which quickly stopped the fuzzer. Experimenting led me to set this to 200k+ runs, in order to get a reasonable set of invocations while also limiting the amount of time I was burning my laptop at 1000% CPU. I’d like to figure out what is actually meant by this parameter eventually.
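What I ended up running looked roughly like this (the numbers are illustrative, and -max_total_time is an alternative way to bound a session by wall-clock time instead of run count):

./fuzz-test -runs=200000 CORPUS
./fuzz-test -max_total_time=600 CORPUS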

As with Radamsa and other fuzzers, -seed= is also extremely useful; it lets you specify the starting random seed, and as with most fuzzing, corpus curation & seed collection is about 70% of the work you’ll do with libfuzzer. Often I simply deliver a fuzzer and a set of seeds that reproduce the findings to customers, rather than delivering a full corpus. It’s also useful for replaying tests; once you have something crashing, knowing the seed that started it is hugely useful in subsequent runs.
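For example (the seed value is arbitrary, and crash-<sha1> stands in for whatever artifact file the fuzzer wrote out):

./fuzz-test -seed=1337 CORPUS
./fuzz-test crash-<sha1>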

Adding Corpora

This was the last area that was interesting to me; one of the things we ended up testing was a JSON parser, and waiting for a purely mutational fuzzer to generate meaningful JSON can take a long time. The usual solution is a corpus of documents, which we can use with libfuzzer as well, even if it’s not clear at first glance how. In libfuzzer, a corpus is simply one or more directories of files whose contents seed the fuzzer; the fuzzer itself updates this set as it discovers interesting inputs, and multiple corpora can be merged based on whether they produce new coverage. This is done by adding one or more directories to the arguments of your fuzzer, and optionally using the -merge=1 flag:

./fuzzer CORPUS_DIR 
./fuzzer CUR_CORPUS NEW_CORPUS 
./fuzzer -merge=1 CUR_CORPUS SUPER_COOL_CORPUS0 SUPER_COOL_CORPUS1 

I’m not going to lie; I found the corpus management behavior surprising. I ran my initial fuzzer with ./fuzzer <lots of options> CORPUS, and found that many seeds had been saved to the directory. I’m still not exactly sure what I did to cause a save there, but it was useful later (although I ended up using a JSON corpus from go-fuzz to seed my fuzzer).
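Seeding from an external corpus is the same merge operation; for instance, pulling in a downloaded go-fuzz JSON corpus might look like this (the directory name is hypothetical):

./fuzz-test -merge=1 CORPUS ./go-fuzz-json-corpus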

Future Directions

Armed with this, I’ll pretty much always fuzz with libfuzzer when I have something I can approach from C; generally my SOP has been to start with Radamsa for the initial pass, and then move on to manual review or targeted tooling. However, libfuzzer was simple enough that I’ll definitely start using it frequently, much like I use go-fuzz & gopter for Go projects. Having said that, I’d really like to automate more of this, and maybe have some helpers floating around to instrument binaries and process results. Something like Echidna or DeepState would be extremely nice for setting guardrails around my process. Lastly, I’d love to start using Ikos more frequently; it would let me understand some basic properties of the program and analyze them with more precision than fuzzing alone.