Fuzzing and Symbolic Execution
Fuzzing and symbolic execution are automated techniques for finding bugs in software. In the malware analysis context, they are useful for:
- Finding vulnerabilities that malware exploits
- Identifying backdoor trigger conditions (inputs that cause unusual behavior)
- Determining binary equivalence across metamorphic malware variants
- Automatically testing unpacked samples for crash-worthy inputs
Common C Vulnerabilities
Before fuzzing, you need to know what you’re looking for. These are the most common classes of memory corruption bugs.
Buffer / Stack Overflow
Writing beyond the end of a fixed-size buffer. If the buffer is on the stack, the saved return address can be overwritten.
char password[9];
scanf("%s", password); // no length limit — input longer than 8 chars overflows
Array Out-of-Bounds
Using an index that falls outside the allocated range. Off-by-one errors are the classic source.
int a[10];
a[10] = 0; // writes one past the end of the array
Failure to Null-Terminate
String functions such as strcat, strlen, and printf("%s") assume their inputs are null-terminated. If a read fills a buffer exactly, no terminator is stored, and any later string operation on that buffer runs past its end.
char buf[8];
read(STDIN_FILENO, buf, 8); // fills buf with no null terminator
strcat(dest, buf); // reads past end of buf
Format String
Passing user input directly as a printf format string allows reading (and writing) arbitrary memory.
printf(argv[1]); // BAD
printf("%s", argv[1]); // GOOD
%x dumps stack words. %n writes the number of bytes printed so far to a pointer argument — classic write primitive.
Use-After-Free
Using a pointer after the memory it points to has been freed. Common in browsers and complex data structure code.
char *p = malloc(16);
free(p);
printf("%s\n", p); // undefined behavior — p may point to recycled memory
Fuzzing
Fuzzing (fuzz testing) generates large numbers of inputs and watches for crashes, hangs, or unexpected behavior.
Black-Box vs. White-Box
- Black-box: treats the program as opaque, generates random or mutation-based inputs. Simple but inefficient; unlikely to reach deeply nested code paths.
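- Grey-box (coverage-guided; often loosely called white-box): instruments the binary to track which code paths are hit, then uses that feedback to evolve inputs that reach new code. Far more efficient. Strictly, white-box fuzzing refers to tools that reason about the program's logic directly, for example through symbolic execution.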
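The gap between the two approaches can be seen in a toy, standard-library-only sketch. Everything here is hypothetical: `toy_target` stands in for a real program, the `covered` set stands in for compiled-in instrumentation, and `mutate` is a deliberately crude single-byte mutator. It is not how AFL++ is implemented, only an illustration of the feedback loop.

```python
import random
from typing import Optional


def toy_target(data: bytes, covered: set) -> None:
    """Hypothetical target: three nested byte checks guard a 'crash'."""
    covered.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("b0")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("b1")
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def black_box(max_execs: int, rng: random.Random) -> Optional[int]:
    """Purely random 3-byte inputs, no feedback. Returns executions used, or None."""
    for i in range(1, max_execs + 1):
        data = bytes(rng.randrange(256) for _ in range(3))
        try:
            toy_target(data, set())
        except RuntimeError:
            return i
    return None


def coverage_guided(max_execs: int, rng: random.Random) -> Optional[int]:
    """Keep any input that reaches new coverage and mutate from the kept corpus."""
    corpus = [b"AAA"]
    seen: set = set()
    toy_target(corpus[0], seen)  # record the seed's coverage (the seed does not crash)
    for i in range(1, max_execs + 1):
        candidate = mutate(rng.choice(corpus), rng)
        covered: set = set()
        try:
            toy_target(candidate, covered)
        except RuntimeError:
            return i
        if not covered <= seen:  # reached something new: keep this input
            seen |= covered
            corpus.append(candidate)
    return None
```

The black-box loop has to guess all three bytes at once (a 1 in 2^24 chance per attempt), while the guided loop only has to get one more byte right at a time, because each partial success is saved and built upon.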
American Fuzzy Lop (AFL++)
AFL++ is a community-maintained fork of AFL and one of the most widely used coverage-guided fuzzers. It instruments the target at compile time to measure edge (branch) coverage, then uses a genetic algorithm to mutate inputs toward unexplored branches.
Compile the target with AFL instrumentation:
$ CC=afl-cc CXX=afl-c++ ./configure
$ make
# or for a single file:
$ afl-cc -o target vulnerable.c
Create a seed corpus — a directory of small, valid inputs:
$ mkdir inputs
$ echo "test" > inputs/seed1
Run the fuzzer:
$ afl-fuzz -i inputs -o out ./target
AFL++ displays a live dashboard showing:
- cycles done: how many times it has processed the entire queue
- total paths: number of distinct code paths discovered
- uniq crashes: number of unique crash-inducing inputs found
- uniq hangs: inputs that cause the program to hang
Crashes are saved in out/default/crashes/. Each file in that directory is an input that caused the program to crash — start triage there.
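A first triage pass can be as simple as replaying every saved input and grouping the files by how the target exits. The sketch below assumes the target reads its input from stdin (as in the afl-fuzz command above, which has no @@ placeholder); the helper name `triage` and its parameters are hypothetical, and it is only a starting point, not a replacement for a debugger or for AFL++'s own tooling.

```python
import subprocess
from collections import defaultdict
from pathlib import Path


def triage(target_cmd, crash_dir, timeout=5.0):
    """Replay each saved input on stdin and group file names by exit status.

    On POSIX, a negative return code -N means the process was killed by
    signal N (for example -11 for SIGSEGV).
    """
    groups = defaultdict(list)
    for path in sorted(Path(crash_dir).iterdir()):
        # Skip subdirectories and the README.txt that AFL++ typically leaves here.
        if not path.is_file() or path.name == "README.txt":
            continue
        try:
            proc = subprocess.run(
                target_cmd,
                input=path.read_bytes(),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            status = proc.returncode
        except subprocess.TimeoutExpired:
            status = "timeout"
        groups[status].append(path.name)
    return dict(groups)
```

A call such as `triage(["./target"], "out/default/crashes")` returns a dictionary mapping each exit status to the inputs that produced it, which makes it easy to pick one representative per group for manual analysis.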
Combine with AddressSanitizer (ASan) to catch more memory errors:
$ AFL_USE_ASAN=1 afl-cc -o target vulnerable.c
$ afl-fuzz -i inputs -o out ./target
ASan catches out-of-bounds accesses that don’t crash immediately, turning subtle memory corruption into hard crashes.
libFuzzer
libFuzzer (part of LLVM/Clang) fuzzes at the function level in-process, avoiding the overhead of re-launching a process for each input. It is used to fuzz Chrome, OpenSSL, and many other open-source projects through Google's OSS-Fuzz service.
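// fuzz_target.c
#include <stdint.h>
#include <stddef.h>
// Function under test, defined in target_lib.c. The signature is assumed here
// so that this file compiles on its own.
void parse_input(const uint8_t *data, size_t size);
// libFuzzer calls this entry point once per generated input.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Call the function you want to fuzz with `data` as input.
    parse_input(data, size);
    return 0;
}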
$ clang -fsanitize=fuzzer,address fuzz_target.c target_lib.c -o fuzz_target
$ ./fuzz_target corpus/
Mutation Strategies
AFL++ mutates inputs using strategies including:
- Bit/byte flips: flip individual bits or bytes
- Arithmetic: increment/decrement integer values at various offsets
- Known interesting values: insert values known to cause bugs, such as 0, -1, INT_MAX, 0x80000000
- Splice: combine two inputs from the corpus
- Dictionary: insert protocol keywords or magic values provided in a .dict file
Provide a dictionary for format-aware fuzzing (e.g., HTTP keywords for an HTTP parser).
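The strategies listed above can be sketched as small pure functions on byte strings. These are simplified, hypothetical helpers written for illustration; the function names, the 32-bit little-endian encoding, and the bit ordering are assumptions, not AFL++'s actual implementation.

```python
# 32-bit patterns for 0, -1, INT_MAX, and INT_MIN (0x80000000).
INTERESTING_32 = [0x00000000, 0xFFFFFFFF, 0x7FFFFFFF, 0x80000000]


def bit_flip(data: bytes, bit_index: int) -> bytes:
    """Flip a single bit, counting from the most significant bit of byte 0."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(buf)


def arith(data: bytes, offset: int, delta: int) -> bytes:
    """Add delta to one byte, wrapping around at 8 bits."""
    buf = bytearray(data)
    buf[offset] = (buf[offset] + delta) & 0xFF
    return bytes(buf)


def set_interesting(data: bytes, offset: int, value: int) -> bytes:
    """Overwrite 4 bytes at offset with a little-endian 32-bit value."""
    if offset + 4 > len(data):
        raise ValueError("not enough room for a 32-bit value")
    buf = bytearray(data)
    buf[offset:offset + 4] = value.to_bytes(4, "little")
    return bytes(buf)


def splice(a: bytes, b: bytes, cut_a: int, cut_b: int) -> bytes:
    """Join the head of one corpus entry to the tail of another."""
    return a[:cut_a] + b[cut_b:]


def insert_token(data: bytes, offset: int, token: bytes) -> bytes:
    """Insert a dictionary token (keyword or magic value) at offset."""
    return data[:offset] + token + data[offset:]
```

A real fuzzer picks the operator, offset, and value at random (or walks them deterministically) and stacks several mutations per execution; the helpers here only show what each individual step does to the bytes.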
Symbolic Execution
Symbolic execution treats program inputs as symbols rather than concrete values. Instead of running the program with one specific input, it runs it with a symbolic variable that can represent any possible input, accumulating path constraints — conditions on the input that must be true for execution to follow each branch.
At a branch if (x > 10), both paths are explored: one with constraint x > 10, one with x <= 10. The constraint solver (SMT solver, typically Z3) can then produce a concrete value for x that satisfies the constraint and reaches any desired location.
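To make path constraints concrete, here is a toy sketch under loud assumptions: the function `check` is a hypothetical program, its path constraints are written out by hand (a real engine derives them automatically from the code), and `solve` is a brute-force search over 8-bit values standing in for an SMT solver such as Z3.

```python
def check(x: int) -> str:
    """Hypothetical program under analysis."""
    if x > 10:
        if x * 3 == 45:
            return "target"
        return "big"
    return "small"


# One list of constraints per path through check(), written by hand.
PATHS = {
    "target": [lambda x: x > 10, lambda x: x * 3 == 45],
    "big": [lambda x: x > 10, lambda x: x * 3 != 45],
    "small": [lambda x: x <= 10],
}


def solve(constraints, domain=range(256)):
    """Brute-force stand-in for an SMT solver: first value satisfying all constraints."""
    for x in domain:
        if all(constraint(x) for constraint in constraints):
            return x
    return None  # unsatisfiable within the domain


witness = solve(PATHS["target"])  # 15, the only 8-bit value with x > 10 and 3x == 45
```

Feeding each solved value back into `check` reproduces the corresponding path, which is exactly the property a symbolic executor relies on: a satisfying assignment for the path constraints is a concrete input that drives execution down that path. A contradictory set such as `x > 10` together with `x < 5` has no solution, meaning that path is infeasible.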
angr
angr is a Python framework for binary analysis including symbolic execution. It can analyze stripped binaries without source code.
Basic workflow — find input that reaches a target address:
import angr
import claripy

proj = angr.Project('./target', auto_load_libs=False)

# Create symbolic stdin input (20 bytes)
flag = claripy.BVS('flag', 20 * 8)
state = proj.factory.entry_state(stdin=angr.SimFile(name='stdin', content=flag))
simgr = proj.factory.simulation_manager(state)

# Explore: find the "win" address, avoid the "fail" address
simgr.explore(find=0x4012ab, avoid=0x4012c0)
if simgr.found:
    solution_state = simgr.found[0]
    print(solution_state.posix.dumps(0))  # print stdin that reaches find
Injecting symbols into registers:
state = proj.factory.blank_state(addr=0x401000)
state.regs.rdi = claripy.BVS('arg', 64)
Injecting symbols into heap memory (for heap-allocated inputs):
sym_buf = claripy.BVS('buf', 64 * 8)
buf_addr = 0x44444444
state.memory.store(buf_addr, sym_buf)
# force malloc to return our controlled address by hooking it
Bitvectors
angr represents symbolic values as bitvectors — sequences of bits that can be concrete (one value) or symbolic (a range of values subject to constraints).
| Expression | Description |
|---|---|
| claripy.BVV(0x41, 8) | Concrete 8-bit value 0x41 |
| claripy.BVS('x', 32) | Symbolic 32-bit variable named x |
| state.solver.eval(sym) | Concretize: solve for one satisfying value |
| state.solver.add(sym > 10) | Manually add a constraint |
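Bitvectors behave like fixed-width machine words rather than mathematical integers, which matters when writing constraints. The sketch below emulates that behavior with plain Python integers and a mask; `wrap` and `to_signed` are hypothetical helper names and are not part of claripy's API.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce an integer to its unsigned fixed-width representation."""
    return value & ((1 << bits) - 1)


def to_signed(value: int, bits: int) -> int:
    """Interpret a fixed-width bit pattern as a two's-complement signed value."""
    value = wrap(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


overflowed = wrap(0xFF + 1, 8)              # 8-bit addition wraps around to 0
minus_five = wrap(-5, 64)                   # 0xfffffffffffffffb as an unsigned pattern
int_min = to_signed(0x80000000, 32)         # -2147483648
```

The same bit pattern compares very differently depending on interpretation: `minus_five` is larger than 100 when read as unsigned but smaller when read as signed. angr's documentation notes that claripy comparisons such as `>` are unsigned by default, with explicit signed variants (for example `SGT`) available when signed semantics are intended.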
State Explosion
State explosion is the main limitation of symbolic execution. Every branch that depends on symbolic data can fork the current state in two, so the number of states can grow exponentially with the number of such branches. Programs with loops, deeply nested conditions, or complex data structures can produce millions of states, making analysis infeasible. Mitigations:
- Bounded exploration: limit the number of steps or states (for example, simgr.run(n=100))
- Veritesting (angr option): merge states at join points to reduce explosion
- Concolic execution: run concretely, then use symbolic execution only on interesting slices
- Manual guidance: tell angr to avoid uninteresting code paths
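The growth, and the effect of two of the mitigations above, can be shown with a small counting model. This is a hypothetical sketch, not angr code: each state is just a tuple of branch decisions, `merge` crudely collapses all states at every join point, and `max_states` models a simple cap on live states.

```python
def explore(num_branches, merge=False, max_states=None):
    """Return the number of live states after each two-way symbolic branch."""
    states = [()]  # one initial state with no decisions recorded
    live_counts = []
    for i in range(num_branches):
        # Every live state forks into a taken and a not-taken successor.
        forked = [s + (taken,) for s in states for taken in (True, False)]
        if merge:
            # Merging at the join point: one state now represents both outcomes.
            forked = [("merged",) * (i + 1)]
        if max_states is not None:
            # Bounded exploration: silently drop states beyond the cap.
            forked = forked[:max_states]
        states = forked
        live_counts.append(len(states))
    return live_counts


unmerged = explore(10)              # doubles at each branch, ending at 1024
merged = explore(10, merge=True)    # stays at one state throughout
```

Neither mitigation is free. A cap discards paths that might have contained the bug, and a merged state carries more complicated constraints (both outcomes of every merged branch), which shifts the cost from state count to solver time.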
KLEE
KLEE is an academic symbolic execution engine built on LLVM. It requires source code (compiled to LLVM bitcode). It is less practical for malware analysis than angr but widely used in research and for vulnerability discovery in open-source software.
$ clang -emit-llvm -c -g target.c -o target.bc
$ klee target.bc
KLEE generates test cases (klee-out-*/), one per distinct path, including inputs that trigger errors (null dereferences, assertion failures, etc.).
Fuzzing + Symbolic Execution Together
The most effective approach combines both: use fuzzing to quickly explore easily-reachable code, then switch to symbolic execution for deep branches that fuzzing struggles with (complex magic value checks, multi-field protocol parsing). Tools like Driller (used in DARPA’s Cyber Grand Challenge) implement this automatically — angr takes over when AFL gets stuck.
Tools Summary
| Tool | Type | Use case |
|---|---|---|
| AFL++ | Coverage-guided fuzzer | General binary fuzzing |
| libFuzzer | In-process fuzzer | Function-level fuzzing with clang |
| honggfuzz | Coverage-guided fuzzer | Efficient multi-process fuzzing |
| angr | Symbolic execution | Binary analysis without source |
| KLEE | Symbolic execution | Source-level exhaustive testing |
| Manticore | Symbolic execution | EVM smart contracts + native binaries |
| Driller | Fuzzing + symbolic | Hybrid, used in CGC |
Useful Resources
- AFL++ documentation
- angr documentation
- angr CTF examples
- KLEE tutorial
- Fuzzing Book — open textbook on fuzzing techniques
- Trail of Bits blog — practical symbolic execution articles