Fuzzing and Symbolic Execution

Fuzzing and symbolic execution are automated techniques for finding bugs in software. In the malware analysis context, they are useful for:

Finding vulnerabilities that malware exploits
Identifying backdoor trigger conditions (inputs that cause unusual behavior)
Determining binary equivalence across metamorphic malware variants
Automatically testing unpacked samples for crash-worthy inputs

Common C Vulnerabilities

Before fuzzing, you need to know what you’re looking for. These are the most common classes of memory corruption bugs.

Buffer / Stack Overflow

Writing beyond the end of a fixed-size buffer. If the buffer is on the stack, the saved return address can be overwritten.

char password[9];
scanf("%s", password);   // no length limit — input longer than 8 chars overflows

Array Out-of-Bounds

Using an index that falls outside the allocated range. Off-by-one errors are the classic source.

int a[10];
a[10] = 0;   // writes one past the end of the array

Failure to Null-Terminate

String functions such as strcat and sprintf assume their string arguments are null-terminated. If input fills a buffer exactly, no room is left for the terminator, and any later string operation on that buffer runs past its end.

char buf[8];
read(STDIN_FILENO, buf, 8);   // fills buf with no null terminator
strcat(dest, buf);             // reads past end of buf

Format String

Passing user input directly as a printf format string allows reading (and writing) arbitrary memory.

printf(argv[1]);           // BAD
printf("%s", argv[1]);     // GOOD

%x dumps stack words. %n writes the number of bytes printed so far to a pointer argument — classic write primitive.

Use-After-Free

Using a pointer after the memory it points to has been freed. Common in browsers and complex data structure code.

char *p = malloc(16);
free(p);
printf("%s\n", p);    // undefined behavior — p may point to recycled memory

Fuzzing

Fuzzing (fuzz testing) generates large numbers of inputs and watches for crashes, hangs, or unexpected behavior.

Black-Box vs. Coverage-Guided (Grey-Box)

Black-box: treats the program as opaque and generates random or mutation-based inputs with no feedback from the target. Simple but inefficient; unlikely to reach deeply nested code paths.
Coverage-guided (usually called grey-box; "white-box fuzzing" more often refers to input generation driven by symbolic execution): instruments the target to track which code paths each input hits, then uses that feedback to evolve inputs that reach new code. Far more efficient.
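To make the difference concrete, here is a minimal sketch in Python. The `target` function is a hypothetical toy parser that crashes only on the input `FUZZ`, and the two loops are simplified stand-ins for real fuzzers, not how any particular tool is implemented.

```python
import random

def target(data: bytes, hit: set) -> None:
    """Toy parser (hypothetical): records branch IDs in `hit`, crashes on b"FUZZ"."""
    if data[:1] == b"F":
        hit.add(1)
        if data[1:2] == b"U":
            hit.add(2)
            if data[2:3] == b"Z":
                hit.add(3)
                if data[3:4] == b"Z":
                    raise RuntimeError("crash: deepest branch reached")

def black_box(rng: random.Random, max_execs: int):
    """Throw random 4-byte inputs at the target; no feedback is used."""
    for n in range(1, max_execs + 1):
        data = bytes(rng.randrange(256) for _ in range(4))
        try:
            target(data, set())
        except RuntimeError:
            return n          # number of executions needed to crash
    return None

def coverage_guided(rng: random.Random, max_execs: int):
    """Mutate corpus entries; keep any mutant that reaches a new branch."""
    corpus = [b"AAAA"]        # seed input
    seen = set()              # branch IDs observed so far
    for n in range(1, max_execs + 1):
        child = bytearray(rng.choice(corpus))
        child[rng.randrange(len(child))] = rng.randrange(256)
        hit = set()
        try:
            target(bytes(child), hit)
        except RuntimeError:
            return n
        if not hit <= seen:   # new coverage: the mutant becomes a new seed
            seen |= hit
            corpus.append(bytes(child))
    return None

if __name__ == "__main__":
    print("black-box:", black_box(random.Random(0), 20_000))
    print("coverage-guided:", coverage_guided(random.Random(0), 200_000))
```

The black-box loop has to guess all four bytes at once (a 1 in 2^32 chance per execution), while the coverage-guided loop can lock in one correct byte at a time because each partial match shows up as new coverage.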

American Fuzzy Lop (AFL++)

AFL++ is a widely used coverage-guided fuzzer. It instruments the target at compile time to record which branches (edges between basic blocks) each input exercises, then uses a genetic algorithm to mutate inputs, keeping those that reach new edges.
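The coverage map can be sketched as follows. This mirrors the edge-hashing scheme described in the original AFL technical notes (a hit counter indexed by the current block ID XORed with the shifted previous block ID). AFL++'s default instrumentation modes differ in detail, and the block IDs below are made-up values.

```python
MAP_SIZE = 1 << 16  # 64 KiB map, the classic AFL default

class EdgeCoverage:
    """Simplified model of the shared-memory coverage map."""

    def __init__(self) -> None:
        self.bitmap = bytearray(MAP_SIZE)
        self.prev_loc = 0

    def visit(self, block_id: int) -> None:
        """Conceptually injected at the start of every basic block."""
        cur_loc = block_id % MAP_SIZE
        idx = cur_loc ^ self.prev_loc
        self.bitmap[idx] = (self.bitmap[idx] + 1) & 0xFF  # 8-bit hit counter
        # Shifting keeps A->B distinct from B->A and A->A distinct from B->B.
        self.prev_loc = cur_loc >> 1

    def edges(self) -> set:
        return {i for i, count in enumerate(self.bitmap) if count}

def trace_edges(block_ids) -> set:
    """Run one simulated execution and return the set of map slots it touched."""
    cov = EdgeCoverage()
    for block_id in block_ids:
        cov.visit(block_id)
    return cov.edges()

# Hypothetical compile-time IDs for three basic blocks.
A, B, C = 0x1A2B, 0x3C4D, 0x5E6F

if __name__ == "__main__":
    print(sorted(hex(e) for e in trace_edges([A, B, C])))
    print(sorted(hex(e) for e in trace_edges([A, C, B])))
```

Both traces visit the same three blocks, but in a different order, so they touch different map slots. That is what lets the fuzzer treat a new ordering of known blocks as new coverage.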

Compile the target with AFL instrumentation:

$ CC=afl-cc CXX=afl-c++ ./configure
$ make
# or for a single file:
$ afl-cc -o target vulnerable.c

Create a seed corpus — a directory of small, valid inputs:

$ mkdir inputs
$ echo "test" > inputs/seed1

Run the fuzzer:

$ afl-fuzz -i inputs -o out ./target

AFL++ displays a live dashboard showing:

cycles done: how many times it has processed the entire queue
total paths: number of distinct code paths discovered (shown as corpus count in AFL++ 4.x)
uniq crashes: number of unique crash-inducing inputs found (saved crashes in AFL++ 4.x)
uniq hangs: inputs that cause the program to hang (saved hangs in AFL++ 4.x)

Crashes are saved in out/default/crashes/. Each file in that directory is an input that caused the program to crash — start triage there.
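A first triage pass can be as simple as replaying every saved input and grouping the files by how the target exits. The script below is a sketch under two assumptions: the target reads its input from stdin, and the helper name `triage` is made up for this example.

```python
import subprocess
import sys
from collections import defaultdict
from pathlib import Path

def triage(target_cmd, crash_dir, timeout=5.0):
    """Replay each file in crash_dir on the target's stdin; group by exit status.

    On POSIX a negative return code means the process was killed by that
    signal number (for example -11 for SIGSEGV).
    """
    groups = defaultdict(list)
    for path in sorted(Path(crash_dir).iterdir()):
        # Skip subdirectories and the README that AFL places next to the crashes.
        if not path.is_file() or path.name == "README.txt":
            continue
        try:
            proc = subprocess.run(
                target_cmd,
                input=path.read_bytes(),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            status = proc.returncode
        except subprocess.TimeoutExpired:
            status = "timeout"
        groups[status].append(path.name)
    return dict(groups)

if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python triage.py out/default/crashes ./target [target args...]
    for status, names in triage(sys.argv[2:], sys.argv[1]).items():
        print(status, len(names), names[:3])
```

Grouping by exit status is only a coarse first cut; inputs in the same group can still have different root causes, so a debugger or sanitizer report is the next step.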

Combine with AddressSanitizer (ASan) to catch memory errors that would otherwise go unnoticed:

$ AFL_USE_ASAN=1 afl-cc -o target vulnerable.c
$ afl-fuzz -i inputs -o out ./target

ASan catches out-of-bounds accesses that don’t crash immediately, turning subtle memory corruption into hard crashes.

libFuzzer

libFuzzer (part of LLVM/Clang) fuzzes at the function level in-process, avoiding the overhead of launching a new process for each input. It is one of the engines behind OSS-Fuzz, which continuously fuzzes OpenSSL and many other open-source projects, and it is also used to fuzz Chrome.

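// fuzz_target.c
#include <stdint.h>
#include <stddef.h>

// Function under test, defined in target_lib.c (name and signature are illustrative).
void parse_input(const uint8_t *data, size_t size);

// libFuzzer calls this entry point once per generated input.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_input(data, size);   // feed the fuzzer's bytes to the code under test
    return 0;                  // 0 means the input was processed normally
}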

$ clang -fsanitize=fuzzer,address fuzz_target.c target_lib.c -o fuzz_target
$ ./fuzz_target corpus/

Mutation Strategies

AFL++ mutates inputs using strategies including:

Bit/byte flips — flip individual bits or bytes
Arithmetic — increment/decrement integer values at various offsets
Known interesting values — insert values known to cause bugs: 0, -1, INT_MAX, 0x80000000
Splice — combine two inputs from the corpus
Dictionary — insert protocol keywords or magic values provided in a .dict file

Provide a dictionary for format-aware fuzzing (e.g., HTTP keywords for an HTTP parser).
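The strategies listed above can be sketched as small byte-level operators. This is an illustrative approximation, not AFL++'s actual implementation: the function names, the delta range, and the 32-bit little-endian encoding of the interesting values are all assumptions made for the example.

```python
import random

# 0, -1, INT_MAX, INT_MIN as unsigned 32-bit patterns
INTERESTING_32 = [0x00000000, 0xFFFFFFFF, 0x7FFFFFFF, 0x80000000]

def bit_flip(data: bytes, rng: random.Random) -> bytes:
    """Flip one randomly chosen bit."""
    buf = bytearray(data)
    bit = rng.randrange(len(buf) * 8)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

def arithmetic(data: bytes, rng: random.Random) -> bytes:
    """Add or subtract a small delta at a random byte offset (wraps at 8 bits)."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    delta = rng.choice([-1, 1]) * rng.randint(1, 16)
    buf[pos] = (buf[pos] + delta) & 0xFF
    return bytes(buf)

def interesting_value(data: bytes, rng: random.Random) -> bytes:
    """Overwrite 4 bytes with a boundary value that often triggers bugs."""
    if len(data) < 4:
        return data
    buf = bytearray(data)
    pos = rng.randrange(len(buf) - 3)
    buf[pos:pos + 4] = rng.choice(INTERESTING_32).to_bytes(4, "little")
    return bytes(buf)

def splice(a: bytes, b: bytes, rng: random.Random) -> bytes:
    """Join the head of one corpus entry to the tail of another."""
    return a[:rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]

def dictionary_insert(data: bytes, tokens, rng: random.Random) -> bytes:
    """Insert a keyword from a user-supplied dictionary at a random offset."""
    pos = rng.randrange(len(data) + 1)
    return data[:pos] + rng.choice(tokens) + data[pos:]

if __name__ == "__main__":
    rng = random.Random(0)
    seed = b"GET /index.html HTTP/1.1\r\n"
    print(bit_flip(seed, rng))
    print(dictionary_insert(seed, [b"Host: ", b"POST", b"Content-Length: "], rng))
```

The dictionary operator is the one that helps most with structured formats: a random bit flip almost never produces `Content-Length: `, but a token insert produces it in a single step.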

Symbolic Execution

Symbolic execution treats program inputs as symbols rather than concrete values. Instead of running the program with one specific input, it runs it with a symbolic variable that can represent any possible input, accumulating path constraints — conditions on the input that must be true for execution to follow each branch.

At a branch if (x > 10), both paths are explored: one with constraint x > 10, one with x <= 10. The constraint solver (SMT solver, typically Z3) can then produce a concrete value for x that satisfies the constraint and reaches any desired location.
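The idea can be sketched without a real engine. Below, a tiny hypothetical program is written as a tree of branches, the explorer collects one path constraint per branch, and a brute-force search over all 8-bit values stands in for the SMT solver. A real tool would hand the constraints to Z3 instead of enumerating.

```python
# A toy program as a decision tree:
#   if x > 10:
#       if x % 7 == 3: "win" else: "big"
#   else: "small"
PROGRAM = (
    "branch", "x > 10", lambda x: x > 10,
    ("branch", "x % 7 == 3", lambda x: x % 7 == 3,
        ("leaf", "win"),
        ("leaf", "big")),
    ("leaf", "small"),
)

def solve(constraints, bits=8):
    """Stand-in for an SMT solver: return the first value satisfying all constraints."""
    for x in range(1 << bits):
        if all(check(x) for _, check in constraints):
            return x
    return None  # unsatisfiable: the path is infeasible

def explore(node, path=()):
    """Walk both sides of every branch, accumulating path constraints."""
    if node[0] == "leaf":
        yield node[1], [desc for desc, _ in path], solve(path)
        return
    _, desc, cond, on_true, on_false = node
    # True side: the branch condition itself is added to the path constraints.
    yield from explore(on_true, path + ((desc, cond),))
    # False side: the negated condition is added instead.
    negated = (f"not ({desc})", lambda x, c=cond: not c(x))
    yield from explore(on_false, path + (negated,))

if __name__ == "__main__":
    for label, constraints, witness in explore(PROGRAM):
        print(f"{label:5} needs {' and '.join(constraints)}; example x = {witness}")
```

Each leaf comes back with the conjunction of conditions needed to reach it and one concrete input that satisfies them. For the `win` leaf the search returns 17, the smallest value that is greater than 10 and leaves remainder 3 when divided by 7.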

angr

angr is a Python framework for binary analysis including symbolic execution. It can analyze stripped binaries without source code.

Basic workflow — find input that reaches a target address:

import angr
import claripy

proj = angr.Project('./target', auto_load_libs=False)

# Create symbolic stdin input (20 bytes)
flag = claripy.BVS('flag', 20 * 8)

state = proj.factory.entry_state(stdin=angr.SimFile(name='stdin', content=flag))

simgr = proj.factory.simulation_manager(state)

# Explore: find the "win" address, avoid the "fail" address
simgr.explore(find=0x4012ab, avoid=0x4012c0)

if simgr.found:
    solution_state = simgr.found[0]
    print(solution_state.posix.dumps(0))   # print stdin that reaches find

Injecting symbols into registers:

state = proj.factory.blank_state(addr=0x401000)
state.regs.rdi = claripy.BVS('arg', 64)

Injecting symbols into heap memory (for heap-allocated inputs):

sym_buf = claripy.BVS('buf', 64 * 8)
buf_addr = 0x44444444
state.memory.store(buf_addr, sym_buf)
# force malloc to return our controlled address by hooking it

Bitvectors

angr represents symbolic values as bitvectors: fixed-width sequences of bits that are either concrete (a single value) or symbolic (any value consistent with the constraints collected so far).

Expression	Description
`claripy.BVV(0x41, 8)`	Concrete 8-bit value `0x41`
`claripy.BVS('x', 32)`	Symbolic 32-bit variable named `x`
`state.solver.eval(sym)`	Concretize: solve for one satisfying value
`state.solver.add(sym > 10)`	Manually add a constraint
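Bitvectors behave like machine integers rather than Python integers: arithmetic wraps at the declared width, and the same bit pattern reads differently as signed or unsigned. The helpers below are a plain-Python sketch of those semantics with made-up names; they are not claripy's API. In angr itself, ordinary comparison operators on bitvectors are unsigned by default, with separate signed variants available.

```python
def bvv(value: int, bits: int) -> int:
    """Truncate a Python int to a `bits`-wide bit pattern."""
    return value & ((1 << bits) - 1)

def bv_add(a: int, b: int, bits: int) -> int:
    """Addition that wraps around at the bitvector width."""
    return bvv(a + b, bits)

def to_signed(value: int, bits: int) -> int:
    """Interpret a bit pattern as a two's-complement signed integer."""
    sign_bit = 1 << (bits - 1)
    return (bvv(value, bits) ^ sign_bit) - sign_bit

def unsigned_less_than(a: int, b: int, bits: int) -> bool:
    return bvv(a, bits) < bvv(b, bits)

def signed_less_than(a: int, b: int, bits: int) -> bool:
    return to_signed(a, bits) < to_signed(b, bits)

if __name__ == "__main__":
    print(hex(bv_add(0xFF, 1, 8)))             # wraps at 8 bits
    print(to_signed(0xFF, 8))                  # same bits, signed view
    print(unsigned_less_than(0xFF, 0x01, 8))   # 255 < 1 ?
    print(signed_less_than(0xFF, 0x01, 8))     # -1 < 1 ?
```

This distinction matters when adding constraints by hand: a bound that was meant to be signed can silently admit or exclude values with the high bit set if it is written as an unsigned comparison.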

State Explosion

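State explosion (also called path explosion) is the main limitation of symbolic execution. Each branch on a symbolic value can fork the current state in two, so the number of states can grow exponentially with the number of branches. Programs with loops, deeply nested conditions, or complex data structures can produce millions of states, making analysis infeasible. Mitigations: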

Bounded exploration: limit the number of steps or states (for example, simgr.run(n=100))
Veritesting (angr option): merge states at join points to reduce explosion
Concolic execution: run concretely, then use symbolic execution only on interesting slices
Manual guidance: tell angr to avoid uninteresting code paths
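The growth rate, and the effect of a state cap, can be seen in a toy model. The sketch below assumes the worst case, a program made of independent symbolic branches where both sides are always feasible. The `explore` function is hypothetical and only counts states; it does not model angr's exploration strategies.

```python
from collections import deque

def explore(num_branches: int, max_states=None):
    """Breadth-first exploration of `num_branches` sequential two-way branches.

    Returns (finished_paths, dropped_states). When `max_states` is set, any
    fork that would push the active queue past the cap is discarded.
    """
    active = deque([()])          # each state is the tuple of decisions so far
    finished = dropped = 0
    while active:
        path = active.popleft()
        if len(path) == num_branches:
            finished += 1         # reached the end of the program
            continue
        for taken in (True, False):   # fork: one successor per branch side
            if max_states is not None and len(active) >= max_states:
                dropped += 1
            else:
                active.append(path + (taken,))
    return finished, dropped

if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, "branches ->", explore(n)[0], "paths")
    print("capped at 8 states:", explore(16, max_states=8))
```

Without a cap the number of finished paths is 2 to the power of the branch count. With a cap the run stays cheap, but every dropped state takes its whole subtree with it, which is the trade-off behind bounded exploration.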

KLEE

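KLEE is a symbolic execution engine from the research community, built on LLVM. It operates on LLVM bitcode, so in practice it requires source code, and the inputs to explore must be marked symbolic (for example with klee_make_symbolic() in a test harness). It is less practical for malware analysis than angr but widely used in research and for vulnerability discovery in open-source software.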

$ clang -emit-llvm -c -g target.c -o target.bc
$ klee target.bc

KLEE generates test cases (klee-out-*/), one per distinct path, including inputs that trigger errors (null dereferences, assertion failures, etc.).

Fuzzing + Symbolic Execution Together

The most effective approach combines both: use fuzzing to quickly explore easily-reachable code, then switch to symbolic execution for deep branches that fuzzing struggles with (complex magic value checks, multi-field protocol parsing). Tools like Driller (used in DARPA’s Cyber Grand Challenge) implement this automatically — angr takes over when AFL gets stuck.
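The hand-off can be illustrated with a toy loop. Everything here is hypothetical: the target, the 16-bit magic value, and the function names. The "symbolic" step is a brute-force search over the two magic bytes standing in for a real engine, which would instead collect the branch constraint and pass it to an SMT solver. This is not Driller's implementation.

```python
import random
import struct

MAGIC = 0xBEEF  # hypothetical magic value guarding the interesting code

def target(data: bytes, hit: set) -> None:
    """Toy target: a 16-bit magic check, then a shallow bug behind it."""
    if len(data) >= 2 and struct.unpack("<H", data[:2])[0] == MAGIC:
        hit.add("magic_ok")
        if data[2:3] == b"!":
            raise RuntimeError("crash")

def fuzz(corpus, seen, rng, budget):
    """Coverage-guided single-byte mutation; returns a crashing input or None."""
    for _ in range(budget):
        buf = bytearray(rng.choice(corpus))
        buf[rng.randrange(len(buf))] = rng.randrange(256)
        hit = set()
        try:
            target(bytes(buf), hit)
        except RuntimeError:
            return bytes(buf)
        if not hit <= seen:           # new coverage: keep the mutant
            seen |= hit
            corpus.append(bytes(buf))
    return None

def solve_magic_check(stuck_input: bytes):
    """Stand-in for the symbolic step: find a prefix that passes the magic check."""
    for value in range(1 << 16):
        candidate = struct.pack("<H", value) + stuck_input[2:]
        hit = set()
        try:
            target(candidate, hit)
        except RuntimeError:
            return candidate
        if "magic_ok" in hit:
            return candidate
    return None

def hybrid(seed: bytes, rng: random.Random, budget: int = 20_000):
    corpus, seen = [seed], set()
    crash = fuzz(corpus, seen, rng, budget)          # phase 1: fuzz until stuck
    if crash is None:
        unlocked = solve_magic_check(corpus[-1])     # phase 2: solve the blocker
        if unlocked is not None:
            corpus.append(unlocked)
            crash = fuzz(corpus, seen, rng, budget)  # phase 3: resume fuzzing
    return crash

if __name__ == "__main__":
    print(hybrid(b"\x00\x00\x00", random.Random(1)))
```

Starting from an all-zero seed, single-byte mutation can never set both magic bytes at once and gets no coverage credit for a partial match, so the fuzzer alone stalls. Once the solver supplies an input that passes the check, the shallow bug behind it is easy for the fuzzer to hit.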

Tools Summary

Tool	Type	Use case
AFL++	Coverage-guided fuzzer	General binary fuzzing
libFuzzer	In-process fuzzer	Function-level fuzzing with clang
honggfuzz	Coverage-guided fuzzer	Efficient multi-process fuzzing
angr	Symbolic execution	Binary analysis without source
KLEE	Symbolic execution	Source-level exhaustive testing
Manticore	Symbolic execution	EVM smart contracts + native binaries
Driller	Fuzzing + symbolic	Hybrid, used in CGC

Useful Resources

AFL++ documentation
angr documentation
angr CTF examples
KLEE tutorial
Fuzzing Book — open textbook on fuzzing techniques
Trail of Bits blog — practical symbolic execution articles