
Fuzzing and Symbolic Execution

Fuzzing and symbolic execution are automated techniques for finding bugs in software. In the malware analysis context, they are useful for:

Common C Vulnerabilities

Before fuzzing, you need to know what you’re looking for. These are the most common classes of memory corruption bugs.

Buffer / Stack Overflow

Writing beyond the end of a fixed-size buffer. If the buffer is on the stack, the saved return address can be overwritten.

char password[9];
scanf("%s", password);   // no length limit — input longer than 8 chars overflows

Array Out-of-Bounds

Using an index that falls outside the allocated range. Off-by-one errors are the classic source.

int a[10];
a[10] = 0;   // writes one past the end of the array

Failure to Null-Terminate

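strcat, sprintf with %s, and similar functions assume their string arguments are null-terminated. If a buffer is filled exactly to capacity (for example by read or strncpy), no terminator is stored, and any later string operation on it reads past the end.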

char dest[32] = "";
char buf[8];
read(STDIN_FILENO, buf, 8);   // fills buf with no null terminator
strcat(dest, buf);             // reads past end of buf

Format String

Passing user input directly as a printf format string allows reading (and writing) arbitrary memory.

printf(argv[1]);           // BAD
printf("%s", argv[1]);     // GOOD

%x dumps stack words. %n writes the number of bytes printed so far to a pointer argument — classic write primitive.

Use-After-Free

Using a pointer after the memory it points to has been freed. Common in browsers and complex data structure code.

char *p = malloc(16);
free(p);
printf("%s\n", p);    // undefined behavior — p may point to recycled memory

Fuzzing

Fuzzing (fuzz testing) generates large numbers of inputs and watches for crashes, hangs, or unexpected behavior.
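The generate-and-watch loop can be sketched in a few lines of Python. This is a minimal illustration, not how any production fuzzer is built: parse_record is a hypothetical toy target with a deliberate bug, and a Python exception stands in for a crash.

```python
import random

def parse_record(data: bytes) -> int:
    """Hypothetical toy target: 1-byte length prefix followed by a payload."""
    if len(data) < 2:
        return 0
    length = data[0]
    payload = data[1:]
    # Deliberate bug: trusts the length prefix instead of checking len(payload).
    return payload[length - 1] if length else 0

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Flip one random bit, or overwrite one random byte."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    if rng.random() < 0.5:
        buf[pos] ^= 1 << rng.randrange(8)
    else:
        buf[pos] = rng.randrange(256)
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int = 10_000, rng_seed: int = 0):
    """Return the mutated inputs that made `target` raise an exception."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)
    return crashes

crashes = fuzz(parse_record, b"\x03abc")
print(f"{len(crashes)} crashing inputs, e.g. {crashes[0]!r}")
```

Every crashing input here has a corrupted length prefix, which is the kind of pattern triage would look for in a real crash set.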

Black-Box vs. White-Box

American Fuzzy Lop (AFL++)

AFL++ is a widely used coverage-guided fuzzer. It instruments the target at compile time to record which branches each input exercises, then uses a genetic algorithm to mutate inputs, keeping those that reach previously unexplored branches.
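To make the feedback loop concrete, here is a heavily simplified sketch of coverage guidance. It is not AFL++'s actual algorithm: the target function is hypothetical and reports its own branch IDs (standing in for compile-time instrumentation), and the only mutation is a single byte overwrite.

```python
import random

def target(data: bytes) -> set:
    """Hypothetical instrumented target: returns the branch IDs it executed."""
    hit = {0}
    if len(data) > 0 and data[0] == ord("F"):
        hit.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            hit.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                hit.add(3)
                if len(data) > 3 and data[3] == ord("Z"):
                    raise RuntimeError("simulated crash")
    return hit

def coverage_guided_fuzz(seed: bytes, max_iters: int = 200_000, rng_seed: int = 0):
    """Return (crashing_input, executions), or (None, max_iters) if none found."""
    rng = random.Random(rng_seed)
    queue = [seed]                  # inputs worth mutating further
    seen = set(target(seed))        # branch IDs covered so far
    for i in range(max_iters):
        child = bytearray(rng.choice(queue))
        child[rng.randrange(len(child))] = rng.randrange(256)
        child = bytes(child)
        try:
            hit = target(child)
        except RuntimeError:
            return child, i + 1
        if not hit <= seen:         # new branch reached: keep this input
            seen |= hit
            queue.append(child)
    return None, max_iters

crash, executions = coverage_guided_fuzz(b"AAAA")
print(crash, "found after", executions, "executions")
```

Because inputs that reach a new branch are kept and mutated again, the four-byte check is solved one byte at a time instead of requiring all four bytes to be guessed at once.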

Compile the target with AFL instrumentation:

$ CC=afl-cc CXX=afl-c++ ./configure
$ make
# or for a single file:
$ afl-cc -o target vulnerable.c

Create a seed corpus — a directory of small, valid inputs:

$ mkdir inputs
$ echo "test" > inputs/seed1

Run the fuzzer:

$ afl-fuzz -i inputs -o out ./target

AFL++ displays a live dashboard showing:

Crashes are saved in out/default/crashes/. Each file in that directory is an input that caused the program to crash — start triage there.
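A first triage pass often just replays every saved input and groups the results. The sketch below assumes the target reads its input from stdin (as in the afl-fuzz command above); the triage helper is a hypothetical name, not part of AFL++.

```python
import subprocess
from collections import defaultdict
from pathlib import Path

def triage(target_cmd, crash_dir):
    """Replay each saved input on stdin and group file names by exit status.

    On POSIX, a negative return code -N means the process was killed by
    signal N (for example -11 for SIGSEGV).
    """
    groups = defaultdict(list)
    for path in sorted(Path(crash_dir).iterdir()):
        # Skip subdirectories and the README.txt that AFL++ writes there.
        if not path.is_file() or path.name == "README.txt":
            continue
        try:
            proc = subprocess.run(
                target_cmd,
                input=path.read_bytes(),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=5,
            )
            status = proc.returncode
        except subprocess.TimeoutExpired:
            status = "timeout"
        groups[status].append(path.name)
    return dict(groups)

if __name__ == "__main__" and Path("out/default/crashes").is_dir():
    for status, names in triage(["./target"], "out/default/crashes").items():
        print(status, len(names), "inputs")
```

Grouping by exit status is only a coarse first cut; inputs in the same group can still hit different bugs, so a debugger or ASan report is the next step.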

Combine with AddressSanitizer (ASan) to catch memory errors that would otherwise go unnoticed:

$ AFL_USE_ASAN=1 afl-cc -o target vulnerable.c
$ afl-fuzz -i inputs -o out ./target

ASan catches out-of-bounds accesses that don’t crash immediately, turning subtle memory corruption into hard crashes.

libFuzzer

libFuzzer (part of LLVM/Clang) fuzzes at the function level in-process, avoiding the overhead of re-launching a process for each input. It is one of the fuzzing engines supported by OSS-Fuzz, Google's continuous fuzzing service for open-source projects such as OpenSSL, and it is also used for Chromium's fuzz targets.

// fuzz_target.c
#include <stdint.h>
#include <stddef.h>

// Function under test, defined in target_lib.c (signature assumed here)
void parse_input(const uint8_t *data, size_t size);

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // call the function you want to fuzz with `data` as input
    parse_input(data, size);
    return 0;
}

$ clang -fsanitize=fuzzer,address fuzz_target.c target_lib.c -o fuzz_target
$ ./fuzz_target corpus/

Mutation Strategies

AFL++ mutates inputs using strategies including:

Provide a dictionary for format-aware fuzzing (e.g., HTTP keywords for an HTTP parser).
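A dictionary is a plain text file with one name="value" entry per line, where non-printable bytes are written as \xNN escapes. The sketch below generates such a file for a few HTTP tokens; the helper name, the token list, and the http.dict file name are illustrative choices, not anything AFL++ prescribes.

```python
def dict_entry(name: str, token: bytes) -> str:
    """Format one dictionary line, escaping quotes, backslashes and
    non-printable bytes as \\xNN."""
    escaped = "".join(
        chr(b) if 0x20 <= b < 0x7F and chr(b) not in '"\\' else f"\\x{b:02x}"
        for b in token
    )
    return f'{name}="{escaped}"'

# Illustrative HTTP tokens; a real dictionary would cover far more keywords.
TOKENS = {
    "method_get": b"GET ",
    "method_post": b"POST ",
    "version": b"HTTP/1.1",
    "crlf": b"\r\n",
    "header_host": b"Host: ",
}

http_dict = "\n".join(dict_entry(n, t) for n, t in TOKENS.items()) + "\n"

if __name__ == "__main__":
    with open("http.dict", "w") as f:
        f.write(http_dict)
```

The file is then passed to the fuzzer with the -x option, for example: afl-fuzz -i inputs -o out -x http.dict ./target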

Symbolic Execution

Symbolic execution treats program inputs as symbols rather than concrete values. Instead of running the program with one specific input, it runs it with a symbolic variable that can represent any possible input, accumulating path constraints — conditions on the input that must be true for execution to follow each branch.

At a branch if (x > 10), both paths are explored: one with constraint x > 10, one with x <= 10. The constraint solver (SMT solver, typically Z3) can then produce a concrete value for x that satisfies the constraint and reaches any desired location.
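The idea can be illustrated without any symbolic execution engine. In the sketch below the path constraints for a hypothetical toy function are written out by hand, and a brute-force search over a small domain stands in for the SMT solver; a real engine derives the constraints automatically and hands them to a solver such as Z3.

```python
def check(x: int) -> str:
    """Hypothetical toy program under analysis."""
    if x > 10:
        if x * 3 == 45:
            return "target"
        return "big"
    return "small"

# Each path is the conjunction of the branch conditions taken along it.
# These are written by hand here; a symbolic executor collects them itself.
PATHS = {
    "target": [lambda x: x > 10, lambda x: x * 3 == 45],
    "big":    [lambda x: x > 10, lambda x: x * 3 != 45],
    "small":  [lambda x: x <= 10],
}

def solve(constraints, domain=range(256)):
    """Stand-in for an SMT solver: return the first value in `domain`
    that satisfies every constraint, or None if there is none."""
    for candidate in domain:
        if all(c(candidate) for c in constraints):
            return candidate
    return None

inputs = {name: solve(cs) for name, cs in PATHS.items()}
for name, value in inputs.items():
    print(f"path {name!r}: x = {value} -> check(x) = {check(value)!r}")
```

Each solved value, when run concretely, drives check down exactly the path its constraints describe.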

angr

angr is a Python framework for binary analysis including symbolic execution. It can analyze stripped binaries without source code.

Basic workflow — find input that reaches a target address:

import angr
import claripy

proj = angr.Project('./target', auto_load_libs=False)

# Create symbolic stdin input (20 bytes)
flag = claripy.BVS('flag', 20 * 8)

state = proj.factory.entry_state(stdin=angr.SimFile(name='stdin', content=flag))

simgr = proj.factory.simulation_manager(state)

# Explore: find the "win" address, avoid the "fail" address
simgr.explore(find=0x4012ab, avoid=0x4012c0)

if simgr.found:
    solution_state = simgr.found[0]
    print(solution_state.posix.dumps(0))   # print stdin that reaches find

Injecting symbols into registers:

state = proj.factory.blank_state(addr=0x401000)
state.regs.rdi = claripy.BVS('arg', 64)

Injecting symbols into heap memory (for heap-allocated inputs):

sym_buf = claripy.BVS('buf', 64 * 8)
buf_addr = 0x44444444
state.memory.store(buf_addr, sym_buf)
# force malloc to return our controlled address by hooking it

Bitvectors

angr represents symbolic values as bitvectors: fixed-width sequences of bits that can be concrete (one value) or symbolic (any value consistent with the constraints collected so far).

Expression                    Description
claripy.BVV(0x41, 8)          Concrete 8-bit value 0x41
claripy.BVS('x', 32)          Symbolic 32-bit variable named x
state.solver.eval(sym)        Concretize: solve for one satisfying value
state.solver.add(sym > 10)    Manually add a constraint
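Because a bitvector has a fixed width, arithmetic wraps around and the same bits can be read as a signed or an unsigned number. The plain-Python sketch below models only that behavior; bv and to_signed are hypothetical helpers, not claripy functions.

```python
def bv(value: int, bits: int) -> int:
    """Truncate `value` to `bits` bits (the unsigned view of a bitvector)."""
    return value & ((1 << bits) - 1)

def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as two's complement."""
    value = bv(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value

wrapped = bv(0xFF + 1, 8)        # 8-bit addition overflows back to 0
as_unsigned = bv(0xFF, 8)        # 255
as_signed = to_signed(0xFF, 8)   # -1: same bits, signed reading

print(wrapped, as_unsigned, as_signed)
```

This matters when writing constraints: whether a value such as 0xFF counts as greater than 10 depends on the interpretation. As far as the angr documentation describes it, claripy's plain comparison operators are unsigned and signed forms such as SGT are separate, which is worth confirming for the version in use.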

State Explosion

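State explosion is the main limitation of symbolic execution. Every branch that depends on symbolic input can fork the current state in two, so in the worst case the number of states grows exponentially with the number of such branches. Programs with loops, deeply nested conditions, or complex data structures can produce millions of states, making analysis infeasible. Mitigations: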

KLEE

KLEE is an academic symbolic execution engine built on LLVM. It requires source code (compiled to LLVM bitcode). It is less practical for malware analysis than angr but widely used in research and for vulnerability discovery in open-source software.

$ clang -emit-llvm -c -g target.c -o target.bc
$ klee target.bc

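KLEE writes its results to a klee-out-N/ directory: one test case per distinct path explored, including inputs that trigger errors (null dereferences, assertion failures, etc.). For KLEE to explore more than one path, the program's inputs must be marked symbolic, for example with klee_make_symbolic.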

Fuzzing + Symbolic Execution Together

The most effective approach combines both: use fuzzing to quickly explore easily reachable code, then switch to symbolic execution for deep branches that fuzzing struggles with (complex magic value checks, multi-field protocol parsing). Tools like Driller (used in DARPA’s Cyber Grand Challenge) implement this automatically: when AFL stops finding new coverage, angr takes over to produce inputs that get past the blocking check.
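A magic value check shows why the hand-off helps. In this sketch the target, the constant, and both helper functions are hypothetical; solve_magic simply encodes the constant, standing in for what a constraint solver would return for the condition value == MAGIC.

```python
import random
import struct

MAGIC = 0xDEADBEEF  # hypothetical constant guarding the deep code path

def reaches_deep_code(data: bytes) -> bool:
    """Toy target: the interesting code sits behind a 32-bit comparison."""
    return len(data) >= 4 and struct.unpack("<I", data[:4])[0] == MAGIC

def random_attempts(tries: int, rng_seed: int = 0) -> int:
    """Count how many purely random 4-byte inputs pass the check."""
    rng = random.Random(rng_seed)
    hits = 0
    for _ in range(tries):
        candidate = bytes(rng.randrange(256) for _ in range(4))
        hits += reaches_deep_code(candidate)
    return hits

def solve_magic() -> bytes:
    """Stand-in for the symbolic step: produce bytes that satisfy
    `value == MAGIC` directly instead of guessing."""
    return struct.pack("<I", MAGIC)

# Chance that one random 4-byte guess passes the comparison.
per_try_probability = 1 / 2**32

solved = solve_magic()
print("solver input:", solved.hex(), "passes:", reaches_deep_code(solved))
print("random hits in 100000 tries:", random_attempts(100_000))
```

Each blind guess succeeds with probability 1 in 2^32, while the solved input passes on the first try and can be handed back to the fuzzer as a new seed.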

Tools Summary

Tool         Type                      Use case
AFL++        Coverage-guided fuzzer    General binary fuzzing
libFuzzer    In-process fuzzer         Function-level fuzzing with clang
honggfuzz    Coverage-guided fuzzer    Efficient multi-process fuzzing
angr         Symbolic execution        Binary analysis without source
KLEE         Symbolic execution        Source-level exhaustive testing
Manticore    Symbolic execution        EVM smart contracts + native binaries
Driller      Fuzzing + symbolic        Hybrid, used in CGC

Useful Resources