Reverse Engineering
Reverse engineering is the process of recovering the design and implementation details of a binary when source code is unavailable. In malware analysis, this means understanding what the code does by reading assembly or decompiled pseudocode.
x86-64 Assembly Basics
Most malware targets Windows (x86/x86-64) or Linux (x86-64/ARM). A basic reading knowledge of assembly is required.
Registers (x86-64)
| Register | Common use |
|---|---|
| rax | return value; accumulator |
| rbx | callee-saved general purpose |
| rcx / rdi | first argument (Windows / Linux) |
| rdx / rsi | second argument |
| rsp | stack pointer |
| rbp | frame pointer (when used) |
| rip | instruction pointer |
| eflags | condition codes (ZF, SF, CF, OF) |
Common Instructions
mov rax, rbx ; copy rbx into rax
lea rax, [rbp-0x10] ; load effective address (pointer arithmetic)
push rax ; push rax onto the stack
pop rax ; pop from stack into rax
call func ; push return address and jump to func
ret ; pop return address and jump to it
cmp rax, 0 ; set flags based on rax - 0
je label ; jump if equal (ZF=1)
jne label ; jump if not equal
xor rax, rax ; zero rax (common idiom)
test rax, rax ; set ZF if rax == 0
Calling Conventions
Windows x64: arguments in rcx, rdx, r8, r9; rest on stack; return in rax.
Linux x64 (System V AMD64 ABI): arguments in rdi, rsi, rdx, rcx, r8, r9; return in rax.
Knowing which convention is in use tells you what each register means at a call site.
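As a quick aid when annotating call sites, the two register orders above can be captured in a small lookup table. This is a minimal sketch that covers integer and pointer arguments only (floating-point and by-value struct arguments follow different rules); the names `ARG_REGS` and `arg_location` are hypothetical and chosen for illustration.

```python
# Integer/pointer argument registers, in order, for the two conventions above.
ARG_REGS = {
    "win64": ["rcx", "rdx", "r8", "r9"],
    "sysv": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
}


def arg_location(convention: str, index: int) -> str:
    """Return where the index-th (0-based) integer argument is passed."""
    if index < 0:
        raise ValueError("index must be non-negative")
    regs = ARG_REGS[convention]
    # Arguments beyond the register list are passed on the stack.
    return regs[index] if index < len(regs) else "stack"


if __name__ == "__main__":
    for conv in ARG_REGS:
        print(conv, [arg_location(conv, i) for i in range(7)])
```

For example, when a Windows sample loads a value into r8 just before a call, `arg_location("win64", 2)` confirms that it is the third argument.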
Recognizing Common Patterns
Experienced analysts recognize constructs that appear over and over regardless of the compiler.
String Loops
A loop that loads bytes one at a time from a base address plus an incrementing index, and stops at a zero byte, is likely processing a string:
xor ecx, ecx
loop_top:
movzx eax, byte [rbx+rcx]
test eax, eax
je done
; do something with eax
inc ecx
jmp loop_top
done:
XOR Decryption
Single-byte or multi-byte XOR is the most common simple obfuscation:
mov ecx, length
xor rdx, rdx
decrypt_loop:
movzx eax, byte [data+rdx]
xor eax, 0x42 ; key byte
mov byte [data+rdx], al
inc rdx
dec ecx
jnz decrypt_loop
If you see a loop that XORs data into itself with a fixed value, you’re looking at simple XOR decryption. Note the key and the data address.
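Once the key and data are known, the same transformation is easy to reproduce offline. The sketch below assumes a repeating-key XOR like the loop above; the helper names and the sample blob are hypothetical, and a real sample may mix in additions, rotations, or a rolling key.

```python
def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def guess_single_byte_key(data: bytes, known: bytes):
    """Yield each single-byte key whose output contains the known plaintext."""
    for k in range(256):
        if known in xor_decrypt(data, bytes([k])):
            yield k


if __name__ == "__main__":
    # Hypothetical blob: a URL XORed with 0x42, the key byte used in the loop above.
    blob = xor_decrypt(b"http://example.test", b"\x42")
    print(xor_decrypt(blob, b"\x42"))
    print([hex(k) for k in guess_single_byte_key(blob, b"http")])
```

XOR is its own inverse, so one function serves for both encoding and decoding. When the key is unknown, searching for an expected fragment such as `http` or `MZ` is often enough to recover a single-byte key.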
API Hashing
To hide imports from static analysis, malware often resolves API functions at runtime by walking the PEB’s loaded module list and hashing function names until a match is found. You’ll see a loop iterating over a DLL’s export table and comparing a computed hash to a hardcoded value.
When you see this pattern, identify the hash algorithm (often custom), then write a script to brute-force which function name maps to each hash.
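A brute-force resolver can be sketched as follows. The hash here is a rotate-right-by-13 additive hash, a scheme commonly seen in shellcode, used purely as an assumed stand-in: replace `ror13_hash` with whatever the sample actually computes (some variants also hash the DLL name or a terminating null). The candidate name list is hypothetical; in practice it would come from the export tables of the DLLs the sample walks.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Assumed example hash: rotate right by 13, then add each name byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


def resolve(hashes, names, hash_func=ror13_hash):
    """Map each hardcoded hash to the candidate name that produces it, or None."""
    table = {hash_func(n): n for n in names}
    return {h: table.get(h) for h in hashes}


if __name__ == "__main__":
    # Hypothetical candidates; a real list would hold every export of the target DLLs.
    candidates = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "CreateRemoteThread"]
    # Stand-in for constants pulled from the binary.
    found = [ror13_hash("VirtualAlloc"), ror13_hash("LoadLibraryA")]
    for h, name in resolve(found, candidates).items():
        print(f"{h:#010x} -> {name}")
```

Building the table once and looking up each constant keeps the cost linear in the number of export names, so it scales to full export lists from several DLLs.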
Debugging with GDB and pwndbg
Dynamic debugging lets you step through code, inspect registers and memory, and modify execution at runtime.
$ gdb ./sample
(gdb) break main # set the breakpoint before starting the program
(gdb) run
(gdb) info registers
(gdb) x/20gx $rsp # dump 20 eight-byte hex values at the stack pointer
(gdb) x/s 0xdeadbeef # print string at address
(gdb) disas main # disassemble main
(gdb) set $rax = 0 # set a register
(gdb) continue
With pwndbg installed, you get a much more informative display automatically.
Useful GDB Commands for Malware
catch syscall connect # break on connect() syscall
watch *0x601020 # break when the value stored at this address changes
set follow-fork-mode child # follow child on fork
IDA and Cutter Workflows
Function Identification (IDA)
IDA auto-analysis assigns names like sub_401234 to unknown functions. Work top-down from start/main:
- Find the function in the Functions window or navigate to the entry point
- Press F5 to open the decompiler; even rough pseudocode conveys structure
- Press N to rename the function; rename local variables too (N in the pseudocode view)
- Press X on any name to find all callers
Data Type Recovery (IDA)
If you identify a struct being built on the heap (e.g., a malware config or WSADATA), apply it to the pointer:
- Press Y on the pointer variable
- Type the struct name (e.g., WSADATA *)
- The decompiler updates to show named field accesses
Scripting with IDAPython
IDAPython automates repetitive tasks. Run with Alt+F7 or File → Script File:
import idc, idautils
# Print all strings
for s in idautils.Strings():
    print(hex(s.ea), str(s))
# Find all calls to a specific function by name
target = idc.get_name_ea_simple("CreateRemoteThread")
for ref in idautils.CodeRefsTo(target, flow=False):
    print(hex(ref), idc.get_func_name(ref))
Working in Cutter / Rizin / radare2
Cutter is the GUI for Rizin, a maintained fork of radare2. The CLI tools, rizin and r2, share most of their command vocabulary, so the commands shown here generally work in both, although the two projects have diverged in places since the fork.
Cutter’s Console pane accepts Rizin commands directly:
afl # list all functions
pdf @main # disassemble main
pdg @main # decompile main (rz-ghidra)
iz # strings in data sections
ii # imports
axt @str.evil # cross-references to a string
From the terminal:
$ rizin -A ./sample # rizin (preferred)
$ r2 -A ./sample # radare2 (compatible)
For scripting with rz-pipe (Rizin) or r2pipe (radare2):
import rzpipe
rz = rzpipe.open("./sample", flags=["-A"])
funcs = rz.cmdj("aflj") # functions as JSON
for f in funcs:
    print(hex(f["offset"]), f["name"])
rz.quit()