Reverse Engineering

Reverse engineering is the process of recovering the design and implementation details of a binary when source code is unavailable. In malware analysis, this means understanding what the code does by reading assembly or decompiled pseudocode.

x86-64 Assembly Basics

Most malware targets Windows (x86/x86-64) or Linux (x86-64/ARM). A basic reading knowledge of assembly is required.

Registers (x86-64)

Register	Common use
`rax`	return value; accumulator
`rbx`	callee-saved general purpose
`rcx` / `rdi`	first argument (Windows / Linux)
`rdx` / `rsi`	second argument
`rsp`	stack pointer
`rbp`	frame pointer (when used)
`rip`	instruction pointer
`rflags`	condition codes (ZF, SF, CF, OF); shown as `eflags` in GDB

Common Instructions

mov rax, rbx         ; copy rbx into rax
lea rax, [rbp-0x10]  ; rax = rbp-0x10 (computes the address, no memory read)
push rax             ; push rax onto the stack
pop rax              ; pop from stack into rax
call func            ; push return address and jump to func
ret                  ; pop return address and jump to it
cmp rax, 0           ; set flags based on rax - 0
je  label            ; jump if equal (ZF=1)
jne label            ; jump if not equal (ZF=0)
xor rax, rax         ; zero rax (common idiom)
test rax, rax        ; set ZF if rax == 0
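
The difference between cmp and test is easiest to see by modelling the flags they produce. The following is a minimal Python sketch, not an emulator: it covers only ZF, SF, and CF for 64-bit operands, and the function names are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def flags_after_cmp(a: int, b: int) -> dict:
    """Flags produced by `cmp a, b`: computes a - b and discards the result."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "ZF": int(result == 0),   # set when the operands are equal
        "SF": result >> 63,       # top bit of the 64-bit result
        "CF": int(a < b),         # unsigned borrow
    }

def flags_after_test(a: int, b: int) -> dict:
    """Flags produced by `test a, b`: computes a & b and discards the result."""
    result = (a & b) & MASK64
    return {
        "ZF": int(result == 0),   # `test rax, rax` sets ZF only when rax == 0
        "SF": result >> 63,
        "CF": 0,                  # test always clears CF
    }
```

This is why `test rax, rax` followed by `je` behaves the same as `cmp rax, 0` followed by `je`: both set ZF exactly when rax is zero.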

Calling Conventions

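Windows x64: the first four integer or pointer arguments go in rcx, rdx, r8, r9; the rest go on the stack, above 32 bytes of shadow space that the caller reserves; return value in rax.

Linux x64 (System V AMD64 ABI): the first six integer or pointer arguments go in rdi, rsi, rdx, rcx, r8, r9; the rest go on the stack; return value in rax.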

Knowing which convention is in use tells you what each register means at a call site.
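
As a quick reference while annotating call sites, the register order can be captured in a small lookup. This is a sketch that assumes every argument is an integer or pointer (floating-point arguments use xmm registers and are not modelled); the names `INT_ARG_REGS` and `arg_location` are illustrative only.

```python
# Integer/pointer argument registers, in order, for each convention.
INT_ARG_REGS = {
    "win64": ["rcx", "rdx", "r8", "r9"],
    "sysv": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
}

def arg_location(convention: str, index: int) -> str:
    """Return where the zero-based argument `index` is passed."""
    regs = INT_ARG_REGS[convention]
    if index < len(regs):
        return regs[index]
    # Arguments beyond the register set are passed on the stack.
    return f"stack slot {index - len(regs)}"

for i in range(5):
    print(i, arg_location("win64", i), arg_location("sysv", i))
```

Note that rcx is the first argument on Windows but the fourth on Linux, which is an easy way to misread a call site when switching platforms.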

Recognizing Common Patterns

Experienced analysts recognize constructs that appear over and over regardless of the compiler.

String Loops

A loop that loads bytes one at a time from a base address plus an incrementing index, and exits on a zero byte, is likely walking a NUL-terminated string:

xor  ecx, ecx
loop_top:
  movzx eax, byte [rbx+rcx]
  test  eax, eax
  je    done
  ; do something with eax
  inc   ecx
  jmp   loop_top
done:
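
It can help to translate a loop like this into a high-level equivalent before renaming anything in the disassembler. The sketch below mirrors the listing instruction by instruction; `memory` stands in for the process address space and `base` for the value in rbx, both of which are assumptions made for the example.

```python
def read_c_string(memory: bytes, base: int) -> bytes:
    """Collect bytes starting at `base` until a zero byte is reached."""
    out = bytearray()
    index = 0                          # xor ecx, ecx
    while base + index < len(memory):  # bounds check the assembly does not have
        byte = memory[base + index]    # movzx eax, byte [rbx+rcx]
        if byte == 0:                  # test eax, eax / je done
            break
        out.append(byte)               # "do something with eax"
        index += 1                     # inc ecx
    return bytes(out)

print(read_c_string(b"AAhello\x00world", 2))
```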

XOR Decryption

Single-byte or multi-byte XOR is one of the most common simple obfuscation schemes. A single-byte variant looks like this:

mov  ecx, length
xor  rdx, rdx
decrypt_loop:
  movzx eax, byte [data+rdx]
  xor   eax, 0x42           ; key byte
  mov   byte [data+rdx], al
  inc   rdx
  dec   ecx
  jnz   decrypt_loop

If you see a loop that XORs each byte of a buffer with a fixed value and writes the result back in place, you’re looking at simple XOR decryption. Note the key, the data address, and the length.
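
Once the key and the data are known, the decryption can be reproduced offline instead of running the sample. A minimal sketch, assuming the encrypted bytes have already been dumped from the binary; `xor_decode` is a hypothetical helper that handles both single-byte and repeating multi-byte keys.

```python
from itertools import cycle

def xor_decode(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with `key`, repeating the key as needed."""
    if not key:
        raise ValueError("key must contain at least one byte")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Single-byte key 0x42, as in the listing above.
print(xor_decode(b"\x2a\x2b", b"\x42"))
```

Because XOR is its own inverse, the same function encrypts and decrypts, which makes it easy to check a recovered key by round-tripping a known string.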

API Hashing

To hide imports from static analysis, malware often resolves API functions at runtime by walking the PEB’s loaded module list and hashing function names until a match is found. You’ll see a loop iterating over a DLL’s export table and comparing a computed hash to a hardcoded value.

When you see this pattern, identify the hash algorithm (often custom), then write a script to brute-force which function name maps to each hash.
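
Such a script usually hashes a list of known export names and looks up the hardcoded constants in the resulting table. The sketch below uses a rotate-right-by-13 additive hash purely as a stand-in; the sample in front of you may use something different, so replace `ror13_hash` with whatever the disassembly shows. The candidate name list is also an assumption and would normally come from the export tables of the relevant DLLs.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name: str) -> int:
    """Stand-in hash: rotate the accumulator right by 13, then add each byte."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h

def resolve_hashes(hashes, candidate_names, hash_func=ror13_hash):
    """Map each hardcoded hash to the candidate name that produces it, or None."""
    table = {hash_func(name): name for name in candidate_names}
    return {h: table.get(h) for h in hashes}

# Illustrative candidate list; a real one would hold thousands of exports.
names = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "CreateRemoteThread"]
wanted = [ror13_hash("VirtualAlloc")]
for value, name in resolve_hashes(wanted, names).items():
    print(hex(value), name)
```

Building the table once and looking hashes up in it keeps the script fast even when the sample resolves dozens of functions.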

Debugging with GDB and pwndbg

Dynamic debugging lets you step through code, inspect registers and memory, and modify execution at runtime.

$ gdb ./sample
(gdb) break main           # set the breakpoint before starting the sample
(gdb) run                  # runs until the breakpoint at main is hit
(gdb) info registers
(gdb) x/20gx $rsp          # dump 20 8-byte hex words at stack pointer
(gdb) x/s 0xdeadbeef      # print string at address
(gdb) disas main           # disassemble main
(gdb) set $rax = 0         # set a register
(gdb) continue

With pwndbg installed, you get a much more informative display automatically.

Useful GDB Commands for Malware

catch syscall connect        # break on connect() syscall
watch *0x601020              # break when the value at this address changes
set follow-fork-mode child   # follow child on fork

IDA and Cutter Workflows

Function Identification (IDA)

IDA auto-analysis assigns names like sub_401234 to unknown functions. Work top-down from start/main:

Find the function in the Functions window or navigate to the entry point
Press F5 to open the decompiler — even rough pseudocode conveys structure
Press N to rename the function; rename local variables too (N in the pseudocode view)
Press X on any name to list its cross-references (for a function, its callers)

Data Type Recovery (IDA)

If you identify a struct being built on the heap (e.g., a malware config or WSADATA), apply it to the pointer:

Press Y on the pointer variable
Type the struct name (e.g., WSADATA *)
The decompiler updates to show named field accesses
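
What the type buys you is a mapping from raw offsets to field names. The same mapping can be reproduced outside IDA to parse a dumped config. The sketch below uses an entirely hypothetical layout (a port, a sleep interval, and a 32-byte host string, little-endian with no padding); the field names and sizes are made up and must be replaced with what the analysis actually recovers.

```python
import struct
from collections import namedtuple

# Hypothetical layout, little-endian, no padding:
#   offset 0: uint16 c2_port
#   offset 2: uint32 sleep_seconds
#   offset 6: char   c2_host[32]
CONFIG_FORMAT = struct.Struct("<HI32s")
Config = namedtuple("Config", "c2_port sleep_seconds c2_host")

def parse_config(blob: bytes) -> Config:
    """Decode a raw config blob into named fields."""
    port, sleep_seconds, host = CONFIG_FORMAT.unpack_from(blob)
    # The host field is NUL-padded; keep only the bytes before the first NUL.
    return Config(port, sleep_seconds, host.split(b"\x00", 1)[0].decode("ascii"))

# Build a sample blob in the same layout and parse it back.
blob = struct.pack("<HI32s", 443, 60, b"c2.example.test")
print(parse_config(blob))
```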

Scripting with IDAPython

IDAPython automates repetitive tasks. Run with Alt+F7 or File → Script File:

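import idc
import idautils

# Print every string IDA has identified, with its address
for s in idautils.Strings():
    print(hex(s.ea), str(s))

# Find all calls to a specific function by name
target = idc.get_name_ea_simple("CreateRemoteThread")
if target == idc.BADADDR:
    # The name is not in this database (not imported, or resolved at runtime)
    print("CreateRemoteThread not found")
else:
    for ref in idautils.CodeRefsTo(target, flow=False):
        print(hex(ref), idc.get_func_name(ref))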

Working in Cutter / Rizin / radare2

Cutter is the GUI for Rizin, a maintained fork of radare2. The CLI tools (rizin and r2) share much of their command vocabulary, and the basic commands shown here work in both, although the two projects have diverged since the fork.

Cutter’s Console pane accepts Rizin commands directly:

afl           # list all functions
pdf @main     # disassemble main
pdg @main     # decompile main (rz-ghidra)
iz            # strings in data sections
ii            # imports
axt @str.evil # cross-references to a string

From the terminal:

$ rizin -A ./sample      # rizin (preferred)
$ r2 -A ./sample         # radare2 (compatible)

For scripting with rz-pipe (Rizin) or r2pipe (radare2):

import rzpipe

# Open the sample and run auto-analysis (-A)
rz = rzpipe.open("./sample", flags=["-A"])
funcs = rz.cmdj("aflj") or []   # functions as JSON; empty list if none found
for f in funcs:
    print(hex(f["offset"]), f["name"])
rz.quit()
