These are my notes on x86 Assembly semantics, written in Intel syntax (opcode dst, src).
Malware programs are designed to disclose, alter, and destroy information and services. Yet they function the same way benign programs do: they are simply a collection of instructions for the CPU to execute. What makes them malicious is their intended purpose. CPU instructions can be studied most accurately in Assembly, a low-level language of mnemonics that map directly to the machine code computers understand. Malware authors, in turn, may write in Assembly to fine-tune their exploits and avoid the bloat that higher-level languages can introduce during compilation.
CPU instructions in x86 Assembly can be parsed into operation codes and operands. For example, an x86 Assembly instruction looks similar to the following:
mov eax, 0x41 ; move 0x41 (the ASCII character "A") into the EAX register
An operation code, or opcode, is the action to perform. In our example, mov is what we’re telling the CPU to do. Operands are the arguments: the data or the subject we want to perform the action against. Here, eax and 0x41 are the operands.
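The opcode/operand split described above can be sketched in a few lines of Python. This is only an illustrative string splitter, not a real assembler or disassembler; the function name `parse_instruction` is made up for these notes, and it assumes one instruction per line with `;` starting a comment.

```python
def parse_instruction(line):
    """Split one Intel-syntax line into (opcode, operands).

    Hypothetical helper for illustration: anything after ';' is a comment,
    the first word is the opcode, and the rest are comma-separated operands.
    """
    code = line.split(";", 1)[0].strip()   # drop the trailing comment
    if not code:
        return None, []                    # blank or comment-only line
    parts = code.split(None, 1)            # opcode, then the remainder
    opcode = parts[0].lower()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return opcode, operands


opcode, operands = parse_instruction('mov eax, 0x41 ; move "A" into EAX')
print(opcode, operands)  # mov ['eax', '0x41']
```

Instructions without operands, such as nop, simply come back with an empty operand list.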
The following are commonly used opcodes within the x86 Assembly instruction set.
mov eax, ebx ; copies 'ebx' into 'eax'
add eax, ebx ; adds 'ebx' to 'eax' and saves result in 'eax'
sub eax, ebx ; subtracts 'ebx' from 'eax' and saves result in 'eax'
             ; updates the status flags, e.g. ZF is set if the result is zero, CF if 'eax' < 'ebx' (unsigned)
; other supported opcodes: lea, mul, imul, div, idiv, or, xor, shr, shl, ror, nop
push 0x41 ; pushes the value onto the top of the stack
; other stack-related opcodes: pop, call, leave, enter, ret
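To make the effect of add and sub on the flags concrete, here is a minimal Python model of 32-bit arithmetic. It is a sketch under assumptions: the helper names `add32` and `sub32` are invented for these notes, inputs are expected to be unsigned values between 0 and 0xFFFFFFFF, and only ZF and CF are modelled (the real instructions update other status flags as well).

```python
MASK32 = 0xFFFFFFFF  # a 32-bit register can only keep the low 32 bits


def add32(dst, src):
    """Model 'add dst, src': return the wrapped result plus ZF and CF."""
    full = dst + src
    result = full & MASK32
    flags = {"ZF": result == 0, "CF": full > MASK32}  # CF: carry out of bit 31
    return result, flags


def sub32(dst, src):
    """Model 'sub dst, src': return the wrapped result plus ZF and CF."""
    result = (dst - src) & MASK32
    flags = {"ZF": result == 0, "CF": dst < src}      # CF: unsigned borrow
    return result, flags


print(sub32(5, 5))           # (0, {'ZF': True, 'CF': False})
print(hex(sub32(1, 2)[0]))   # 0xffffffff
```

Subtracting a larger value from a smaller one wraps around to a large unsigned number and sets CF, which is what a following conditional jump would inspect.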
Common operands in x86 Assembly are Immediate Values, Registers, and Memory addresses.
Immediate values are constants written directly into the instruction. For example, the value 0x41 is fixed as A in ASCII.
There are General Registers, the EFLAGS Register, and Segment Registers. General Registers are used to hold data values during program execution:
- esp: points to the top of the stack; changes as items are pushed/popped
- ebp: points to the base of the current function's stack frame; used to orient local variables
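How esp moves as items are pushed and popped can be sketched with a toy stack. On 32-bit x86 the stack grows toward lower addresses, so a push subtracts 4 from esp before storing the value and a pop adds 4 back after reading it. The class name `Stack32` and the 64-byte region are assumptions made for illustration, and there is no bounds checking.

```python
class Stack32:
    """Toy model of a 32-bit stack; not how a real CPU or OS lays it out."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size  # empty stack: esp sits just past the end of the region

    def push(self, value):
        self.esp -= 4    # the stack grows toward lower addresses
        self.mem[self.esp:self.esp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self):
        value = int.from_bytes(self.mem[self.esp:self.esp + 4], "little")
        self.esp += 4    # the stack shrinks back toward higher addresses
        return value


stack = Stack32()
stack.push(0x41)
stack.push(0x42)
print(stack.esp)         # 56
print(hex(stack.pop()))  # 0x42
```

The last value pushed is the first one popped, and esp returns to its starting value once everything has been popped.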
On x86 systems, General Registers hold 32 bits (4 bytes) of data each. Parts of a register can also be addressed as smaller sub-registers, which makes specifying and fetching data more efficient:
eax = 32 bits ; a 9 d c 8 1 f 5
ax  = 16 bits ;         8 1 f 5
ah  =  8 bits ;         8 1
al  =  8 bits ;             f 5
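The relationship between eax and its sub-registers is just masking and shifting, which a short Python sketch can show. The function name `split_eax` is made up for these notes; the example value matches the one used above.

```python
def split_eax(eax):
    """Return the ax, ah, and al views of a 32-bit eax value."""
    ax = eax & 0xFFFF        # low 16 bits
    ah = (eax >> 8) & 0xFF   # high byte of ax (bits 8-15)
    al = eax & 0xFF          # low byte of ax (bits 0-7)
    return ax, ah, al


ax, ah, al = split_eax(0xA9DC81F5)
print(hex(ax), hex(ah), hex(al))  # 0x81f5 0x81 0xf5
```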
The EFLAGS Register also holds 32 bits, which are used to help make logical decisions such as conditional jumps. Individual bits represent different flags:
- ZF (Zero Flag): set when the result of an operation is zero
- CF (Carry Flag): set when an unsigned result is too big or too small for the destination operand (a carry or borrow occurred)
- SF (Sign Flag): set when the result is negative (-)
- TF (Trap Flag): used for debugging; if set, the processor will execute one instruction at a time
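Because each flag is a single bit, an EFLAGS value can be decoded with shifts and masks. The sketch below covers only the four flags listed above, using their bit positions in EFLAGS (CF is bit 0, ZF bit 6, SF bit 7, TF bit 8); the name `decode_eflags` is invented for these notes.

```python
# Bit positions of the flags discussed in these notes.
FLAG_BITS = {"CF": 0, "ZF": 6, "SF": 7, "TF": 8}


def decode_eflags(eflags):
    """Return which of the listed flags are set in a 32-bit EFLAGS value."""
    return {name: bool((eflags >> bit) & 1) for name, bit in FLAG_BITS.items()}


print(decode_eflags(0x246))
# {'CF': False, 'ZF': True, 'SF': False, 'TF': False}
```

The value 0x246 is used here only as a sample input; of the four flags modelled, it has just ZF set.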
Segment Registers (cs, ds, ss, es, fs, gs) refer to segments of a program's memory. A program in memory is typically laid out in these sections:
- The Stack: used for local variables; grows and shrinks as functions are called and return
- The Heap: designated for memory that is dynamically allocated and freed at run time
- bss: uninitialized global & static variables
- data: global & static variables that are explicitly initialized
- text: the program’s instructions
Addresses are locations in memory. They can be written literally or through a register, as in [eax], which means "the memory at the address held in eax".
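The difference between an immediate value, a register, and a bracketed memory operand can be modelled with two dictionaries. Everything here is hypothetical scaffolding for illustration: the `registers` and `memory` contents, the address 0x1000, and the helper `read_operand`, which only understands hexadecimal literals, register names, and a single register or literal inside brackets.

```python
registers = {"eax": 0x1000}   # eax holds an address (made-up value)
memory = {0x1000: 0x41}       # the byte "A" lives at that address


def read_operand(operand):
    """Resolve an operand to the value the instruction would read."""
    operand = operand.strip()
    if operand.startswith("[") and operand.endswith("]"):
        inner = operand[1:-1].strip()
        # Brackets mean: treat the inner value as an address and read memory there.
        address = registers[inner] if inner in registers else int(inner, 16)
        return memory[address]
    if operand in registers:
        return registers[operand]   # plain register: use its contents
    return int(operand, 16)         # immediate value


print(hex(read_operand("eax")))    # 0x1000
print(hex(read_operand("[eax]")))  # 0x41
```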
Endianness describes the order in which the bytes of a multi-byte value are arranged.
x86 processors arrange bytes using the Little Endian format, which means a value is stored starting with the Smallest, or Least Significant Byte, first (at the lowest memory address).
Networking protocols arrange bytes using the Big Endian format, where the Biggest, or Most Significant Byte, comes first. Take the network loopback address 127.0.0.1 as an example. Its first octet (01111111) in Hexadecimal is 7f. In its entirety, 127.0.0.1 in Hexadecimal is 7f 00 00 01. If it were written in Little Endian, it could be misinterpreted by network devices that process data in Big Endian.
# 127.0.0.1 in binary
01111111.00000000.00000000.00000001
# the first octet of 127.0.0.1 in Hexadecimal is 7f
01111111 -> 0111 1111 -> (0+4+2+1) (8+4+2+1) -> 7 15 -> 7f
# 127.0.0.1 in the Big Endian format
7f 00 00 01
# 127.0.0.1 in the Little Endian format
01 00 00 7f
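The same conversion can be checked in Python, including what happens when little-endian bytes are read by something that expects big-endian. This is a small sketch; the helper names `ip_to_bytes`, `bytes_to_ip`, and `as_hex` are made up for these notes, and the input is assumed to be a well-formed dotted IPv4 address.

```python
def ip_to_bytes(address):
    """Dotted IPv4 string -> 4 bytes in network (big-endian) order."""
    return bytes(int(octet) for octet in address.split("."))


def bytes_to_ip(data):
    """Read 4 bytes in the order given and format them as a dotted IPv4 string."""
    return ".".join(str(b) for b in data)


def as_hex(data):
    return " ".join(f"{b:02x}" for b in data)


big = ip_to_bytes("127.0.0.1")
value = int.from_bytes(big, "big")   # the address as one 32-bit number
little = value.to_bytes(4, "little") # same number, little-endian byte order

print(as_hex(big))          # 7f 00 00 01
print(as_hex(little))       # 01 00 00 7f
print(bytes_to_ip(little))  # 1.0.0.127
```

Reading the little-endian bytes in network order yields 1.0.0.127 rather than 127.0.0.1, which is the kind of misinterpretation described above.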