To effectively extract structure from physical memory and understand how malicious code can compromise system security, you should have a firm understanding of the programming model that the CPU provides for accessing memory.
The Art of Memory Forensics by Case, Levy, Ligh, & Walters
The Von Neumann Model
In 1945, John von Neumann, a computer scientist, proposed an architecture for general-purpose digital computers. We now use this architecture for designing microprocessors (Central Processing Units, or CPUs) that support a specific set of operation codes (opcodes) and operands.
Personally, I compare how a CPU functions to the way a kitchen works. Both fulfill the orders given to them using a system greater than the sum of its parts. For instance, a CPU uses memory like a fridge to house instructions. It uses a Control Unit like a sous chef to fetch instructions and an Arithmetic Logic Unit (ALU) like a head chef to execute them. Finally, the CPU uses Registers as countertops to hold not only the address of the next instruction but also any output from the head chef, or ALU.
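To make the kitchen analogy concrete, here is a minimal sketch of a toy machine in Python. The four opcodes (LOAD, ADD, STORE, HALT), the register names, and the memory layout are all invented for illustration and do not correspond to any real CPU; the point is only that instructions and data sit in the same memory and that a program counter register tracks the next instruction.

```python
# Toy von Neumann machine: instructions and data share one memory list.
# The opcodes and register names below are made up for this sketch.

def run(memory):
    registers = {"pc": 0, "acc": 0}  # the "countertops"
    while True:
        # Control unit: fetch the instruction the program counter points at.
        opcode, operand = memory[registers["pc"]]
        registers["pc"] += 1  # pc now holds the address of the next instruction
        # Execution step (the ALU does the arithmetic for ADD).
        if opcode == "LOAD":
            registers["acc"] = memory[operand]
        elif opcode == "ADD":
            registers["acc"] += memory[operand]
        elif opcode == "STORE":
            memory[operand] = registers["acc"]
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


memory = [
    ("LOAD", 5),   # address 0: copy the value at address 5 into acc
    ("ADD", 6),    # address 1: add the value at address 6 to acc
    ("STORE", 7),  # address 2: write acc to address 7
    ("HALT", 0),   # address 3: stop
    None,          # address 4: unused
    2,             # address 5: data
    3,             # address 6: data
    0,             # address 7: result goes here
]

final_registers = run(memory)
print(memory[7], final_registers)
```

After the run, address 7 holds the sum of the two data cells, and the program counter points just past the HALT instruction.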
Machine Languages for Humans
To communicate with a CPU, one must know either binary (a language of zeroes and ones that the CPU understands) or the appropriate opcodes and operands that represent binary (a language both humans and CPUs understand). However, the opcodes used to assemble bits of binary on one CPU are not always the same on other CPUs, so these instruction sets, or assembly languages, vary between systems.
For example, a CPU designed according to the 32-bit Intel Architecture uses the x86 instruction set (x86 is a reference to the original family of Intel microprocessors: 8086, 80186, 80286, 80386, etc.). However, one would need to use the x64 instruction set to communicate with a CPU that uses a 64-bit architecture.
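As a rough illustration of how mnemonics stand in for binary, the sketch below maps three well-known one-byte x86 opcodes to their byte values and prints the result in hexadecimal and binary. The lookup table and the `assemble` helper are hypothetical and cover only these three instructions; a real assembler also handles operands, prefixes, and multi-byte encodings.

```python
# A tiny, illustrative subset of one-byte x86 opcodes. Not a real assembler.
X86_OPCODES = {
    "nop": 0x90,   # no operation
    "ret": 0xC3,   # near return
    "int3": 0xCC,  # breakpoint trap
}


def assemble(mnemonics):
    """Translate a list of mnemonics into raw machine-code bytes (hypothetical helper)."""
    return bytes(X86_OPCODES[m] for m in mnemonics)


machine_code = assemble(["nop", "nop", "ret"])
print(machine_code.hex())                          # hexadecimal view of the bytes
print(" ".join(f"{b:08b}" for b in machine_code))  # the same bytes as bits
```

A CPU with a different instruction set would interpret these same byte values differently, which is why the table is specific to x86.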
Memory as an Address Table
In 32-bit computing, CPUs use memory addresses up to 32 bits in size. In 64-bit computing, CPUs use memory addresses up to 64 bits in size. The larger the address space, the more locations are available in which CPU instructions (again, opcodes and operands) can be stored prior to execution.
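The difference in scale is easy to compute. This short sketch assumes each address names one byte, which is the usual case on x86; the helper name `addressable_bytes` is made up for the example.

```python
def addressable_bytes(address_bits):
    """Number of distinct addresses an address of the given width can express."""
    return 2 ** address_bits


for bits in (32, 64):
    total = addressable_bytes(bits)
    gib = total // 2**30  # 1 GiB = 2**30 bytes
    print(f"{bits}-bit: {total:,} addresses, or {gib:,} GiB of byte-addressed memory")
```

A 32-bit address can name 4 GiB of memory, while a 64-bit address can in principle name billions of times more.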
When a program is fed to a CPU, the program is allocated a virtual address space or table in memory. As mentioned, x86 computers use 32-bit addresses. To humans, these appear as hexadecimal strings eight characters long (00008130 is an example).
To explain, the number of unique values in binary (machine language) is 2 (as in 0 or 1). The number of unique values in hexadecimal is 16 (as in 0 through 9 and a through f). Hexadecimal values can be represented in binary using four bits, such as 0000, for each character (16 possible values). For instance, 0000 0000 0000 0000 1000 0001 0011 0000 represents the memory address 00008130. The first four groups are zeroes while the last four groups of binary correspond to their hexadecimal equivalents.
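The conversion can be checked with a few lines of Python. This is only a sketch; the function names `hex_to_nibbles` and `nibbles_to_hex` are invented for the example, and each hexadecimal character is simply translated to its own group of four bits.

```python
def hex_to_nibbles(hex_address):
    """Render each hexadecimal character as a group of four bits."""
    return " ".join(f"{int(ch, 16):04b}" for ch in hex_address)


def nibbles_to_hex(bit_groups):
    """Turn space-separated groups of four bits back into hexadecimal characters."""
    return "".join(f"{int(group, 2):x}" for group in bit_groups.split())


bits = hex_to_nibbles("00008130")
print(bits)
print(nibbles_to_hex(bits))
```

Reading the groups from left to right, 1000 is 8, 0001 is 1, 0011 is 3, and 0000 is 0.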
More importantly, addresses help one understand how various sections of a program are organized in memory.
Program Sections in Memory
A program separates its contents across five different sections in memory.
- The Stack = used for local variables; grows and shrinks as functions are called and return
- The Heap = designated for dynamically creating and freeing variables at run time
- bss = uninitialized global & static variables
- data = global & static variables that are explicitly initialized
- text = the program’s instructions (opcodes)
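To show how an address hints at the section it belongs to, the sketch below looks an address up in a table of ranges. The ranges are entirely hypothetical, loosely modeled on a classic 32-bit layout with text at low addresses and the stack near the top; real layouts vary by operating system, compiler, and address randomization.

```python
# Hypothetical section ranges as (name, start, end), with end exclusive.
# These numbers are invented for illustration and will not match a real process.
SECTIONS = [
    ("text",  0x08048000, 0x08050000),
    ("data",  0x08050000, 0x08052000),
    ("bss",   0x08052000, 0x08054000),
    ("heap",  0x08054000, 0x08100000),
    ("stack", 0xBFFDF000, 0xC0000000),
]


def section_of(address):
    """Return the name of the section containing the address, or None if unmapped."""
    for name, start, end in SECTIONS:
        if start <= address < end:
            return name
    return None


for addr in (0x08048130, 0x08053000, 0xBFFFF000, 0x00008130):
    print(f"{addr:08x} -> {section_of(addr)}")
```

An address that falls outside every range is treated as unmapped, which is roughly how an analyst would flag a pointer that does not belong to any known region.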
Depending on its design and intended purpose, malware will seek to misuse these sections, as well as system calls (syscalls), to get around various CPU protections.
Protection Rings & Syscalls
CPUs rely on the operating system to employ two Protection Rings to safeguard against software with invalid or unauthorized CPU instructions. Called Ring 0 and Ring 3 (x86 defines four rings, 0 through 3, but most operating systems use only these two), they represent what are known as Kernel Mode and User Mode respectively. All CPU instructions are allowed in Kernel Mode, but privileged instructions are blocked in User Mode. For example, a CPU instruction needing direct access to hardware will be denied if it is issued in User Mode rather than submitted properly to the OS’s kernel. The correct way to make such a request is through the use of syscalls.
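As a small illustration of a user-mode program asking the kernel to act on its behalf, the sketch below uses Python's standard `os` module. On Unix-like systems, `os.pipe`, `os.write`, `os.read`, and `os.getpid` in CPython are thin wrappers around the system calls of the same names; the exact mechanism differs on other platforms, so treat this as an example of the idea rather than a description of every OS.

```python
import os

# Ask the kernel to create a pipe: a small in-kernel buffer with two ends.
read_fd, write_fd = os.pipe()

# User-mode code cannot touch the buffer directly; it requests a write...
message = b"hello from user mode"
bytes_written = os.write(write_fd, message)

# ...and then requests a read of whatever the kernel stored.
received = os.read(read_fd, 1024)

# Release both file descriptors back to the kernel.
os.close(read_fd)
os.close(write_fd)

# Even asking for our own process ID goes through the kernel.
pid = os.getpid()
print(bytes_written, received, pid)
```

Each call hands control to the kernel, which runs in Ring 0, performs the privileged work, and returns the result to the program in Ring 3.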