Virtual Memory

The Big Idea

Every process thinks it owns a huge, private, contiguous chunk of memory starting at address zero. None of that is true. Physical RAM is shared, fragmented, and often smaller than what all running programs collectively need. Virtual memory is the OS + hardware conspiracy that maintains this illusion, and once you understand it, a lot of other systems concepts (process isolation, fork, mmap, why your program got OOM-killed) suddenly click.

Note

Virtual memory does three things at once: translation (virtual addresses to physical), isolation (processes can’t touch each other’s memory), and overcommit (you can allocate more memory than physically exists, and the OS pages things in and out as needed).

Address Spaces and Pages

Each process gets a virtual address space (typically 48-bit on x86-64, giving 256 TB). The space is divided into fixed-size pages (commonly 4 KB). Physical memory is divided into frames of the same size. The page table maps between them.

Virtual address:  [ VPN (high bits) | Offset (low 12 bits) ]
                        |
                   page table
                        |
                        v
Physical address: [ PFN           | Offset (same 12 bits) ]

The offset stays the same on both sides. Only the page/frame number gets translated. This is clean and simple, but a flat page table for 48-bit addresses would be enormous: 2^36 possible virtual pages at 8 bytes per entry is 512 GB per process. That is why real systems use multi-level page tables.
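Here is a minimal sketch of the split-and-rejoin, assuming 4 KB pages (12 offset bits). `split` and `join` are illustrative names, not a real API. Only the page number would go through the page table; the offset passes through untouched.

```python
PAGE_SHIFT = 12                       # 4 KB pages -> 12 offset bits
OFFSET_MASK = (1 << PAGE_SHIFT) - 1   # 0xFFF

def split(vaddr):
    """Split a virtual address into (virtual page number, offset)."""
    return vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK

def join(frame_number, offset):
    """Rebuild a physical address from a frame number and the unchanged offset."""
    return (frame_number << PAGE_SHIFT) | offset

vpn, offset = split(0x0000_7F3A_BC12_3456)
print(hex(vpn), hex(offset))          # VPN and offset as separate fields
```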

Multi-Level Page Tables

Modern x86-64 uses 4-level (sometimes 5-level) page table hierarchies. The trick: only allocate table pages for regions the process actually uses. A process with a small heap and stack might only need a handful of page table pages, even though it technically has a 256 TB address space. This makes sparse address spaces cheap.
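To make the "only pay for what you touch" point concrete, here is a toy radix-tree page table. It assumes x86-64-style 4-level paging with 9 index bits per level; each Python dict stands in for one 4 KB table page, and the heap and stack addresses are hypothetical. `SparsePageTable` is an illustrative name, not how any kernel actually stores its tables.

```python
PAGE_SHIFT = 12   # 4 KB pages
INDEX_BITS = 9    # 512 eight-byte entries fit in one 4 KB table page
LEVELS = 4        # PML4 -> PDPT -> PD -> PT

def indices(vaddr):
    """Per-level table indexes for a virtual address, top level first."""
    vpn = vaddr >> PAGE_SHIFT
    mask = (1 << INDEX_BITS) - 1
    return [(vpn >> (INDEX_BITS * (LEVELS - 1 - lvl))) & mask for lvl in range(LEVELS)]

class SparsePageTable:
    """Toy radix-tree page table; each dict stands in for one 4 KB table page."""

    def __init__(self):
        self.root = {}
        self.table_pages = 1              # the top-level table always exists

    def map(self, vaddr, pfn):
        *upper, leaf = indices(vaddr)
        node = self.root
        for i in upper:                   # create intermediate tables only on demand
            if i not in node:
                node[i] = {}
                self.table_pages += 1
            node = node[i]
        node[leaf] = pfn                  # leaf entry holds the frame number

    def lookup(self, vaddr):
        node = self.root
        for i in indices(vaddr):
            node = node.get(i)
            if node is None:
                return None               # missing entry: a real MMU would fault
        return node

pt = SparsePageTable()
heap = 0x0000_5555_5555_0000              # hypothetical heap and stack addresses
stack = 0x0000_7FFF_FFFF_E000
for n in range(16):                       # 16 contiguous heap pages
    pt.map(heap + n * 4096, 0x1000 + n)
pt.map(stack, 0x2000)
print(pt.table_pages, "table pages for 17 mapped pages")
```

The 16 heap pages share one leaf table, and the stack needs its own chain of three, so 17 mapped pages cost 7 table pages (about 28 KB) instead of the hundreds of gigabytes a flat table would need.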

Tip

This is one of those things that sounds obvious once you see it, but took me a while to internalize: the page table itself lives in ordinary memory pages (and some OSes can even page parts of it out). The multi-level structure means you’re only paying memory costs for the parts of the address space you actually touch.

The TLB

The Translation Lookaside Buffer is a small, fast hardware cache of recent virtual-to-physical translations. This is critical for performance because every single memory access needs address translation.

TLB hit: 1-2 cycles
TLB miss: triggers a multi-level page-table walk (tens to hundreds of cycles)

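Switching between processes changes the address space, so the OS either flushes the TLB or, on hardware that tags entries with an address-space ID (PCID on x86-64, ASIDs on ARM), keeps entries from several processes side by side, where they compete for the same small capacity. Either way, the incoming process usually starts with a mostly cold TLB, which is a big part of why context switches are expensive. It’s not the register save/restore that hurts; it’s the TLB (and cache) misses afterwards.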
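A toy model shows the effect of a flush. This sketch assumes a fully associative TLB with LRU eviction (real TLBs are set-associative and far more intricate); the `TLB` class and the page-table contents are hypothetical, and `flush` models the untagged case.

```python
from collections import OrderedDict

class TLB:
    """Toy fully associative TLB with LRU eviction (real TLBs are set-associative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()          # vpn -> pfn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)     # now the most recently used entry
            return self.entries[vpn]
        self.misses += 1
        pfn = page_table[vpn]                 # stands in for the multi-level walk
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[vpn] = pfn
        return pfn

    def flush(self):
        """Untagged context switch: every cached translation is discarded."""
        self.entries.clear()

page_table = {vpn: 0x100 + vpn for vpn in range(8)}   # hypothetical mappings
tlb = TLB(capacity=16)
for _ in range(2):                # two passes over an 8-page working set
    for vpn in range(8):
        tlb.translate(vpn, page_table)
warm = (tlb.hits, tlb.misses)     # (8, 8): cold first pass, warm second pass
tlb.flush()                       # switch to another process and back
for vpn in range(8):
    tlb.translate(vpn, page_table)
# Every access after the flush misses again, so misses climb to 16.
```

After the flush, the process pays the full miss cost on every page again, even though none of its mappings changed.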

Warning

TLB capacity is limited (typically a few hundred to a few thousand entries). A 4 KB page size means each TLB entry covers only 4 KB. If your working set is 1 GB, you’d need 262,144 entries, far more than any TLB holds. This is where huge pages help.

Demand Paging and Page Faults

Pages don’t need to be in RAM when a process starts. When a process touches an absent page, the MMU raises a page fault, the OS loads the page from disk (or zero-fills it), updates the page table, and restarts the instruction. This lets the OS overcommit memory and prioritize active pages.
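The fault-then-restart cycle can be sketched as a toy memory object. It assumes 4 KB pages and zero-fill-on-first-touch (the anonymous-memory case, no disk). `DemandPagedMemory` and its methods are hypothetical names, and the "restart" is simply retrying the access after the handler runs.

```python
PAGE_SIZE = 4096

class DemandPagedMemory:
    """Toy model: pages start absent and are zero-filled on first touch."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.page_table = {}              # vpn -> frame number, present pages only
        self.frames = {}                  # frame number -> page contents
        self.faults = 0

    def _handle_fault(self, vpn):
        # Stand-in for the OS fault handler: allocate, zero-fill, map.
        if not self.free_frames:
            raise MemoryError("out of frames; a real OS would evict a page here")
        frame = self.free_frames.pop()
        self.frames[frame] = bytearray(PAGE_SIZE)
        self.page_table[vpn] = frame
        self.faults += 1

    def _locate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            self._handle_fault(vpn)       # fault, then "restart" the access
        return self.frames[self.page_table[vpn]], offset

    def read(self, vaddr):
        page, offset = self._locate(vaddr)
        return page[offset]

    def write(self, vaddr, value):
        page, offset = self._locate(vaddr)
        page[offset] = value

mem = DemandPagedMemory(num_frames=4)
mem.write(0x10_0000, 42)      # first touch of this page: fault
a = mem.read(0x10_0000)       # same page: no fault
b = mem.read(0x10_0001)
c = mem.read(0x7FFF_0000)     # far-away page: fault, reads as zero
```

The addressable range is huge, but only the two touched pages ever consume frames.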

This is also how memory-mapped files work: mmap a file, and pages get faulted in from disk on first access. The process reads memory addresses; the OS transparently handles the I/O.
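Python's standard `mmap` module exposes this directly. A minimal sketch, assuming a throwaway temp file of two 4 KB pages: the file contents are read through ordinary indexing, with no explicit `read()` call. Whether each access actually faults depends on the page cache and readahead, so treat the comments as what *may* happen.

```python
import mmap
import os
import tempfile

# Create a small file to map: two 4 KB pages' worth of data.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * 4096 + b"B" * 4096)
    path = f.name

try:
    with open(path, "rb") as f:
        # Map the whole file (length 0) read-only; nothing is copied yet.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            first = m[0]          # touching page 0 may fault it in from the file
            second = m[4096]      # a different page, a separate potential fault
            size = len(m)
finally:
    os.remove(path)

print(chr(first), chr(second), size)
```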

Page Replacement

When RAM is full and something needs to be paged in, the OS has to evict something. The choice of what to evict matters a lot:

Policy	Idea	Trade-off
FIFO	Evict the page that was loaded earliest	Ignores usage; suffers from Belady’s anomaly (more frames can mean more faults)
LRU	Evict the least-recently-used page	Expensive to track exactly; approximated in practice
Clock (Second Chance)	Circular scan with reference bits	Cheap approximation of LRU; degrades to FIFO when every reference bit is set
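A small simulator makes Belady’s anomaly visible. This sketch counts faults for FIFO and exact LRU on the classic textbook reference string; the function names are illustrative.

```python
from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue                            # hit: FIFO order is unchanged
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())   # evict the earliest arrival
        resident.add(page)
        queue.append(page)
    return faults

def lru_faults(refs, frames):
    """Count page faults under exact LRU replacement."""
    resident, faults = OrderedDict(), 0         # least recently used first
    for page in refs:
        if page in resident:
            resident.move_to_end(page)          # hit: now most recently used
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)        # evict least recently used
        resident[page] = True
    return faults

# Classic reference string that exposes Belady's anomaly for FIFO.
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
for n in (3, 4):
    print(n, "frames: FIFO", fifo_faults(refs, n), "faults, LRU", lru_faults(refs, n), "faults")
```

FIFO goes from 9 faults with 3 frames to 10 with 4. LRU (10, then 8) cannot do that: it is a stack algorithm, so the pages resident with n frames are always a subset of those resident with n + 1.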

Note

In practice, Linux uses a variant of Clock/LRU with active and inactive lists. True LRU would require updating a data structure on every memory access, which is way too expensive. The approximation works well enough.
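For reference, here is the textbook Clock policy as a fault counter. This is a sketch of the classic algorithm, not Linux’s active/inactive list implementation. It assumes a page’s reference bit is set when it is loaded and on every later hit, and that the hand clears bits as it sweeps.

```python
def clock_faults(refs, frames):
    """Count page faults under the Clock (second-chance) policy."""
    slots = [None] * frames        # page held in each frame
    ref_bit = [False] * frames     # set on access, cleared by the sweeping hand
    where = {}                     # page -> slot index
    hand = 0
    faults = 0
    for page in refs:
        if page in where:
            ref_bit[where[page]] = True   # hardware would set this bit on access
            continue
        faults += 1
        # Sweep: a set bit buys the page a second chance (cleared, hand moves on).
        while ref_bit[hand]:
            ref_bit[hand] = False
            hand = (hand + 1) % frames
        victim = slots[hand]
        if victim is not None:
            del where[victim]
        slots[hand] = page
        where[page] = hand
        ref_bit[hand] = True              # the load itself counts as a reference
        hand = (hand + 1) % frames
    return faults

print(clock_faults([1, 2, 3, 4, 2, 5, 2], 3))
```

In that run, page 2 is re-referenced after the first sweep, so its bit saves it and page 3 is evicted instead; plain FIFO would have evicted 2 and faulted on it again (6 faults instead of 5).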

Copy-on-Write

After fork, parent and child share the same physical pages, and every writable page is marked read-only in both page tables. A write to one of them triggers a fault; the OS copies just that page and gives the writer its own private, writable copy. This makes fork fast even for large processes: it copies page tables rather than page contents, and most of the address space is never written by the child (especially if it immediately calls exec).
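The bookkeeping can be sketched with a reference count per frame. This toy model assumes every page starts private and writable, represents a process as nothing more than its page table (`vpn -> [frame, writable]`), and uses hypothetical helper names (`new_process`, `fork`, `write`).

```python
PAGE_SIZE = 4096

class Frame:
    """A physical frame plus the number of page-table entries that point at it."""
    def __init__(self, data=bytes(PAGE_SIZE), refcount=1):
        self.data = bytearray(data)
        self.refcount = refcount

def new_process(num_pages):
    """Page table: vpn -> [frame, writable]. Every page starts private and writable."""
    return {vpn: [Frame(), True] for vpn in range(num_pages)}

def fork(parent):
    """Share every frame with the child and write-protect both mappings."""
    child = {}
    for vpn, entry in parent.items():
        entry[1] = False                  # the parent loses write access too
        entry[0].refcount += 1
        child[vpn] = [entry[0], False]
    return child

def write(proc, vpn, offset, value):
    frame, writable = proc[vpn]
    if not writable:                      # write to a read-only page: COW fault
        if frame.refcount > 1:            # still shared: copy just this page
            frame.refcount -= 1
            frame = Frame(frame.data)
        proc[vpn] = [frame, True]         # sole owner may now write in place
    frame.data[offset] = value

parent = new_process(4)
child = fork(parent)
write(child, 2, 0, 7)                     # only page 2 gets copied
```

The last sharer to write skips the copy entirely: once the child has its own page 2, a parent write to page 2 just flips the entry back to writable.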

Huge Pages

Standard 4 KB pages with a 1 GB working set require 262,144 TLB entries. Huge pages (2 MB or 1 GB on x86-64) reduce TLB pressure by covering more memory per entry, at the cost of higher internal fragmentation. Linux exposes them via mmap with MAP_HUGETLB or transparently via THP (Transparent Huge Pages).
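The arithmetic behind that claim, as a quick sketch: how many TLB entries a 1 GB working set needs at each x86-64 page size (`entries_needed` is an illustrative name).

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30

def entries_needed(working_set, page_size):
    """TLB entries needed to map a working set at once (ceiling division)."""
    return -(-working_set // page_size)

for name, size in [("4 KB", 4 * KiB), ("2 MB", 2 * MiB), ("1 GB", 1 * GiB)]:
    print(f"{name:>5} pages: {entries_needed(1 * GiB, size):>7,} entries for a 1 GB working set")
```

Going from 4 KB to 2 MB pages shrinks the requirement by a factor of 512, from 262,144 entries to 512, which is within reach of a real TLB.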

Tip

Databases and JVMs often benefit significantly from huge pages because they have large, long-lived working sets. If you see TLB miss rates dominating your performance profile, huge pages are worth trying.

Thrashing

When the working set exceeds physical memory, the system thrashes: spending more time swapping pages to and from disk than actually executing instructions. The classic symptom is disk I/O pegged at 100% with CPU utilization paradoxically low. The machine feels completely stuck even though the CPU is barely doing useful work.

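Solutions: add RAM, reduce the working set, or use memory-aware scheduling (running fewer memory-hungry processes at once so their combined working sets fit in physical memory).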
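The cliff can be dramatic. This sketch assumes a hypothetical program that loops over 64 pages and an exact-LRU replacer, which is the worst case for a cyclic scan: with one frame too few, LRU always evicts exactly the page needed next. Real workloads and real replacement policies usually degrade less abruptly, but the shape is the same.

```python
from collections import OrderedDict

def lru_fault_rate(refs, frames):
    """Fraction of accesses that page-fault under exact LRU with `frames` frames."""
    resident = OrderedDict()          # least recently used first
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)
        resident[page] = True
    return faults / len(refs)

WORKING_SET = 64                      # hypothetical: a loop that touches 64 pages
refs = list(range(WORKING_SET)) * 100
rates = {frames: lru_fault_rate(refs, frames) for frames in (64, 63, 32)}
for frames, rate in rates.items():
    print(f"{frames:2} frames: {rate:.0%} of accesses fault")
```

With 64 frames only the first pass faults (1%); with 63, every single access faults.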

Address Space Layout

A typical Linux x86-64 process layout:

High addresses
  ┌──────────────────────┐
  │  Kernel space        │  (upper half, inaccessible to user code)
  ├──────────────────────┤
  │  Stack  ↓            │  (grows downward)
  │                      │
  │  Memory-mapped       │  (shared libraries, mmap regions)
  │                      │
  │  Heap  ↑             │  (grows upward via brk/sbrk)
  ├──────────────────────┤
  │  BSS (uninitialized) │
  │  Data (initialized)  │
  │  Text (code)         │
  └──────────────────────┘
Low addresses

Note

ASLR (Address Space Layout Randomization) shuffles the base addresses of the stack, heap, and mmap regions on each execution. This is a security measure against exploits that depend on knowing where things live in memory.
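On Linux you can see both the layout and ASLR in `/proc/self/maps`, a real interface documented in proc(5) with lines of the form `address perms offset dev inode pathname`. `parse_maps_line` and `interesting_regions` are hypothetical helper names. Run the script twice: with ASLR enabled, the printed start addresses should differ between runs.

```python
import os

def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, end, perms, name)."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    perms = fields[1]
    name = fields[5].strip() if len(fields) > 5 else ""   # anonymous maps have no name
    return start, end, perms, name

def interesting_regions(path="/proc/self/maps"):
    """Start address of a few named regions; empty dict where /proc is unavailable."""
    regions = {}
    if not os.path.exists(path):              # Linux-only interface
        return regions
    with open(path) as f:
        for line in f:
            start, _end, _perms, name = parse_maps_line(line)
            if name in ("[heap]", "[stack]", "[vdso]"):
                regions[name] = start
    return regions

for name, start in sorted(interesting_regions().items(), key=lambda kv: kv[1]):
    print(f"{name:8} starts at {start:#x}")
```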

Walkthrough: Translating an Address

A process accesses virtual address 0x00007f3a_bc123456:

Split: VPN = 0x7f3abc123, offset = 0x456.
TLB lookup: check for VPN 0x7f3abc123. Assume a miss.
Page-table walk: traverse 4 levels of the page table using bits from the VPN as indexes at each level. Find PFN = 0xDEAD.
TLB fill: cache the mapping 0x7f3abc123 -> 0xDEAD.
Physical access: read from physical address 0xDEAD_456.

On subsequent accesses to the same page, the TLB hits and the translation costs 1-2 cycles instead of ~100.
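The walkthrough can be replayed in code. This sketch assumes x86-64 4-level paging with 9-bit indexes and uses a dict as an unbounded TLB; as in the text, PFN 0xDEAD is an illustrative value, and `map_page` and `translate` are hypothetical helpers.

```python
PAGE_SHIFT = 12
LEVEL_NAMES = ("PML4", "PDPT", "PD", "PT")   # x86-64 4-level paging

def indexes(vaddr):
    """Four 9-bit table indexes taken from the VPN, top level first."""
    vpn = vaddr >> PAGE_SHIFT
    return [(vpn >> (9 * (3 - level))) & 0x1FF for level in range(4)]

def map_page(root, vaddr, pfn):
    """Install vaddr's page -> pfn, creating intermediate tables as needed."""
    *upper, leaf = indexes(vaddr)
    node = root
    for i in upper:
        node = node.setdefault(i, {})
    node[leaf] = pfn

def translate(vaddr, tlb, root, trace):
    """Translate vaddr, recording each (level, index) visited on a TLB miss."""
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & 0xFFF
    if vpn not in tlb:                        # TLB miss: walk all four levels
        node = root
        for name, i in zip(LEVEL_NAMES, indexes(vaddr)):
            trace.append((name, i))
            node = node[i]                    # a KeyError here would be a page fault
        tlb[vpn] = node                       # TLB fill
    return (tlb[vpn] << PAGE_SHIFT) | offset  # frame number + unchanged offset

root, tlb = {}, {}                            # hypothetical empty table and TLB
va = 0x0000_7F3A_BC12_3456
map_page(root, va, 0xDEAD)

first = []
pa = translate(va, tlb, root, first)          # miss: four-level walk
second = []
pa2 = translate(va + 8, tlb, root, second)    # same page: hit, no walk
print(hex(pa), first)
print(hex(pa2), second)
```

For this address, the walk visits indexes 254, 234, 480, and 291 (PML4 down to PT). The second access, 8 bytes later on the same page, is served from the TLB with an empty trace.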

Related Notes

Memory Allocation - how user-space allocators (malloc, arenas) work on top of virtual memory
Processes & Threads - virtual memory provides the isolation between processes
File Systems - memory-mapped files bridge virtual memory and the file system

Cam's Cyberspace
