The Core Abstraction

A computer has far fewer CPUs than things it needs to run. Your machine might have 8 cores but hundreds of active processes. The OS solves this by giving each program the illusion of having the CPU to itself, rapidly switching between them. This multiplexing is one of the most fundamental things an OS does, and the two units of execution it works with are processes and threads.

Think of processes as separate apartments in a building: isolated, can’t see each other’s stuff. Threads are roommates within one apartment: shared kitchen and living room, but they need house rules to avoid conflicts.


Processes

Created by fork (POSIX) or CreateProcess (Windows). Each process gets:

  • Its own virtual address space
  • Its own file descriptor table
  • A unique process ID (PID)
  • Its own signal handlers, environment, and credentials

The key property is isolation. A crash in one process cannot corrupt another. If you’ve ever wondered why Chrome runs each tab as a separate process, this is why: a misbehaving website can crash its process without taking down the whole browser.

Inter-process communication (IPC) is explicit and controlled: pipes, shared memory, sockets, message queues. The cost of this isolation is that sharing data between processes requires going through the kernel.


Threads

Created within a process (pthread_create, std::thread). Threads share heap memory, global variables, and file descriptors, making communication cheap but correctness hard.

Warning

Shared mutable state is where most concurrency bugs live. Two threads reading and writing the same variable without synchronization leads to data races, which can produce arbitrarily wrong results. This is not a theoretical concern; it’s one of the most common sources of hard-to-reproduce bugs in systems code.

Synchronization primitives (mutexes, condition variables, atomics, message passing) exist to manage shared state, but they come with their own pitfalls: deadlocks, priority inversion, lock contention.

User-Level vs Kernel-Level Threads

Type                                           | Scheduled by          | Pros                                                       | Cons
Kernel threads                                 | OS scheduler          | Can run on different cores; syscalls don’t block siblings  | Heavier to create/switch
User-level threads (green threads, goroutines) | Runtime in user space | Cheap to create, fast context switch                       | Blocking syscall can stall the runtime

Go’s goroutines and Erlang’s processes use M:N threading: many user-level threads multiplexed onto a smaller number of kernel threads. This gives you the cheap creation of user threads with the parallelism of kernel threads.


Context Switches

When the OS preempts a running thread, it saves the current state and restores another thread’s state. What gets saved:

  • General-purpose registers (rax, rbx, … on x86-64)
  • Stack pointer and program counter
  • Floating-point / SIMD state (if used)
  • Kernel stack pointer
  • For a full process switch: page-table base register (CR3 on x86), which flushes the TLB

Note

The actual register save/restore is fast (microseconds). The real cost is indirect: the new thread has a cold cache and a cold TLB. Rebuilding that working set can cost far more than the switch itself. This is why people say “context switches are expensive” even though the mechanism is simple.

Cost sketch: If a time quantum is 10 ms and a context switch costs 5 us, overhead is 0.05%, negligible. Shrink the quantum to 100 us and overhead jumps to 5%, which is noticeable. The quantum length is a tradeoff between responsiveness (short quanta) and throughput (long quanta, fewer switches).


Scheduling

The scheduler decides which thread runs next. This is one of those areas where simple policies have surprising consequences:

Policy          | Idea                                                   | Trade-off
FIFO / FCFS     | Run to completion in arrival order                     | Simple but poor for short jobs behind long ones
Round-Robin     | Fixed time quantum, rotate                             | Fair but high context-switch overhead if quantum is small
MLFQ            | Multiple queues with different priorities and quanta   | Good balance of responsiveness and throughput
CFS (Linux)     | Virtual runtime via red-black tree, proportional share | Near-ideal fairness with O(log n) pick-next
EDF (real-time) | Earliest deadline first                                | Optimal for periodic real-time tasks

Tip

Linux’s CFS (Completely Fair Scheduler) tracks how much CPU time each task has received and always picks the task with the least accumulated time. It uses a red-black tree keyed by virtual runtime, so picking the next task is O(log n). The elegance is that fairness falls out naturally from this simple rule.

MLFQ deserves extra attention because it handles the common case well: interactive tasks (lots of I/O, short CPU bursts) get boosted to high-priority queues, while CPU-bound tasks sink to lower-priority queues with longer quanta. The system stays responsive for the user while still giving batch work its fair share.


Process Lifecycle

A process cycles through these states:

         admit           dispatch
  NEW ─────────→ READY ──────────→ RUNNING ──→ TERMINATED
                  ↑ ↑                │  │
                  │ └─── preempt ────┘  │
                  │                     │ I/O or event
                  └───── BLOCKED ←──────┘
                        (waiting)

Note

The BLOCKED state is important to understand. When a process does I/O (reading a file, waiting for network data), it’s not burning CPU cycles. The OS moves it to BLOCKED and runs something else. When the I/O completes, the process moves back to READY. This is why a system with many I/O-bound processes can feel responsive even with limited cores.


Concrete Example: Web Server Models

A web server has to handle many concurrent connections. The classic approaches map directly onto process/thread tradeoffs:

fork model (Apache prefork):
  parent (PID 1)
    └── child (PID 2) - handles request A (isolated address space)
    └── child (PID 3) - handles request B

thread model (Apache worker):
  process (PID 1)
    ├── thread 0 (main, accepts connections)
    ├── thread 1 - handles request A  (shared heap)
    └── thread 2 - handles request B  (shared heap)

The fork model is safer (one crash doesn’t take down others) but uses more memory. The thread model is faster (no address-space duplication) but requires careful locking. Modern servers (nginx, Node.js) often use a third approach: event-driven I/O with a single thread per core, avoiding both the memory overhead of forking and the complexity of shared-state threading.