Definition
A B+ Tree is a self-balancing multi-level search tree commonly used in databases, filesystems, and storage engines.
It extends the B-tree by ensuring that all actual data (records) reside in the leaf nodes, while internal nodes store only keys for navigation.
Note
B+ Trees are optimized for disk-based access patterns - minimizing expensive I/O by maximizing branching factors and supporting sequential traversal through leaf linkage.
Invariants
-
All leaves appear at the same level (balanced).
-
Internal nodes store up to m−1 keys and m child pointers.
-
Leaf nodes store keys + data pointers, and one next-leaf pointer.
-
Root may have fewer keys (at least two children unless empty).
Operations
Search
To search for key k:
-
Start from the root.
-
Repeatedly descend to the appropriate child pointer until reaching a leaf.
-
Perform a linear or binary search in the leaf node.
Time Complexity: O(log_m n) - logarithmic with respect to total keys and branching factor m.
Tip
Each disk page read corresponds to one level - minimizing height is critical.
Insertion
-
Locate the leaf where
kshould be inserted. -
Insert key and value in sorted order.
-
If the leaf overflows (too many keys), split it:
-
Half the keys move to a new sibling leaf.
-
Middle key is promoted to the parent.
-
-
Propagate splits upward recursively if necessary.
function insert(node, key, value):
if node is leaf:
insert key,value
if overflow(node):
splitLeaf(node)
else:
child = selectChild(node, key)
insert(child, key, value)
if overflow(child):
splitInternal(child)Cost: O(log_m n) I/O operations.
Deletion
-
Locate the leaf containing the key.
-
Remove it.
-
If the node underflows (too few keys):
-
Redistribute keys with sibling, or
-
Merge with sibling and update parent.
-
-
Propagate adjustments upward if needed.
Warning
Deletion in B+ Trees is more complex than insertion because rebalancing may cascade up multiple levels.
Range Queries and Sequential Access
Range queries (e.g., find all keys in [k1, k2]) are where B+ Trees excel:
-
Use root-to-leaf search to locate
k1. -
Follow the leaf-level linked list until
k2is reached.
This design supports efficient ordered scans - a major reason B+ Trees dominate database indexing.
Complexity
| Operation | Time Complexity | I/O Notes |
|---|---|---|
| Search | O(log_m n) | 1 disk read per level |
| Insert | O(log_m n) | Split propagation may add 1–2 writes |
| Delete | O(log_m n) | May require merge or redistribution |
| Range Query | O(log_m n + k) | k = number of records in range |
Tip
Since m (order) is large (hundreds per node), the height
his small - often 2–4 even for millions of records.
Implementations
Structure (components)
| Component | Description |
|---|---|
| Internal Nodes | Contain keys and child pointers only (no actual data). |
| Leaf Nodes | Contain all records and pointers to next leaf (linked list). |
| Order (m) | Maximum number of children for any internal node. |
| Height (h) | Determines search cost; typically small due to high fan-out. |
Practical Notes
-
Disk blocks (usually 4 KB) dictate node size.
-
Leaves contain record pointers; internal nodes only key pointers.
-
Linked leaves enable cursor traversal and bulk scanning.
-
Rebalancing keeps operations logarithmic over time.
Warning
Caching is essential: root and top levels often kept in memory to minimize I/O cost.
Comparison
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data location | Internal + Leaf nodes | Leaf nodes only |
| Sequential access | Slower (no links) | Fast (linked leaves) |
| Range queries | Inefficient | Very efficient |
| Node size | Larger | Smaller (no data in internal nodes) |
| Storage utilization | Slightly lower | Higher |
| Use case | In-memory balanced trees | Disk-based indexes |
Note
Most database engines (MySQL, PostgreSQL) and filesystems (NTFS, HFS+, ext4) use B+ trees under the hood.
Examples
Insert sequence: [10, 20, 5, 6, 12, 30, 7, 17] into a B+ Tree of order 4 (max 3 keys per node, max 4 children).
-
Start empty - insert
[10, 20, 5]into first leaf. -
On inserting 6 → split leaf into
[5,6]and[10,20]; promote10. -
Insert 12 → fits right leaf
[10,12,20]. -
Insert 30 → overflow → split right leaf, promote
20. -
Insert 7, 17 → propagate splits as necessary.
Tip
This small example mirrors the page split behavior of database indices.
Pitfalls
-
Deletion rebalancing can cascade up multiple levels.
-
Caching strategy matters for performance.
-
Node size should align with storage page size.
Common pitfalls
Mixing B-Tree vs B+ Tree semantics (storing records in internal nodes).
Forgetting to maintain leaf links after splits/merges (breaks range scans).
Inconsistent min/max keys per node when switching between “order m” and “degree t” definitions.
Node size not aligned to page size → reduced fanout and extra I/O.
When to use
Compared to B-trees, choose B+ Trees when you need:
-
Better range queries (thanks to leaf chaining)
-
Higher fan-out (internal nodes smaller, only keys)
-
Predictable I/O (fewer disk pages to touch)
-
Efficient iteration (ordered, sequential leaves)
These advantages make B+ trees ideal for large datasets that cannot fit entirely into memory.
Summary
-
B+ Trees extend B-trees by separating data storage (leaves) from indexing (internal nodes).
-
They support efficient range queries, sequential traversal, and high fan-out - ideal for disk-based systems.
-
Insertions and deletions maintain balance, ensuring predictable logarithmic performance.
-
Almost every database and filesystem index structure builds on this design.