Overview
A tree is an acyclic, connected structure of nodes joined by edges. Trees model hierarchical relations: file systems, syntax trees, organization charts, dependency hierarchies, scene graphs, and indexes. Choosing the right variant (rooted or unrooted, ordered or unordered, binary or k-ary) and representation (pointer-based, arrays, LCRS) affects clarity, memory footprint, and algorithmic complexity.
Note
In an undirected setting, a “tree” is a connected, acyclic graph. In a rooted setting, a “tree” is a hierarchy with a designated root that induces parent–child relations and node depths.
Motivation
Trees provide:
- Recursive decomposition: solve on a node using results from its children (postorder DP).
- Search and ordering: BSTs and balanced variants (AVL, red–black, splay) maintain order and support logarithmic operations.
- Aggregation: sums, minima, counts, and constraints computed over subtrees naturally.
- Scheduling/precedence: dependency orderings and staged execution (level order, topological-like passes).
Typical applications include parsers (ASTs), directory services, routing tables, heaps/priority queues, and spatial indexes.
Definition and Formalism
Let T = (V, E) be a finite, connected, acyclic undirected graph.
- Size: |V| = n; |E| = n − 1.
- Degree of u: number of incident edges. In rooted trees, out-degree is the number of children.
- Rooted tree: choose a node r as root; every edge gets an implicit direction from parent to child.
- Depth of u: number of edges from r to u (root has depth 0).
- Height of a node u: length of the longest downward path from u to a leaf. Height of the tree: height of r. The empty tree often has height −1 by convention (so a single node has height 0).
- Leaf: node with no children. Internal node: node with ≥1 child.
- Path: sequence of nodes joined by edges; simple if no node repeats.
- Subtree rooted at u: u and all its descendants.
- Ordered vs unordered: in an ordered tree, the children of each node are in a fixed left-to-right order (affects traversal sequences). In an unordered tree, children form a set.
Tip
Keep a consistent convention: depth(root) = 0 and height(leaf) = 0. This avoids common off-by-ones when proving relations like n = Σ_u 1 and Σ_u outDegree(u) = n − 1.
Binary trees
A binary tree restricts each node to left and right child pointers (possibly null). Common flavors:
- Full (proper): every internal node has exactly 2 children.
- Complete: all levels filled except possibly the last, which is filled from left to right.
- Perfect: all leaves at the same depth and every internal node has 2 children.
- Degenerate: every node has at most one child (path-like).
Relations (binary, perfect): with height h, n = 2^{h+1} − 1 and leaves = 2^h.
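These relations can be sanity-checked by counting level by level; a small sketch (`perfect_tree_counts` is an illustrative helper, not a standard API):

```python
# Count nodes and leaves of a perfect binary tree of height h, level by level.
def perfect_tree_counts(h):
    nodes = sum(2 ** d for d in range(h + 1))   # 2^0 + 2^1 + ... + 2^h
    leaves = 2 ** h                             # all leaves sit at depth h
    return nodes, leaves

for h in range(6):
    n, leaves = perfect_tree_counts(h)
    assert n == 2 ** (h + 1) - 1                # n = 2^{h+1} - 1
    assert leaves == 2 ** h
```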
Example
Consider the undirected tree:
1 - 2 - 3 - 4
        |
        5
Unrooted, it is simply acyclic and connected. Rooting at 3 yields:
- Parent(3) = ⌀; children(3) = {2,4,5}
- Depths: depth(3)=0; depth(2)=depth(4)=depth(5)=1; depth(1)=2
- Height(tree) = 2 (longest path 3→2→1)
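Rooting an undirected tree is simply a BFS (or DFS) from the chosen root; a sketch for the example above (`root_at` is an illustrative helper name):

```python
from collections import deque

# The example tree as an undirected adjacency list.
adj = {1: [2], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}

def root_at(adj, root):
    """BFS from `root`; returns (parent, depth) maps."""
    parent = {root: None}
    depth = {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:          # undirected: skip the edge back up
                parent[v] = u
                depth[v] = depth[u] + 1
                q.append(v)
    return parent, depth

parent, depth = root_at(adj, 3)
height = max(depth.values())             # 2, via the path 3-2-1
```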
Properties and Relationships
- Edge count: an undirected tree with n nodes has exactly n − 1 edges (prove by induction, or from acyclicity plus connectivity).
- Unique path: between any two nodes u, v there is exactly one simple path.
- Sum of degrees: Σ_u degree(u) = 2(n − 1) (handshaking lemma).
- Leaves vs branching (rooted, full binary): the number of leaves L satisfies L = I + 1, where I is the count of internal nodes.
- Ancestor/Descendant closure: the ancestors of a node form a chain; its descendants form a subtree (a connected set closed under the parent relation).
- Depth–height relation: for a node u, every descendant v satisfies depth(v) = depth(u) + distance(u, v).
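The full-binary relation L = I + 1 can be verified with a single traversal; a minimal sketch, assuming the tree is given as a `{node: (left, right)}` map with `None` for absent children:

```python
# Count leaves (L) and internal nodes (I) of a full binary tree and
# check L = I + 1.
def leaf_internal_counts(children, root):
    L = I = 0
    stack = [root]
    while stack:
        u = stack.pop()
        left, right = children[u]
        if left is None and right is None:
            L += 1
        else:                 # full tree: an internal node has both children
            I += 1
            stack += [left, right]
    return L, I

# Seven nodes: 3 internal, 4 leaves.
children = {0: (1, 2), 1: (3, 4), 2: (5, 6),
            3: (None, None), 4: (None, None),
            5: (None, None), 6: (None, None)}
L, I = leaf_internal_counts(children, 0)
assert L == I + 1
```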
Note
A forest is a disjoint union of trees. Rooting each component yields a rooted forest. Many algorithms (e.g., union–find) maintain forests.
Core Operations (Skeletons)
Subtree size and height (postorder)
function SUBTREE_SIZE(u):
    if u == null: return 0
    size = 1
    for v in children(u):
        size += SUBTREE_SIZE(v)
    return size

function HEIGHT(u):
    if u == null: return -1   // so leaf height is 0
    h = -1
    for v in children(u):
        h = max(h, HEIGHT(v))
    return 1 + h
Traversals (binary trees)
See Tree Traversal - Overview for detailed recursive and iterative patterns:
- Preorder (Root, Left, Right)
- Inorder (Left, Root, Right) - meaningful for Binary Search Tree
- Postorder (Left, Right, Root)
- Level order (BFS by depth)
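For reference, the four orders on a small binary tree, using nested `(value, left, right)` tuples (an illustrative encoding, not a prescribed one):

```python
from collections import deque

# Binary tree as nested (value, left, right) tuples; None is the empty tree.
t = ('A', ('B', ('D', None, None), ('E', None, None)),
          ('C', None, ('F', None, None)))

def preorder(n):
    return [] if n is None else [n[0]] + preorder(n[1]) + preorder(n[2])

def inorder(n):
    return [] if n is None else inorder(n[1]) + [n[0]] + inorder(n[2])

def postorder(n):
    return [] if n is None else postorder(n[1]) + postorder(n[2]) + [n[0]]

def level_order(root):
    out, q = [], deque([root])
    while q:
        n = q.popleft()
        if n is not None:
            out.append(n[0])
            q.extend([n[1], n[2]])   # enqueue children for the next level
    return out
```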
Lowest Common Ancestor (LCA) (sketch)
- Naive: climb parents and mark ancestors (O(h) per query).
- Binary lifting (O(log n) per query) or Euler tour + RMQ (O(1) per query) after preprocessing.
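A minimal sketch of the naive O(h) approach, using the parent/depth maps from the earlier example rooted at 3:

```python
# Naive LCA: lift the deeper node to the other's depth, then climb both
# in lockstep until the paths merge. O(h) per query, no preprocessing
# beyond parent and depth maps.
parent = {3: None, 2: 3, 4: 3, 5: 3, 1: 2}
depth = {3: 0, 2: 1, 4: 1, 5: 1, 1: 2}

def lca(u, v, parent, depth):
    while depth[u] > depth[v]:     # lift the deeper node
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:                  # climb together until they meet
        u, v = parent[u], parent[v]
    return u
```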
Tip
Postorder is ideal for bottom-up DPs (heights, counts). Preorder is ideal for passing prefix information down (path sums, ancestry flags).
Representations
1) Pointer nodes (adjacency via child pointers)
struct Node { Value val; Node* left; Node* right; };   // binary
struct Node { Value val; vector<Node*> children; };    // general
- Pros: natural recursion; flexible arity.
- Cons: pointer overhead; cache locality can be poor on large trees.
2) Parent array (static trees)
parent[0..n-1], where parent[root] = -1
- Pros: compact; easy upward navigation; amenable to binary lifting tables.
- Cons: downward traversal requires children lists or scans.
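A sketch of upward navigation on a parent array (node 0 as root; the helper names are illustrative):

```python
# Parent-array representation of a static rooted tree: parent[root] = -1.
# Upward navigation is trivial; downward needs separate children lists.
parent = [-1, 0, 0, 1, 1]      # node 0 is the root; 3 and 4 hang off node 1

def depth(u):
    d = 0
    while parent[u] != -1:     # climb until the root
        u = parent[u]
        d += 1
    return d

def is_ancestor(a, u):
    while u != -1:             # climb from u, looking for a
        if u == a:
            return True
        u = parent[u]
    return False
```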
3) Children arrays / adjacency lists
- For each node, store a vector of children indices. Great for DP on DAG-like passes and iterative traversals.
4) Left-Child Right-Sibling (LCRS)
- Represent a general ordered tree using two pointers per node: firstChild and nextSibling.
- Pros: constant pointers per node (like binary) while supporting unbounded arity.
- Cons: siblings become a linked list; accessing the k-th child is O(k).
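A sketch of deriving the two LCRS pointers from ordinary children lists (`to_lcrs` is an illustrative helper):

```python
# Derive LCRS pointers (firstChild, nextSibling) from children lists.
def to_lcrs(children):
    first, nxt = {}, {}
    for u, kids in children.items():
        first[u] = kids[0] if kids else None
        for a, b in zip(kids, kids[1:]):   # chain siblings left to right
            nxt[a] = b
        if kids:
            nxt[kids[-1]] = None           # last sibling ends the chain
    return first, nxt

children = {'r': ['a', 'b', 'c'], 'a': [], 'b': [], 'c': []}
first, nxt = to_lcrs(children)
# Reaching child 'c' of 'r' now takes two nextSibling hops: a -> b -> c.
```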
5) Array-based binary trees (complete trees)
- Map nodes to indices (heap layout): for index i, left = 2i + 1, right = 2i + 2.
- Pros: eliminates pointers; excellent locality for complete trees (heaps).
- Cons: poor for sparse or highly unbalanced shapes (wasted slots).
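The index arithmetic in a short sketch (assuming 0-based indices, as above):

```python
# Heap layout: a complete binary tree stored level by level in an array,
# with index arithmetic instead of pointers.
heap = [10, 20, 30, 40, 50, 60]     # level order

def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2
def parent(i): return (i - 1) // 2

assert heap[left(0)] == 20 and heap[right(0)] == 30   # children of the root
assert heap[parent(5)] == 30                          # parent of node 60
```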
Implementation or Practical Context
- Memory layout & locality: BFS-ordered allocations improve cache behavior. Arena/pool allocators cluster nodes.
- Immutability: functional trees favor persistent structures (path copying), enabling cheap snapshots and rollback at the cost of extra memory.
- Concurrency: lock-free traversal is easiest on immutable trees. For mutable trees, use reader–writer locks or RCU-style epochs.
- Validation: invariants like “no cycles,” “single parent,” and “connectedness” can be checked by counting edges (n − 1) and running a DFS/Union–Find pass.
- Serialization:
  - Preorder with null markers uniquely encodes a binary tree.
  - Level order is compact for complete trees (heap-like arrays).
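A round-trip sketch of preorder serialization, using '#' as the null marker (the nested-tuple tree encoding is an assumption for illustration):

```python
# Preorder serialization with '#' as the explicit null marker; the marker
# makes the encoding uniquely decodable. Trees are (value, left, right)
# tuples with None for the empty tree.
def serialize(n):
    if n is None:
        return ['#']
    return [n[0]] + serialize(n[1]) + serialize(n[2])

def deserialize(tokens):
    it = iter(tokens)
    def build():
        t = next(it)
        if t == '#':
            return None
        return (t, build(), build())   # preorder: value, then left, then right
    return build()

t = ('A', ('B', None, None), ('C', ('D', None, None), None))
assert deserialize(serialize(t)) == t
```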
Tip
For very large static trees, compress structure with succinct representations (balanced parentheses, DFUDS) to approach information-theoretic space while supporting queries.
Common Misunderstandings
Warning
Depth vs height confusion. Depth counts edges from the root; height counts edges to a leaf. Adopt height(leaf) = 0 and height(empty) = −1 to keep recurrences clean.
Warning
Assuming inorder exists for general trees. Inorder is specific to binary trees. For k-ary trees, define an application-specific order.
Warning
Using heap indexing on non-complete trees. The 2i+1 / 2i+2 formulas assume a complete binary layout. Sparse shapes waste space and break invariants.
Warning
Multiple parents / cycles. A DAG is not a tree. Trees have exactly one simple path between any two nodes. If multiple parents are needed, you’re in DAG territory - use Graph Representations and Topological Sorting.
Warning
Overflowing recursion stacks. Deep or skewed trees can exceed recursion limits. Prefer iterative traversals with explicit stacks or tail-recursion elimination.
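A sketch of an iterative preorder with an explicit stack; the degenerate 5000-node tree below would overflow a default Python recursion limit (around 1000 frames) if traversed recursively:

```python
# Iterative preorder using an explicit stack: no recursion depth limit.
# Trees are nested (value, left, right) tuples, None for the empty tree.
def preorder_iter(root):
    out, stack = [], [root]
    while stack:
        n = stack.pop()
        if n is not None:
            out.append(n[0])
            stack.append(n[2])   # right pushed first...
            stack.append(n[1])   # ...so left is processed next
    return out

# A path-like (degenerate) tree 5000 nodes deep.
deep = None
for i in range(5000):
    deep = (i, deep, None)
assert len(preorder_iter(deep)) == 5000
```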
Broader Implications
- Algorithm design: many NP-hard problems on general graphs become linear-time on trees using DP (treewidth intuition). Tree decompositions extend these ideas to near-tree graphs.
- Index and search structures: performance hinges on height. Balancing (AVL/RB) guarantees O(log n) operations; heaps guarantee O(1) find-min/max and O(log n) updates.
- Parallelism: trees offer natural divide-and-conquer. Subtrees can be processed independently until shared resources (e.g., output arrays) require synchronization.
Summary
Trees encode hierarchical relations with clean recursive structure. Key choices - rooted vs unrooted, ordered vs unordered, binary vs k-ary - and representations (pointers, arrays, LCRS) determine both elegance and performance. Master the vocabulary (size, leaves, degrees, depth, height), use traversal orders suited to the task, and select representations that match shape (complete vs sparse) and workload (static vs dynamic). With these fundamentals, tree algorithms become straightforward, efficient, and robust.