Overview
“Searching” asks: where is the target? Answers depend on what structure you have:
-
No structure → linear scan is the baseline (
Theta(n)). -
Total order (sorted array) with comparisons → binary search (
Theta(log n)). -
Hashable keys with good hashing → expected
Theta(1)lookup (not order-aware). -
Monotone predicates over integers/reals → binary search on the answer.
-
Special geometry (unimodal/bitonic) → tailored ternary/bitonic searches.
This overview maps problems to strategies, shows archetypal pseudocode, explains how to adapt binary search for first/last occurrence and rotated arrays, and catalogs pitfalls (off-by-one, overflow, duplicate handling, comparator issues).
Note
Terminology: Let
A[0..n-1]be 0-indexed. Use half-open ranges[lo, hi)for clarity unless noted. Comparators must define a strict weak ordering.
Motivation
Choosing the right search saves orders of magnitude:
-
Scanning a million elements at ~5 GB/s costs ~0.8 ms just to read; logarithmic search touches ~20 elements.
-
For monotone decision problems (e.g., “Is there a feasible
x?”), turning feasibility into a predicate and binary-searching the smallestxis often cleaner than designing a custom algorithm from scratch. -
For text and graphs, searching becomes pattern matching or traversal; thinking in invariants still applies.
Definition and Formalism
At its core, searching either:
-
Enumerates candidates until a condition holds (
for/whilescan), or -
Partitions the domain into regions where the predicate is known to be false/true and recursively discards half (or more) each step.
Binary search invariant (ordered arrays). Maintain indices lo and hi such that:
-
All indices
< loare strictly less than the target. -
All indices
>= hiare greater than or equal (or strictly greater, depending on variant). Stop whenlo == hi; that position is either the insertion point or proof of absence.
Example or Illustration
Linear search (baseline; early exit)
function LINEAR_SEARCH(A, key):
for i in 0..|A|-1:
if A[i] == key:
return i
return NOT_FOUND-
Works on any container; good when
nis tiny or key found early. -
Use for streamed data where ordering is unavailable or expensive to build.
Binary search (membership / position)
Use half-open interval [lo, hi); compute mid = lo + (hi - lo) // 2.
function BINARY_SEARCH_EQUALS(A, key): // returns index or NOT_FOUND
lo = 0; hi = |A|
while lo < hi:
mid = lo + (hi - lo) // 2
if A[mid] < key:
lo = mid + 1
else if A[mid] > key:
hi = mid
else:
return mid
return NOT_FOUNDLower/upper bound (first/last occurrence; insertion points)
function LOWER_BOUND(A, key): // first i with A[i] >= key
lo = 0; hi = |A|
while lo < hi:
mid = lo + (hi - lo) // 2
if A[mid] < key: lo = mid + 1 else: hi = mid
return lo
function UPPER_BOUND(A, key): // first i with A[i] > key
lo = 0; hi = |A|
while lo < hi:
mid = lo + (hi - lo) // 2
if A[mid] <= key: lo = mid + 1 else: hi = mid
return lo-
Count of equals:
UPPER_BOUND(A,key) - LOWER_BOUND(A,key). -
Index of last occurrence:
UPPER_BOUND(A,key) - 1(check range).
Exponential search (unknown n)
When array length is unknown or you search a conceptually infinite stream:
- Grow a bound
hiexponentially untilA[hi] >= keyor end; 2) binary-search in[hi/2, hi]. Cost:O(log p)wherepis the position ofkey.
Searching in a rotated sorted array
Array is sorted then rotated, e.g., [4,5,6,7,0,1,2]. In each step, one half is sorted:
function SEARCH_ROTATED(A, key):
lo = 0; hi = |A| - 1
while lo <= hi:
mid = lo + (hi - lo) // 2
if A[mid] == key: return mid
if A[lo] <= A[mid]: // left half sorted
if A[lo] <= key && key < A[mid]: hi = mid - 1
else: lo = mid + 1
else: // right half sorted
if A[mid] < key && key <= A[hi]: lo = mid + 1
else: hi = mid - 1
return NOT_FOUND- With duplicates, use care: equality
A[lo] == A[mid] == A[hi]may force a linear fallback.
Ternary search (unimodal function on integers/reals)
If f(x) decreases then increases (unimodal), ternary search narrows the interval by comparing two midpoints. On discrete domains, stop when small and scan.
Applications: finding argmin/argmax for convex/unimodal cost when derivatives are inconvenient.
Searching by predicate (binary search on the answer)
Given monotone predicate P(x) (false…false, then true…true), find the smallest x with P(x)=true:
function FIRST_TRUE(P, lo, hi): // domain [lo, hi], P monotone
while lo < hi:
mid = lo + (hi - lo) // 2
if P(mid): hi = mid else: lo = mid + 1
return loExamples: earliest feasible start time, minimal capacity to satisfy constraints, minimal radius covering points.
Properties and Relationships
-
Decision-tree lower bound. For comparison-based search over
nordered distinct keys, the decision tree has at leastnleaves ⇒ height >= ceil(log2 n). Hence binary search is optimal up to constants. -
Hash vs order. Hash tables offer expected
Theta(1)lookups but do not give order statistics (predecessor/successor). Balanced BSTs (e.g., Red–Black Tree, AVL Tree) supportTheta(log n)search plus ordered traversals. -
Two-pointers / sliding window. On arrays with nonnegative or monotone structures, many “find subarray meeting constraint” problems are solved in linear time by growing/shrinking a window rather than binary searching a length.
-
Interpolation search. For uniformly distributed keys in arrays, estimating the probe position can yield average
O(log log n)probes, but worst case isO(n)and performance is highly data-dependent.
Tip
Prefer binary search variants (lower/upper bound) as primitive building blocks. They generalize to “first true” style searches and reduce off-by-one bugs.
Implementation or Practical Context
Choosing a strategy
-
Tiny
n(⇐ 32–64): branch overhead dominates; linear scan may beat binary search. -
Sorted, random access, comparator cheap: use binary search variants.
-
Hashable, order not needed: use hash table.
-
Unknown bounds / streams: exponential search + binary search.
-
Rotated arrays / duplicates: modified binary search with invariants and careful equality handling.
-
Answer search: monotone predicate + binary search on parameter space.
Robust binary search pattern (half-open)
Adopt a single, tested pattern in the codebase. Prefer half-open [lo, hi) loops with:
-
Update
lo = mid + 1only when the mid cannot be the answer. -
Update
hi = midwhen mid could still be the answer (lower-bound style).
Comparators and stability
-
Ensure comparator is transitive and consistent. Violations can cause infinite loops.
-
For custom orders (e.g., case-insensitive), normalize inputs or guard comparator with preprocessing.
Numeric safety and overflow
-
Compute mid as
lo + (hi - lo) // 2to avoid overflow on large indices. -
When binary-searching floating ranges, stop on precision (
hi - lo < epsilon) or iteration cap; beware of NaN.
Duplicates and boundaries
-
Membership checks can be ambiguous with duplicates. Use
LOWER_BOUND/UPPER_BOUNDto find the correct edge. -
When returning the “closest” element, decide ties explicitly (left/ceil vs right/floor).
Cache behavior and branch prediction
-
Linear scans are streaming-friendly and can beat binary search on small arrays due to branch misprediction and cache line effects.
-
Galloping (exponential) probes are used in TimSort’s merge to exploit runs.
API design notes
Expose both:
-
Index-based searches (for arrays/vectors).
-
Iterator-based generic versions (C++ style) requiring random-access for logarithmic time.
Tip
For production code, write a single tested utility (e.g.,
lower_bound) and reuse it everywhere to avoid bespoke off-by-one mistakes.
Common Misunderstandings
Warning
Midpoint overflow.
mid = (lo + hi) // 2can overflow 32-bit integers; always computelo + (hi - lo) // 2.
Warning
Wrong loop condition. Mixing
[lo, hi]and[lo, hi)semantics leads to infinite loops. Pick one convention and stick to it.
Warning
Duplicates mishandled. A ”==” success return may miss the first/last occurrence. Use
LOWER_BOUND/UPPER_BOUNDwhen order matters.
Warning
Comparator inconsistency. State-dependent or non-transitive comparators (e.g., locale-dependent toggles) break binary search assumptions.
Warning
Assuming rotation trivial with duplicates. In rotated arrays with many equal elements, the usual “which side is sorted?” test degrades; worst-case can approach linear.
Summary
Search strategy follows structure. Without structure, scan. With order and random access, binary search and its lower/upper bound variants provide predictable Theta(log n) performance and clean semantics for first/last occurrences and insertion points. When bounds are unknown, use exponential search before a binary refinement. For specialized shapes—rotated arrays, unimodal functions, bitonic arrays—adapt the invariant or use problem-specific searches. Avoid classic bugs by committing to a half-open range convention, guarding against overflow, and handling duplicates explicitly.