dsa-prep/heaps.md

09 — Heaps & Priority Queues

1. Heap conceptually

~7 min read·updated 5/29/2026

09 — Heaps & Priority Queues

Neetcode Week 8a. A heap is a partially ordered tree that gives you the minimum (or maximum) in O(log n) time. The two big patterns: Top K and Two Heaps for the Median.

Table of Contents

  1. Heap conceptually
  2. Array-backed heap (the actual implementation)
  3. Operations and complexity
  4. Python heapq
  5. C++ priority_queue
  6. Pattern A — Top K Elements
  7. Pattern B — Two Heaps for Streaming Median
  8. Pattern C — K-way Merge
  9. Pattern D — Scheduling / Greedy with priority
  10. Pitfalls
  11. Practice list

1. Heap conceptually

A min-heap is a binary tree where every parent is its children. The minimum is always at the root.

            1            ← min, always at root
          /   \
         3     2
        / \   / \
       5   4 6   7

Constraints:

  • Heap order: parent ≤ children (min-heap) or parent ≥ children (max-heap).
  • Shape: complete binary tree (every level filled left-to-right except possibly the last).

The shape constraint means we can store the heap in an array with no pointers.

A max-heap is the same but with parent ≥ children. Top is the maximum.


2. Array-backed heap

For node at index i in a 0-indexed array:

  • left child: 2i + 1
  • right child: 2i + 2
  • parent: (i - 1) // 2
Index:    0    1    2    3    4    5    6
        ┌────┬────┬────┬────┬────┬────┬────┐
Heap:   │ 1  │ 3  │ 2  │ 5  │ 4  │ 6  │ 7  │
        └────┴────┴────┴────┴────┴────┴────┘

Visualizes as:
            1 (idx 0)
           /         \
        3 (1)       2 (2)
        / \         / \
      5(3) 4(4)  6(5)  7(6)

Push (sift up)

  1. Append at the end of the array.
  2. While new element < its parent, swap with parent.

Pop (sift down)

  1. Swap root with last element. Pop last (the old root, returned).
  2. Starting at root, while it's > smaller child, swap with the smaller child.

Both operations are O(log n) because the tree has height log n.


3. Operations and complexity

OpTimeDescription
peek()O(1)Return root
push(x)O(log n)Insert + sift up
pop()O(log n)Remove root + sift down
heapify(arr)O(n)Build heap from array
replace(x)O(log n)Pop and push (a single sift-down)

⚠️ Building a heap from n elements is O(n), not O(n log n). The sift-down approach (Floyd's algorithm) only does total work proportional to n. Don't say O(n log n) for heapify in interviews.

OpWhy O(n) for heapify?
Naive: push n timesEach push O(log n), total O(n log n).
Floyd: from index n/2-1 down to 0, sift downMost nodes are near the bottom and only sift a few levels. Math gives sum ≤ 2n = O(n).

4. Python heapq

Python's heap is a min-heap stored in a list. It mutates the list in place.

import heapq h = [] heapq.heappush(h, 3) heapq.heappush(h, 1) heapq.heappush(h, 5) heapq.heappop(h) # 1 h[0] # peek minimum (just index 0) # heapify in O(n) arr = [5, 1, 3, 8, 2] heapq.heapify(arr) # arr is now [1, 2, 3, 8, 5] — valid min-heap # heappushpop — push and pop in one go (faster than two ops) heapq.heappushpop(h, x) # heapreplace — pop then push (errors if heap empty) heapq.heapreplace(h, x) # nlargest / nsmallest — for static arrays, when k is small heapq.nlargest(3, [4, 1, 7, 3, 8, 2]) # [8, 7, 4] heapq.nsmallest(3, [4, 1, 7, 3, 8, 2]) # [1, 2, 3]

Max-heap in Python (the workaround)

# Negate values heapq.heappush(h, -x) top_max = -heapq.heappop(h)

Heap of tuples (multi-key sort)

heapq.heappush(h, (priority, secondary, item)) # ordered by priority, then secondary

Pitfall: if the items themselves aren't comparable, ties on priority will try to compare item and crash. Add an incrementing counter as a tiebreaker:

counter = 0 heapq.heappush(h, (priority, counter, item)) counter += 1

This pattern shows up constantly.


5. C++ priority_queue

C++'s default is a max-heap:

priority_queue<int> pq; pq.push(3); pq.push(1); pq.top(); // 3 (max) pq.pop();

For a min-heap:

priority_queue<int, vector<int>, greater<int>> pq;

For pairs / custom objects, provide a comparator:

auto cmp = [](const pair<int,int>& a, const pair<int,int>& b) { return a.first > b.first; // min-heap by first }; priority_queue<pair<int,int>, vector<pair<int,int>>, decltype(cmp)> pq(cmp);

Difference summary

Python heapqC++ priority_queue
DefaultMin-heapMax-heap
Negate trick for oppositeyesno — use greater<T>
Heapify in placeheapq.heapify(list)make_heap(v.begin(), v.end())
Pop returns valueheappop(h) returnspq.pop() is void; use pq.top() first

6. Pattern A — Top K Elements

Trigger: "k largest / smallest", "k closest to origin", "top k frequent."

Approach 1 — Heap of size k

For "top K largest," maintain a min-heap of size k. The min of the heap is the k-th largest seen so far. When a new element is bigger, push it and pop the min.

import heapq def top_k_largest(nums, k): h = [] # min-heap for x in nums: if len(h) < k: heapq.heappush(h, x) elif x > h[0]: heapq.heapreplace(h, x) # pop and push in one return list(h)

Complexity: O(n log k). Space: O(k).

For "top K smallest," use a max-heap of size k (negate values in Python).

Approach 2 — Heapify and pop k times

def top_k_smallest(nums, k): heapq.heapify(nums) # O(n) return [heapq.heappop(nums) for _ in range(k)] # O(k log n)

Complexity: O(n + k log n). Better when k ≈ n (since heap-of-size-k = O(n log k) ≈ O(n log n)).

Approach 3 — Quickselect (average O(n))

Find the k-th element with quickselect (average O(n)) and then partition. Worth knowing for a "best you can do" answer; details in §7 of foundations. Often skipped in favor of heap solution because it's robust and easier to write.

Top K Frequent Elements

from collections import Counter import heapq def top_k_frequent(nums, k): count = Counter(nums) return heapq.nlargest(k, count.keys(), key=count.get)

Or bucket sort for O(n) — see 01-arrays-hashing.md.

K Closest Points to Origin

import heapq def k_closest(points, k): return heapq.nsmallest(k, points, key=lambda p: p[0]**2 + p[1]**2)

7. Pattern B — Two heaps

Trigger: "running median," "median of stream," "balance two halves."

Maintain two heaps:

  • small — max-heap of the smaller half
  • large — min-heap of the larger half

Invariants:

  1. Every element in small ≤ every element in large.
  2. len(small) == len(large) or len(small) == len(large) + 1.

Median is:

  • If sizes equal: average of small.top() and large.top().
  • Else: small.top().

Find Median from Data Stream (Hard)

import heapq class MedianFinder: def __init__(self): self.small = [] # max-heap (negate) self.large = [] # min-heap def addNum(self, num): # 1. Push to small (negate for max-heap) heapq.heappush(self.small, -num) # 2. Move max of small to large to maintain ordering heapq.heappush(self.large, -heapq.heappop(self.small)) # 3. Rebalance sizes if len(self.large) > len(self.small): heapq.heappush(self.small, -heapq.heappop(self.large)) def findMedian(self): if len(self.small) > len(self.large): return -self.small[0] return (-self.small[0] + self.large[0]) / 2

Steps 1–2: insert into the wrong heap to maintain ordering, then bubble across. This naturally enforces invariant 1.

Step 3: keep smalllarge in size by 0 or 1.

Complexity: add = O(log n), median = O(1).

Why two heaps?

To get the median in O(1), you need to split the data at the middle. Heaps give you O(1) access to each side's extreme, and O(log n) inserts. Sorted data structures (BST, sorted list) give the same ops but two heaps are simpler.


8. Pattern C — K-way Merge

Trigger: "merge K sorted arrays / lists," "k smallest in matrix."

Push the head of each sorted source into a heap. Pop smallest, append, push next.

Merge K Sorted Lists (Hard) — see also 06-linked-list.md

import heapq def merge_k_lists(lists): h = [] for i, lst in enumerate(lists): if lst: heapq.heappush(h, (lst.val, i, lst)) # i for tiebreak dummy = ListNode() tail = dummy while h: val, i, node = heapq.heappop(h) tail.next = node tail = node if node.next: heapq.heappush(h, (node.next.val, i, node.next)) return dummy.next

Complexity: O(N log k) where N = total elements, k = number of lists.

Kth Smallest in Sorted Matrix (Medium)

import heapq def kth_smallest(matrix, k): n = len(matrix) # Push first column h = [(matrix[r][0], r, 0) for r in range(min(k, n))] heapq.heapify(h) for _ in range(k - 1): val, r, c = heapq.heappop(h) if c + 1 < n: heapq.heappush(h, (matrix[r][c + 1], r, c + 1)) return h[0][0]

Treat each row as a sorted source. K-way merge until we pop k times.

(Better solution exists with binary search on the answer — see 05-binary-search.md.)


9. Pattern D — Scheduling

Trigger: "schedule tasks with cooldown," "process intervals by priority."

Task Scheduler (Medium)

Tasks have types A-Z. Same type must wait n cycles before reusing. Find minimum time.

from collections import Counter, deque import heapq def least_interval(tasks, n): count = Counter(tasks) h = [-c for c in count.values()] # max-heap by count heapq.heapify(h) q = deque() # (count_remaining, ready_time) time = 0 while h or q: time += 1 if h: cnt = heapq.heappop(h) + 1 # one fewer of this task if cnt < 0: # still has remaining q.append((cnt, time + n)) # ready at time + n if q and q[0][1] == time: heapq.heappush(h, q.popleft()[0]) return time

The heap holds tasks ready now; the deque holds tasks waiting for cooldown to expire.

Reorganize String (Medium) — same pattern with chars

Rearrange string so no two adjacent chars are equal.

def reorganize_string(s): count = Counter(s) h = [(-cnt, ch) for ch, cnt in count.items()] heapq.heapify(h) res = [] prev = (0, '') while h: cnt, ch = heapq.heappop(h) res.append(ch) if prev[0] < 0: heapq.heappush(h, prev) prev = (cnt + 1, ch) # one used; reschedule next round return ''.join(res) if len(res) == len(s) else ""

10. Pitfalls

Heap doesn't support efficient delete-by-value

Removing an arbitrary element from a heap is O(n). For "lazy deletion" pattern: keep a separate set of "deleted" items, and skip them when they bubble to the top.

deleted = set() while h and h[0] in deleted: heapq.heappop(h) return h[0]

Forgetting tiebreaker in tuples

heapq.heappush(h, (priority, my_object)) # crashes on tie if my_object isn't comparable

Insert a counter as second element.

Confusing min vs max

Python = min by default. C++ priority_queue = max by default. Always state which.

Negate-trick subtleties

For min-heap-via-negation in Python, remember to negate back when reading:

heapq.heappush(h, -x) top = -h[0] # ← negate back

Heap is not sorted

Iterating a heap doesn't yield sorted order. Heap order is partial — only root is guaranteed minimum/maximum.

Building a heap is O(n)

Don't say "n log n" for heapify. It's O(n) by Floyd's algorithm.

Updating priority in Python

heapq doesn't support decrease-key. Workaround: push the new (priority, item) and ignore stale popped items.

while h and h[0][1] != current_priority[h[0][2]]: heapq.heappop(h)

Dijkstra needs decrease-key conceptually but lazy works

For shortest-path-with-weights problems, the lazy approach (push duplicates, skip stale) is fine and is what most interview implementations look like. See 12-advanced-graphs.md.


11. Practice list

#ProblemPatternDifficulty
703Kth Largest Element in StreamHeap of size kEasy
1046Last Stone WeightMax-heapEasy
973K Closest Points to OriginTop KMedium
215Kth Largest Element in ArrayHeap or QuickSelectMedium
621Task SchedulerHeap + cooldown queueMedium
355Design TwitterHeap-merge per userMedium
295Find Median from Data StreamTwo heapsHard
23Merge K Sorted ListsK-way mergeHard
378Kth Smallest in Sorted MatrixK-way mergeMedium
767Reorganize StringHeap schedulingMedium
347Top K Frequent ElementsHeap or bucket sortMedium

TL;DR ✅

  • A heap = complete binary tree with parent ≤/≥ children. Stored in array (no pointers).
  • peek O(1), push and pop O(log n), heapify O(n).
  • Python heapq is min-only; negate values for max. C++ priority_queue is max by default; pass greater<> for min.
  • Top K = heap of size k. O(n log k).
  • Two heaps = streaming median.
  • K-way merge = heap of source heads.
  • Lazy deletion for "decrease key" — push new, skip stale on pop.
  • Add a tiebreaker when pushing tuples of incomparable objects.

Next: 10-backtracking.md.

External resources

// 1 view

main
UTF-8·typescript