Graphs

BFS, DFS, islands, topological sort, cycle detection.

graphs.mdreadme

11 — Graphs

Neetcode Week 9. Graphs are the most-asked Hard topic at Google. Many problems disguise themselves: a 2D grid is a graph, a Sudoku board is a graph, dependencies are a graph. Once you spot it, you reach for BFS, DFS, or topological sort.

Table of Contents

  1. Graph terminology
  2. Representations
  3. BFS — breadth-first search
  4. DFS — depth-first search
  5. Pattern: number-of-islands family
  6. Pattern: multi-source BFS
  7. Pattern: cycle detection
  8. Topological sort (Kahn + DFS)
  9. Walked-through problems
  10. Pitfalls
  11. Practice list

1. Graph terminology

A graph G = (V, E):

  • V = set of vertices (nodes)
  • E = set of edges (pairs connecting two vertices)
TermMeaning
DirectedEdge has direction (a→b ≠ b→a)
UndirectedEdges go both ways
WeightedEach edge has a number (weight/cost)
Cyclic / acyclicHas at least one cycle / no cycles
DAGDirected Acyclic Graph
Connected (undirected)A path exists between every pair
Strongly connected (directed)Same, in directed sense
TreeConnected, undirected, acyclic, V-1 edges
ForestDisjoint trees
AdjacencyTwo vertices are adjacent if there's an edge between them
DegreeNumber of edges at a vertex (in-degree / out-degree for directed)
Undirected graph:                Directed graph:
    A───B                            A───→B
    │   │                            │    │
    │   │                            ↓    ↓
    C───D                            C───→D

2. Representations

Adjacency list (most common)

For each vertex, store a list of neighbors.

graph = { 'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C'], }

Or with defaultdict:

from collections import defaultdict graph = defaultdict(list) graph['A'].append('B') graph['B'].append('A') # for undirected, add both directions
OpCost
Find neighbors of vO(degree(v))
Check if (u, v) is edgeO(degree(u))
Total memoryO(V + E)

Adjacency matrix

matrix[i][j] = 1 if edge from i to j, else 0. Memory O(V²) — only good for dense graphs.

n = 4 matrix = [[0]*n for _ in range(n)] matrix[0][1] = 1 # edge from 0 to 1
OpCost
Check if (u, v) is edgeO(1)
Find neighborsO(V)
Total memoryO(V²)

Edge list

Just [(u, v, weight), ...]. Used for Bellman-Ford and Kruskal's MST.

Implicit graphs

Sometimes the graph isn't given — you compute neighbors on the fly:

  • Grid: neighbors of (r, c) are (r±1, c) and (r, c±1).
  • Word ladder: neighbors of "cat" are "bat", "car", etc. (one letter changed).
  • Knight's tour: neighbors are the L-shaped moves.

Use the same BFS/DFS algorithms; just generate neighbors when needed.


3. BFS — breadth-first search

Visit all nodes at distance 1, then all at distance 2, and so on.

Use cases:

  • Shortest path in unweighted graph (or where all edges cost 1).
  • Level-order traversal of a tree.
  • Finding any node satisfying a condition, via shortest steps.

Template

from collections import deque def bfs(start, graph): visited = {start} q = deque([start]) while q: node = q.popleft() # process node for nbr in graph[node]: if nbr not in visited: visited.add(nbr) q.append(nbr)

Critical: mark as visited when adding to the queue, not when popping. Otherwise you'll add the same node multiple times.

Tracking levels (distance)

def shortest_path_unweighted(start, target, graph): if start == target: return 0 visited = {start} q = deque([(start, 0)]) # (node, dist) while q: node, d = q.popleft() for nbr in graph[node]: if nbr == target: return d + 1 if nbr not in visited: visited.add(nbr) q.append((nbr, d + 1)) return -1 # unreachable

Or process layer-by-layer (snapshot trick from 07-trees.md).

Why BFS gives shortest path on unweighted

Each level is "1 more step from the start." The first time we encounter a node, it's via the shortest path. (False if edge weights vary — then use Dijkstra.)


4. DFS — depth-first search

Go as deep as possible, then backtrack.

Use cases:

  • Connected components.
  • Topological sort.
  • Cycle detection.
  • Any "explore all reachable" question where order doesn't matter.

Recursive

def dfs(node, graph, visited): if node in visited: return visited.add(node) # process node for nbr in graph[node]: dfs(nbr, graph, visited)

Iterative (explicit stack)

def dfs_iter(start, graph): visited = {start} stack = [start] while stack: node = stack.pop() # process node for nbr in graph[node]: if nbr not in visited: visited.add(nbr) stack.append(nbr)

Same logic as BFS but with stack instead of queue.

When to use BFS vs DFS

Problem says...Use
"Shortest" something in unweightedBFS
"Reachability" / "connected components"Either (DFS easier to write)
"All paths"DFS (with backtracking)
"Cycle in directed graph"DFS with white/gray/black
"Topological sort"Either (Kahn = BFS, post-order DFS)

5. Pattern: Number of islands family

A 2D grid is a graph where each cell is a vertex, neighbors are 4 adjacent cells.

Number of Islands (Medium)

Count connected components of '1' cells.

def num_islands(grid): rows, cols = len(grid), len(grid[0]) count = 0 def dfs(r, c): if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != '1': return grid[r][c] = '0' # mark visited dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1) for r in range(rows): for c in range(cols): if grid[r][c] == '1': dfs(r, c) count += 1 return count

Classic DFS over a grid. For each unvisited '1', flood the whole island with DFS, increment count.

Max Area of Island (Medium)

Same skeleton; DFS returns the size:

def max_area(grid): rows, cols = len(grid), len(grid[0]) def dfs(r, c): if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != 1: return 0 grid[r][c] = 0 return 1 + dfs(r+1, c) + dfs(r-1, c) + dfs(r, c+1) + dfs(r, c-1) return max((dfs(r, c) for r in range(rows) for c in range(cols)), default=0)

Surrounded Regions (Medium)

Capture all 'O' regions that are not connected to the border.

Reverse approach: from each border 'O', DFS to mark safe cells. After, all unmarked 'O's are surrounded.

def solve(board): if not board: return rows, cols = len(board), len(board[0]) def dfs(r, c): if r < 0 or r >= rows or c < 0 or c >= cols or board[r][c] != 'O': return board[r][c] = 'S' # mark safe dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1) for r in range(rows): dfs(r, 0); dfs(r, cols-1) for c in range(cols): dfs(0, c); dfs(rows-1, c) for r in range(rows): for c in range(cols): board[r][c] = 'X' if board[r][c] == 'O' else ('O' if board[r][c] == 'S' else 'X')

The "trick" is marking what to keep, then converting the rest. Often easier than tracking what to capture.

Pacific Atlantic Water Flow (Medium)

Two BFS/DFS from each ocean's borders. Cell that's reachable from both belongs in the answer. Same "reverse" pattern.


6. Pattern: multi-source BFS

Trigger: "earliest time something spreads," "minimum distance from any of these sources."

Push all sources into the queue at distance 0, then BFS as usual. The first time a cell is visited, that's the closest source.

Rotting Oranges (Medium)

Each rotten orange spreads rot to adjacent fresh oranges per minute. Minimum minutes until all rotten?

from collections import deque def oranges_rotting(grid): rows, cols = len(grid), len(grid[0]) q = deque() fresh = 0 for r in range(rows): for c in range(cols): if grid[r][c] == 2: q.append((r, c, 0)) elif grid[r][c] == 1: fresh += 1 last_time = 0 while q: r, c, t = q.popleft() last_time = t for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]: nr, nc = r+dr, c+dc if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1: grid[nr][nc] = 2 fresh -= 1 q.append((nr, nc, t + 1)) return last_time if fresh == 0 else -1

Walls and Gates (Medium)

Same pattern: all gates pushed at distance 0. Each cell gets the distance to its closest gate.


7. Pattern: cycle detection

In an undirected graph (DFS)

def has_cycle_undirected(graph): visited = set() def dfs(node, parent): visited.add(node) for nbr in graph[node]: if nbr not in visited: if dfs(nbr, node): return True elif nbr != parent: # back edge to non-parent → cycle return True return False return any(dfs(v, None) for v in graph if v not in visited)

The "parent" parameter avoids treating the back edge through which we entered as a cycle.

In a directed graph (DFS with white/gray/black)

def has_cycle_directed(graph): WHITE, GRAY, BLACK = 0, 1, 2 color = {v: WHITE for v in graph} def dfs(v): color[v] = GRAY for u in graph[v]: if color[u] == GRAY: return True # back edge in current DFS path → cycle if color[u] == WHITE and dfs(u): return True color[v] = BLACK return False return any(dfs(v) for v in graph if color[v] == WHITE)

GRAY means "in the current DFS path." Hitting a GRAY node means we've looped back.


8. Topological sort

For DAGs only. Linear ordering where every edge (u, v) has u before v.

Real-world: course prerequisites, build-system dependencies, package install order.

Approach 1: Kahn's algorithm (BFS)

  1. Compute in-degree for every vertex.
  2. Push all vertices with in-degree 0 into the queue.
  3. Pop one. Add to result. Decrement in-degrees of its neighbors. Push any neighbor whose in-degree drops to 0.
  4. Repeat.

If at the end, result has fewer than V vertices → there's a cycle.

from collections import defaultdict, deque def topo_sort_kahn(num_courses, prerequisites): graph = defaultdict(list) in_deg = [0] * num_courses for c, pre in prerequisites: graph[pre].append(c) in_deg[c] += 1 q = deque(v for v in range(num_courses) if in_deg[v] == 0) order = [] while q: v = q.popleft() order.append(v) for nbr in graph[v]: in_deg[nbr] -= 1 if in_deg[nbr] == 0: q.append(nbr) return order if len(order) == num_courses else [] # [] = cycle

Approach 2: DFS with finishing-time (post-order reversed)

def topo_sort_dfs(num_courses, prerequisites): graph = defaultdict(list) for c, pre in prerequisites: graph[pre].append(c) WHITE, GRAY, BLACK = 0, 1, 2 color = [WHITE] * num_courses order = [] def dfs(v): if color[v] == GRAY: return False if color[v] == BLACK: return True color[v] = GRAY for u in graph[v]: if not dfs(u): return False color[v] = BLACK order.append(v) return True for v in range(num_courses): if not dfs(v): return [] # cycle return order[::-1]

When a DFS finishes a node, all its dependents are already processed; recording in finish order then reversing gives a valid topo order.

Course Schedule (Medium) — just cycle check

def can_finish(num_courses, prerequisites): return len(topo_sort_kahn(num_courses, prerequisites)) == num_courses

Course Schedule II (Medium) — return the order

Use Kahn directly, return order if len(order) == num_courses else [].


9. Walked-through problems

Clone Graph (Medium)

Given an undirected graph, deep-copy it.

DFS with memoization (old node → new node).

def clone_graph(node): if not node: return None mapping = {} def dfs(n): if n in mapping: return mapping[n] copy = Node(n.val) mapping[n] = copy for nbr in n.neighbors: copy.neighbors.append(dfs(nbr)) return copy return dfs(node)

Word Ladder (Hard)

Transform beginWord to endWord one letter at a time. Each intermediate must be in wordList. Minimum number of transformations?

Treat words as nodes; two words are neighbors if they differ by one letter. BFS for shortest.

Trick to find neighbors efficiently: for each word, generate all "wildcards" like _at, c_t, ca_ and group words by these wildcards. This way you find one-edit neighbors in O(L) instead of O(N * L) per word.

from collections import defaultdict, deque def ladder_length(beginWord, endWord, wordList): word_set = set(wordList) if endWord not in word_set: return 0 L = len(beginWord) pattern_map = defaultdict(list) for w in word_set: for i in range(L): pattern = w[:i] + '*' + w[i+1:] pattern_map[pattern].append(w) visited = {beginWord} q = deque([(beginWord, 1)]) while q: word, level = q.popleft() for i in range(L): pattern = word[:i] + '*' + word[i+1:] for nbr in pattern_map[pattern]: if nbr == endWord: return level + 1 if nbr not in visited: visited.add(nbr) q.append((nbr, level + 1)) return 0

10. Pitfalls

Marking visited at pop time (BFS)

while q: n = q.popleft() if n in visited: continue # too late! visited.add(n) ...

Same node can be added to the queue multiple times before being popped → wasted work. Mark visited when adding to queue.

Forgetting bidirectional edges (undirected)

graph[u].append(v) # graph[v].append(u) ← MISSING for undirected

Off-by-one in distance

Starting node is at distance 0, not 1. Verify with a tiny test.

Confusing recursion stack and graph stack

In iterative DFS, use a stack of nodes. Don't stack functions/calls.

Trying topo sort on cyclic graph

Topo sort assumes DAG. If there's a cycle, Kahn produces fewer than V nodes (good signal).

Using set() for visited on giant graphs

For grids, [[False] * cols for _ in range(rows)] is faster than a set of tuples. Profile before changing.

Multi-source BFS edge case

If all cells are sources, distance is 0 everywhere — handle that.

Stack overflow on deep DFS

Python's default recursion limit is 1000. Use sys.setrecursionlimit(10**6) or convert to iterative.


11. Practice list

#ProblemPatternDifficulty
200Number of IslandsDFS on gridMedium
133Clone GraphDFS + memoMedium
695Max Area of IslandDFS returning sizeMedium
417Pacific Atlantic Water FlowReverse-BFSMedium
130Surrounded RegionsBorder DFSMedium
994Rotting OrangesMulti-source BFSMedium
286Walls and GatesMulti-source BFSMedium
207Course ScheduleCycle detectionMedium
210Course Schedule IITopo sortMedium
127Word LadderBFS on wildcard-graphHard
261Graph Valid TreeCycle + connectivityMedium
323Connected Components in Undirected GraphDFS / Union-FindMedium

TL;DR ✅

  • Adjacency list is the default representation. defaultdict(list).
  • BFS for shortest path in unweighted graphs and grids.
  • DFS for reachability, connected components, cycle detection.
  • Mark visited when adding to the queue/stack, not when processing.
  • Multi-source BFS = push all sources at distance 0, then BFS.
  • Topological sort: Kahn (in-degree + queue) or DFS post-order. DAG only.
  • Implicit graphs (grids, word ladder) use the same algorithms — just generate neighbors on the fly.

Next: 12-advanced-graphs.md.

External resources

main
UTF-8·typescript