◇ dsa-prep/foundations.md
00 — Foundations: Big O, Complexity, Python & C++ Idioms
1. Big O — what it really measures
~7 min read·updated 5/29/2026
00 — Foundations: Big O, Complexity, Python & C++ Idioms
Why this comes first: every problem you solve gets evaluated on time and space. If you can't reason about Big O fluently, you'll either over-engineer (and run out of time) or miss the optimal solution.
Table of Contents
- Big O — what it really measures
- The complexity classes you must know
- Common operation costs (memorize this table)
- Amortized analysis (the dynamic-array trick)
- Master theorem in 60 seconds
- Recursion → complexity (recursion tree)
- Space complexity — call stack and auxiliary structures
- Python idioms cheatsheet
- C++ STL cheatsheet
- Common bug sources & gotchas
1. Big O — what it really measures
Big O describes how runtime (or memory) grows as input size n grows. It is an upper bound that hides constants and lower-order terms.
T(n) = 3n² + 100n + 5000 → O(n²)
↑
(only the highest-order term matters as n → ∞)
Why "drop constants"?
Because hardware varies. Two CPUs differ in clock speed by 2-3×, but n² vs n log n differs by thousands of × at n=10⁶. Big O measures the algorithm, not the machine.
Visualizing growth
n O(1) O(log n) O(n) O(n log n) O(n²) O(2ⁿ)
10 1 3 10 33 100 1,024
100 1 7 100 664 10,000 1.27e30
1,000 1 10 1,000 9,966 1,000,000 too big
1,000,000 1 20 1,000,000 2e7 1e12 too big
Practical rule of thumb (for ~1 second on modern CPU):
n ≤ 10→O(n!)ok (permutations)n ≤ 20-25→O(2ⁿ)ok (subsets, bitmask DP)n ≤ 500→O(n³)ok (Floyd-Warshall, some DP)n ≤ 5,000→O(n²)ok (DP, two nested loops)n ≤ 10⁶→O(n log n)ok (sort, heap, segment tree)n ≤ 10⁸→O(n)ok (linear scan)n > 10⁸→ needO(log n)orO(1)per query
This rule is approximate but lets you estimate "is my approach fast enough?" in seconds.
2. The complexity classes you must know
O(1) constant array index, hash lookup
O(log n) logarithmic binary search, balanced BST op
O(√n) sqrt prime check, sqrt decomposition
O(n) linear single pass over array
O(n log n) linearithmic sort, heap-based algorithms
O(n²) quadratic nested loops over input
O(n³) cubic triple nested, Floyd-Warshall
O(2ⁿ) exponential enumerate all subsets, naive recursion
O(n!) factorial enumerate all permutations
When you see what
log n→ halving each step (binary search, balanced trees, divide-conquer with single recursion)n log n→ divide into halves and process each (merge sort, sort + linear scan)n²→ all pairs (i, j) — sometimes hidden in repeated string concatenation in Python (!)2ⁿ→ include/exclude each element (subsets); decision tree with branching factor 2n!→ choose any order (permutations); decision tree where branching shrinks
3. Common operation costs
Memorize. You will be asked.
Python
| Structure | Operation | Avg | Worst | Notes |
|---|---|---|---|---|
list | lst[i], lst[i] = x, len(lst) | O(1) | O(1) | |
list | lst.append(x) | O(1)* | O(n) | amortized; resize on grow |
list | lst.pop() (end) | O(1) | O(1) | |
list | lst.pop(0), lst.insert(0, x) | O(n) | O(n) | shifts everything |
list | x in lst | O(n) | O(n) | linear scan |
list | lst.sort(), sorted(lst) | O(n log n) | O(n log n) | Timsort |
dict | d[k], d[k] = v, k in d | O(1) | O(n) | hash collisions are worst case |
set | s.add(x), x in s | O(1) | O(n) | same as dict |
collections.deque | append, appendleft, pop, popleft | O(1) | O(1) | use this not list for queues |
heapq (list) | heappush, heappop | O(log n) | O(log n) | min-heap only |
str | s[i] | O(1) | O(1) | |
str | s + t | O(n+m) | O(n+m) | strings are immutable! |
str | ''.join(list) | O(total) | O(total) | preferred over += in loop |
bisect | bisect_left, bisect_right | O(log n) | O(log n) | requires sorted list |
⚠️
lst.pop(0)trap:O(n)because it shifts everything. Usecollections.dequefor FIFO queues.⚠️ String concatenation trap:
s += cin a loop isO(n²)because strings are immutable — every += copies. Always build alistand''.join(...)at the end, or useio.StringIO.
C++
| Container | Operation | Complexity |
|---|---|---|
vector<T> | [i], back(), push_back, pop_back | O(1)* |
vector<T> | insert(begin, x), erase(it) mid | O(n) |
vector<T> | sort(v.begin(), v.end()) | O(n log n) — introsort |
array<T,N> | [i] | O(1) (fixed size, stack) |
deque<T> | push_front, push_back, pop_front, pop_back | O(1) |
list<T> | linked list, splice | O(1) for splice |
unordered_map<K,V> | [k], find, insert | O(1) avg, O(n) worst |
unordered_set<T> | find, insert | O(1) avg, O(n) worst |
map<K,V> | [k], find, insert (red-black tree) | O(log n) |
set<T> | find, insert, lower_bound | O(log n) |
priority_queue<T> | push, pop, top | O(log n) push/pop, O(1) top — max-heap by default |
string | s[i], s += c | O(1), O(1)* (mutable, unlike Python!) |
💡 C++ vs Python heap: C++
priority_queueis a max-heap. Pythonheapqis min-heap-only. To get max-heap in Python: push-x, pop-x. To get min-heap in C++:priority_queue<T, vector<T>, greater<T>>.
4. Amortized analysis
A vector/list push_back/append is sometimes O(n) (when it has to allocate a bigger array and copy). But across n operations, total cost is O(n), so per-operation is amortized O(1) — written O(1)*.
Why?
Doubling strategy: when full, allocate 2× space, copy n elements (cost n), continue. Sum across n inserts:
1 + 2 + 4 + 8 + ... + n = 2n - 1 = O(n) total
↓
O(1) per op
This is why you can treat append as O(1) in interviews — but state "amortized" if pressed.
5. Master theorem
Used to find complexity of divide-and-conquer recurrences of the form:
T(n) = a · T(n/b) + f(n)
Where:
a= number of subproblemsn/b= subproblem sizef(n)= work done outside recursion (combining)
Three cases
Let c = log_b(a). Then:
| Case | Condition | Result |
|---|---|---|
| 1 | f(n) = O(n^(c-ε)) (work small) | T(n) = Θ(n^c) |
| 2 | f(n) = Θ(n^c) (work matches) | T(n) = Θ(n^c · log n) |
| 3 | f(n) = Ω(n^(c+ε)) (work big) | T(n) = Θ(f(n)) |
Examples
Merge sort: T(n) = 2T(n/2) + O(n) c=log₂2=1, f=n¹ → Case 2 → O(n log n)
Binary search: T(n) = T(n/2) + O(1) c=log₂1=0, f=1 → Case 2 → O(log n)
Karatsuba mult: T(n) = 3T(n/2) + O(n) c=log₂3≈1.58, f=n¹ → Case 1 → O(n^1.58)
Strassen: T(n) = 7T(n/2) + O(n²) c=log₂7≈2.81, f=n² → Case 1 → O(n^2.81)
For interviews: just memorize "merge sort = O(n log n)" and "binary search = O(log n)" — derive others from recursion trees.
6. Recursion → complexity (recursion tree)
Often clearer than master theorem.
Example: Fibonacci naive
def fib(n): if n <= 1: return n return fib(n-1) + fib(n-2)
fib(5)
/ \
fib(4) fib(3)
/ \ / \
fib(3) fib(2) ...
...
Each level doubles → O(2ⁿ) calls
Example: Merge sort
merge(n) ← O(n) work
/ \
merge(n/2) merge(n/2) ← total O(n) at this level
/ \ / \
merge(n/4) ... ← total O(n) at this level
...
(log n levels)
Total: log n levels × O(n) per level = O(n log n)
Mental shortcut
For each recursion, ask:
- How deep? (depth of tree)
- How wide at each level? (number of nodes per level)
- Work per node?
Total = depth × width × work-per-node (sum over levels).
7. Space complexity
Don't forget:
- Auxiliary data — extra arrays, sets, etc.
- Call stack — depth of recursion (can be O(n) for skewed tree, O(log n) for balanced).
- Output space — sometimes excluded (state your assumption).
Examples
def sum_list(lst): total = 0 # O(1) extra for x in lst: total += x return total # Space: O(1) — no auxiliary storage, no recursion
def factorial(n): if n <= 1: return 1 return n * factorial(n-1) # Space: O(n) — each call adds a frame to the stack
def merge_sort(arr): if len(arr) <= 1: return arr mid = len(arr) // 2 L = merge_sort(arr[:mid]) # O(n) auxiliary across recursion R = merge_sort(arr[mid:]) return merge(L, R) # Space: O(n) auxiliary + O(log n) recursion depth = O(n)
Iterative solutions avoid recursion-stack space. Good interview answer: "I can convert this to iterative using my own stack to reduce stack space from O(n) to O(1) extra besides the explicit stack."
8. Python idioms cheatsheet
Loops & comprehensions
# enumerate — index + value for i, x in enumerate(lst): ... # zip — parallel iteration (stops at shortest) for a, b in zip(lst1, lst2): ... # range range(n) # 0..n-1 range(a, b) # a..b-1 range(a, b, step) # a, a+step, a+2*step ... < b range(b-1, -1, -1) # reverse: b-1, b-2, ..., 0 # list comprehension (the Pythonic for-loop) squares = [x*x for x in range(10) if x % 2 == 0] # dict comprehension freq = {c: s.count(c) for c in set(s)} # generator (lazy, no list built) gen = (x*x for x in range(10)) sum(gen) # 285
Common collections
from collections import defaultdict, Counter, deque, OrderedDict # defaultdict — auto-default when key missing graph = defaultdict(list) graph[u].append(v) # never KeyError # Counter — frequency map with helpful methods c = Counter("aabbbccd") # Counter({'b': 3, 'a': 2, 'c': 2, 'd': 1}) c.most_common(2) # [('b', 3), ('a', 2)] # deque — O(1) both ends q = deque([1, 2, 3]) q.appendleft(0) # deque([0, 1, 2, 3]) q.popleft() # 0 # heapq — min-heap (use list as container) import heapq h = [] heapq.heappush(h, 3) heapq.heappush(h, 1) heapq.heappop(h) # 1 # bisect — binary search on sorted list import bisect arr = [1, 3, 4, 4, 6] bisect.bisect_left(arr, 4) # 2 (leftmost position to insert 4) bisect.bisect_right(arr, 4) # 4 (rightmost position)
String operations
s.lower(), s.upper(), s.swapcase() s.strip(), s.lstrip(), s.rstrip() # whitespace by default s.split(' '), s.split() # split() handles multiple whitespace '-'.join(['a', 'b', 'c']) # 'a-b-c' s.replace('a', 'b') s.find('substr') # -1 if not found s.startswith('pre'), s.endswith('suf') s.isdigit(), s.isalpha(), s.isalnum() ord('a') # 97 — char to int chr(97) # 'a' — int to char
Sorting
sorted(lst) # new list lst.sort() # in-place # Custom key sorted(words, key=len) # by length sorted(points, key=lambda p: p[0]) # by first element sorted(items, key=lambda x: (-x[1], x[0])) # by count desc, name asc # functools.cmp_to_key (rare; for tricky comparators) from functools import cmp_to_key sorted(nums, key=cmp_to_key(lambda a, b: int(b+a) - int(a+b)))
Slicing tricks
lst[::-1] # reversed copy lst[::2] # every other element lst[start:stop:step] s = s[::-1] # reverse string
Truthy/falsy gotchas
# Falsy: 0, 0.0, '', [], {}, None, False # Everything else is truthy. if lst: # idiomatic empty check ... # But for None vs empty: if x is None: ... # explicit None check if x is not None: ...
9. C++ STL cheatsheet
Headers you'll need
#include <bits/stdc++.h> // pulls in everything (competitive programming) using namespace std;
Vector (dynamic array)
vector<int> v(n); // n zeros vector<int> v(n, -1); // n copies of -1 vector<vector<int>> grid(rows, vector<int>(cols, 0)); v.push_back(x); v.pop_back(); v.size(); v.empty(); sort(v.begin(), v.end()); sort(v.begin(), v.end(), greater<int>()); // descending reverse(v.begin(), v.end());
Maps and sets
unordered_map<string, int> m; m["key"] = 42; m.count("key"); // 0 or 1 m.find("key"); // iterator for (auto& [k, v] : m) { ... } map<int, int> ordered; // sorted by key, O(log n) ordered.lower_bound(x); // first key >= x ordered.upper_bound(x); // first key > x unordered_set<int> s; s.insert(x); s.count(x);
Stack, queue, deque
stack<int> st; st.push(1); st.top(); st.pop(); queue<int> q; q.push(1); q.front(); q.pop(); deque<int> dq; dq.push_back(1); dq.push_front(0); dq.front(); dq.back();
Priority queue
priority_queue<int> pq; // max-heap priority_queue<int, vector<int>, greater<int>> minpq; // min-heap priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> minpq2;
Binary search
sort(v.begin(), v.end()); auto it = lower_bound(v.begin(), v.end(), x); // first >= x auto it = upper_bound(v.begin(), v.end(), x); // first > x int idx = it - v.begin(); binary_search(v.begin(), v.end(), x); // returns bool
String
string s = "hello"; s.length(); s.size(); s.substr(start, len); s.find("ll"); // string::npos if not found to_string(42); // "42" stoi("42"); // 42
I/O for speed (CP)
ios_base::sync_with_stdio(false); cin.tie(NULL);
10. Common bug sources & gotchas
1. Integer overflow (C++ mainly)
int a = 100000, b = 100000; int c = a * b; // overflow! a*b = 10^10 > INT_MAX (~2.1e9) long long c = (long long)a * b; // 10^10 fits in long long
Python ints are arbitrary precision — no overflow.
2. Off-by-one in binary search
while lo <= hi: # ← inclusive bounds mid = (lo + hi) // 2 ... lo = mid + 1 # ← always +1 / -1 to make progress hi = mid - 1
Use template (see 05-binary-search.md).
3. Mutating list during iteration
lst = [1, 2, 3, 4] for x in lst: if x % 2 == 0: lst.remove(x) # ← BUG: skips elements # Better: iterate over a copy, or build a new list lst = [x for x in lst if x % 2 != 0]
4. Default mutable arguments
def f(x, acc=[]): # ← BUG: same list shared across calls acc.append(x) return acc f(1) # [1] f(2) # [1, 2] — surprise! # Fix: def f(x, acc=None): if acc is None: acc = [] ...
5. Shallow vs deep copy
import copy a = [[1, 2], [3, 4]] b = a # same list c = a[:] # shallow copy — outer list new, inner lists shared d = a.copy() # same as a[:] e = copy.deepcopy(a) # fully independent a[0][0] = 99 # b: [[99, 2], [3, 4]] — changed # c: [[99, 2], [3, 4]] — changed (inner shared!) # e: [[1, 2], [3, 4]] — independent
6. int / int vs int // int in Python
5 / 2 # 2.5 (float) 5 // 2 # 2 (integer division, rounds toward -infinity) -5 // 2 # -3 (NOT -2! rounds toward -inf)
For C++-style truncation toward zero: int(a/b) (but this can have float issues for very large nums).
7. Negative modulo
-5 % 3 # 1 (Python: result has sign of divisor)
-5 % 3 // -2 (C++: result has sign of dividend) ((-5) % 3 + 3) % 3 // 1 — safe modulo for negative numbers
8. Recursion depth limit (Python)
import sys sys.setrecursionlimit(10**6) # default is 1000 — bump for deep trees
9. max(0, ...) and min(...) with empty iterables
max([]) # ValueError max([], default=0) # 0 — use default
10. Python integer division floors negatively (off-by-one in mid)
When computing mid in binary search:
mid = (lo + hi) // 2 # for ints in any range — Python is fine mid = lo + (hi - lo) // 2 # avoids potential overflow in C/Java/C++
TL;DR ✅
- Big O hides constants and lower-order terms. Only the dominant term matters as
ngrows. - For interviews, target O(n) or O(n log n) on n ≤ 10⁶. O(n²) is acceptable for n ≤ 10⁴.
- Memorize the operation cost tables (Section 3). You'll be asked.
- Python
list.pop(0)is O(n), usedequefor FIFO. - Strings are immutable in Python — never
+=in a loop, build a list and join. - C++ default
priority_queueis max-heap; Python defaultheapqis min-heap. - Recursion adds O(depth) stack space. Deep recursion → consider iterative.
- Sketch the recursion tree for non-master-theorem cases (depth × width × work).
Once these are second nature, every other file in this folder will read fast. Move on to 01-arrays-hashing.md.
// 1 view