dsa-prep/foundations.md

00 — Foundations: Big O, Complexity, Python & C++ Idioms

1. Big O — what it really measures

~7 min read·updated 5/29/2026

00 — Foundations: Big O, Complexity, Python & C++ Idioms

Why this comes first: every problem you solve gets evaluated on time and space. If you can't reason about Big O fluently, you'll either over-engineer (and run out of time) or miss the optimal solution.

Table of Contents

  1. Big O — what it really measures
  2. The complexity classes you must know
  3. Common operation costs (memorize this table)
  4. Amortized analysis (the dynamic-array trick)
  5. Master theorem in 60 seconds
  6. Recursion → complexity (recursion tree)
  7. Space complexity — call stack and auxiliary structures
  8. Python idioms cheatsheet
  9. C++ STL cheatsheet
  10. Common bug sources & gotchas

1. Big O — what it really measures

Big O describes how runtime (or memory) grows as input size n grows. It is an upper bound that hides constants and lower-order terms.

T(n) = 3n² + 100n + 5000      →     O(n²)
                                     ↑
                    (only the highest-order term matters as n → ∞)

Why "drop constants"?

Because hardware varies. Two CPUs differ in clock speed by 2-3×, but vs n log n differs by thousands of × at n=10⁶. Big O measures the algorithm, not the machine.

Visualizing growth

n           O(1)    O(log n)   O(n)        O(n log n)   O(n²)        O(2ⁿ)
10          1       3          10          33           100          1,024
100         1       7          100         664          10,000       1.27e30
1,000       1       10         1,000       9,966        1,000,000    too big
1,000,000   1       20         1,000,000   2e7          1e12         too big

Practical rule of thumb (for ~1 second on modern CPU):

  • n ≤ 10O(n!) ok (permutations)
  • n ≤ 20-25O(2ⁿ) ok (subsets, bitmask DP)
  • n ≤ 500O(n³) ok (Floyd-Warshall, some DP)
  • n ≤ 5,000O(n²) ok (DP, two nested loops)
  • n ≤ 10⁶O(n log n) ok (sort, heap, segment tree)
  • n ≤ 10⁸O(n) ok (linear scan)
  • n > 10⁸ → need O(log n) or O(1) per query

This rule is approximate but lets you estimate "is my approach fast enough?" in seconds.


2. The complexity classes you must know

O(1)          constant       array index, hash lookup
O(log n)      logarithmic    binary search, balanced BST op
O(√n)         sqrt           prime check, sqrt decomposition
O(n)          linear         single pass over array
O(n log n)    linearithmic   sort, heap-based algorithms
O(n²)         quadratic      nested loops over input
O(n³)         cubic          triple nested, Floyd-Warshall
O(2ⁿ)         exponential    enumerate all subsets, naive recursion
O(n!)         factorial      enumerate all permutations

When you see what

  • log n → halving each step (binary search, balanced trees, divide-conquer with single recursion)
  • n log n → divide into halves and process each (merge sort, sort + linear scan)
  • → all pairs (i, j) — sometimes hidden in repeated string concatenation in Python (!)
  • 2ⁿ → include/exclude each element (subsets); decision tree with branching factor 2
  • n! → choose any order (permutations); decision tree where branching shrinks

3. Common operation costs

Memorize. You will be asked.

Python

StructureOperationAvgWorstNotes
listlst[i], lst[i] = x, len(lst)O(1)O(1)
listlst.append(x)O(1)*O(n)amortized; resize on grow
listlst.pop() (end)O(1)O(1)
listlst.pop(0), lst.insert(0, x)O(n)O(n)shifts everything
listx in lstO(n)O(n)linear scan
listlst.sort(), sorted(lst)O(n log n)O(n log n)Timsort
dictd[k], d[k] = v, k in dO(1)O(n)hash collisions are worst case
sets.add(x), x in sO(1)O(n)same as dict
collections.dequeappend, appendleft, pop, popleftO(1)O(1)use this not list for queues
heapq (list)heappush, heappopO(log n)O(log n)min-heap only
strs[i]O(1)O(1)
strs + tO(n+m)O(n+m)strings are immutable!
str''.join(list)O(total)O(total)preferred over += in loop
bisectbisect_left, bisect_rightO(log n)O(log n)requires sorted list

⚠️ lst.pop(0) trap: O(n) because it shifts everything. Use collections.deque for FIFO queues.

⚠️ String concatenation trap: s += c in a loop is O(n²) because strings are immutable — every += copies. Always build a list and ''.join(...) at the end, or use io.StringIO.

C++

ContainerOperationComplexity
vector<T>[i], back(), push_back, pop_backO(1)*
vector<T>insert(begin, x), erase(it) midO(n)
vector<T>sort(v.begin(), v.end())O(n log n) — introsort
array<T,N>[i]O(1) (fixed size, stack)
deque<T>push_front, push_back, pop_front, pop_backO(1)
list<T>linked list, spliceO(1) for splice
unordered_map<K,V>[k], find, insertO(1) avg, O(n) worst
unordered_set<T>find, insertO(1) avg, O(n) worst
map<K,V>[k], find, insert (red-black tree)O(log n)
set<T>find, insert, lower_boundO(log n)
priority_queue<T>push, pop, topO(log n) push/pop, O(1) top — max-heap by default
strings[i], s += cO(1), O(1)* (mutable, unlike Python!)

💡 C++ vs Python heap: C++ priority_queue is a max-heap. Python heapq is min-heap-only. To get max-heap in Python: push -x, pop -x. To get min-heap in C++: priority_queue<T, vector<T>, greater<T>>.


4. Amortized analysis

A vector/list push_back/append is sometimes O(n) (when it has to allocate a bigger array and copy). But across n operations, total cost is O(n), so per-operation is amortized O(1) — written O(1)*.

Why?

Doubling strategy: when full, allocate 2× space, copy n elements (cost n), continue. Sum across n inserts:

1 + 2 + 4 + 8 + ... + n  =  2n - 1  =  O(n) total
                                       ↓
                                  O(1) per op

This is why you can treat append as O(1) in interviews — but state "amortized" if pressed.


5. Master theorem

Used to find complexity of divide-and-conquer recurrences of the form:

T(n) = a · T(n/b) + f(n)

Where:

  • a = number of subproblems
  • n/b = subproblem size
  • f(n) = work done outside recursion (combining)

Three cases

Let c = log_b(a). Then:

CaseConditionResult
1f(n) = O(n^(c-ε)) (work small)T(n) = Θ(n^c)
2f(n) = Θ(n^c) (work matches)T(n) = Θ(n^c · log n)
3f(n) = Ω(n^(c+ε)) (work big)T(n) = Θ(f(n))

Examples

Merge sort:        T(n) = 2T(n/2) + O(n)        c=log₂2=1, f=n¹  → Case 2 → O(n log n)
Binary search:     T(n) = T(n/2) + O(1)          c=log₂1=0, f=1   → Case 2 → O(log n)
Karatsuba mult:    T(n) = 3T(n/2) + O(n)         c=log₂3≈1.58, f=n¹ → Case 1 → O(n^1.58)
Strassen:          T(n) = 7T(n/2) + O(n²)        c=log₂7≈2.81, f=n² → Case 1 → O(n^2.81)

For interviews: just memorize "merge sort = O(n log n)" and "binary search = O(log n)" — derive others from recursion trees.


6. Recursion → complexity (recursion tree)

Often clearer than master theorem.

Example: Fibonacci naive

def fib(n): if n <= 1: return n return fib(n-1) + fib(n-2)
                   fib(5)
                  /      \
              fib(4)    fib(3)
              /    \    /    \
           fib(3) fib(2) ...
           ...

Each level doubles → O(2ⁿ) calls

Example: Merge sort

                    merge(n)            ← O(n) work
                   /        \
              merge(n/2)  merge(n/2)    ← total O(n) at this level
              /     \       /    \
         merge(n/4) ...                  ← total O(n) at this level
         ...
         (log n levels)

Total: log n levels × O(n) per level = O(n log n)

Mental shortcut

For each recursion, ask:

  1. How deep? (depth of tree)
  2. How wide at each level? (number of nodes per level)
  3. Work per node?

Total = depth × width × work-per-node (sum over levels).


7. Space complexity

Don't forget:

  1. Auxiliary data — extra arrays, sets, etc.
  2. Call stack — depth of recursion (can be O(n) for skewed tree, O(log n) for balanced).
  3. Output space — sometimes excluded (state your assumption).

Examples

def sum_list(lst): total = 0 # O(1) extra for x in lst: total += x return total # Space: O(1) — no auxiliary storage, no recursion
def factorial(n): if n <= 1: return 1 return n * factorial(n-1) # Space: O(n) — each call adds a frame to the stack
def merge_sort(arr): if len(arr) <= 1: return arr mid = len(arr) // 2 L = merge_sort(arr[:mid]) # O(n) auxiliary across recursion R = merge_sort(arr[mid:]) return merge(L, R) # Space: O(n) auxiliary + O(log n) recursion depth = O(n)

Iterative solutions avoid recursion-stack space. Good interview answer: "I can convert this to iterative using my own stack to reduce stack space from O(n) to O(1) extra besides the explicit stack."


8. Python idioms cheatsheet

Loops & comprehensions

# enumerate — index + value for i, x in enumerate(lst): ... # zip — parallel iteration (stops at shortest) for a, b in zip(lst1, lst2): ... # range range(n) # 0..n-1 range(a, b) # a..b-1 range(a, b, step) # a, a+step, a+2*step ... < b range(b-1, -1, -1) # reverse: b-1, b-2, ..., 0 # list comprehension (the Pythonic for-loop) squares = [x*x for x in range(10) if x % 2 == 0] # dict comprehension freq = {c: s.count(c) for c in set(s)} # generator (lazy, no list built) gen = (x*x for x in range(10)) sum(gen) # 285

Common collections

from collections import defaultdict, Counter, deque, OrderedDict # defaultdict — auto-default when key missing graph = defaultdict(list) graph[u].append(v) # never KeyError # Counter — frequency map with helpful methods c = Counter("aabbbccd") # Counter({'b': 3, 'a': 2, 'c': 2, 'd': 1}) c.most_common(2) # [('b', 3), ('a', 2)] # deque — O(1) both ends q = deque([1, 2, 3]) q.appendleft(0) # deque([0, 1, 2, 3]) q.popleft() # 0 # heapq — min-heap (use list as container) import heapq h = [] heapq.heappush(h, 3) heapq.heappush(h, 1) heapq.heappop(h) # 1 # bisect — binary search on sorted list import bisect arr = [1, 3, 4, 4, 6] bisect.bisect_left(arr, 4) # 2 (leftmost position to insert 4) bisect.bisect_right(arr, 4) # 4 (rightmost position)

String operations

s.lower(), s.upper(), s.swapcase() s.strip(), s.lstrip(), s.rstrip() # whitespace by default s.split(' '), s.split() # split() handles multiple whitespace '-'.join(['a', 'b', 'c']) # 'a-b-c' s.replace('a', 'b') s.find('substr') # -1 if not found s.startswith('pre'), s.endswith('suf') s.isdigit(), s.isalpha(), s.isalnum() ord('a') # 97 — char to int chr(97) # 'a' — int to char

Sorting

sorted(lst) # new list lst.sort() # in-place # Custom key sorted(words, key=len) # by length sorted(points, key=lambda p: p[0]) # by first element sorted(items, key=lambda x: (-x[1], x[0])) # by count desc, name asc # functools.cmp_to_key (rare; for tricky comparators) from functools import cmp_to_key sorted(nums, key=cmp_to_key(lambda a, b: int(b+a) - int(a+b)))

Slicing tricks

lst[::-1] # reversed copy lst[::2] # every other element lst[start:stop:step] s = s[::-1] # reverse string

Truthy/falsy gotchas

# Falsy: 0, 0.0, '', [], {}, None, False # Everything else is truthy. if lst: # idiomatic empty check ... # But for None vs empty: if x is None: ... # explicit None check if x is not None: ...

9. C++ STL cheatsheet

Headers you'll need

#include <bits/stdc++.h> // pulls in everything (competitive programming) using namespace std;

Vector (dynamic array)

vector<int> v(n); // n zeros vector<int> v(n, -1); // n copies of -1 vector<vector<int>> grid(rows, vector<int>(cols, 0)); v.push_back(x); v.pop_back(); v.size(); v.empty(); sort(v.begin(), v.end()); sort(v.begin(), v.end(), greater<int>()); // descending reverse(v.begin(), v.end());

Maps and sets

unordered_map<string, int> m; m["key"] = 42; m.count("key"); // 0 or 1 m.find("key"); // iterator for (auto& [k, v] : m) { ... } map<int, int> ordered; // sorted by key, O(log n) ordered.lower_bound(x); // first key >= x ordered.upper_bound(x); // first key > x unordered_set<int> s; s.insert(x); s.count(x);

Stack, queue, deque

stack<int> st; st.push(1); st.top(); st.pop(); queue<int> q; q.push(1); q.front(); q.pop(); deque<int> dq; dq.push_back(1); dq.push_front(0); dq.front(); dq.back();

Priority queue

priority_queue<int> pq; // max-heap priority_queue<int, vector<int>, greater<int>> minpq; // min-heap priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> minpq2;

Binary search

sort(v.begin(), v.end()); auto it = lower_bound(v.begin(), v.end(), x); // first >= x auto it = upper_bound(v.begin(), v.end(), x); // first > x int idx = it - v.begin(); binary_search(v.begin(), v.end(), x); // returns bool

String

string s = "hello"; s.length(); s.size(); s.substr(start, len); s.find("ll"); // string::npos if not found to_string(42); // "42" stoi("42"); // 42

I/O for speed (CP)

ios_base::sync_with_stdio(false); cin.tie(NULL);

10. Common bug sources & gotchas

1. Integer overflow (C++ mainly)

int a = 100000, b = 100000; int c = a * b; // overflow! a*b = 10^10 > INT_MAX (~2.1e9) long long c = (long long)a * b; // 10^10 fits in long long

Python ints are arbitrary precision — no overflow.

2. Off-by-one in binary search

while lo <= hi: # ← inclusive bounds mid = (lo + hi) // 2 ... lo = mid + 1 # ← always +1 / -1 to make progress hi = mid - 1

Use template (see 05-binary-search.md).

3. Mutating list during iteration

lst = [1, 2, 3, 4] for x in lst: if x % 2 == 0: lst.remove(x) # ← BUG: skips elements # Better: iterate over a copy, or build a new list lst = [x for x in lst if x % 2 != 0]

4. Default mutable arguments

def f(x, acc=[]): # ← BUG: same list shared across calls acc.append(x) return acc f(1) # [1] f(2) # [1, 2] — surprise! # Fix: def f(x, acc=None): if acc is None: acc = [] ...

5. Shallow vs deep copy

import copy a = [[1, 2], [3, 4]] b = a # same list c = a[:] # shallow copy — outer list new, inner lists shared d = a.copy() # same as a[:] e = copy.deepcopy(a) # fully independent a[0][0] = 99 # b: [[99, 2], [3, 4]] — changed # c: [[99, 2], [3, 4]] — changed (inner shared!) # e: [[1, 2], [3, 4]] — independent

6. int / int vs int // int in Python

5 / 2 # 2.5 (float) 5 // 2 # 2 (integer division, rounds toward -infinity) -5 // 2 # -3 (NOT -2! rounds toward -inf)

For C++-style truncation toward zero: int(a/b) (but this can have float issues for very large nums).

7. Negative modulo

-5 % 3 # 1 (Python: result has sign of divisor)
-5 % 3 // -2 (C++: result has sign of dividend) ((-5) % 3 + 3) % 3 // 1 — safe modulo for negative numbers

8. Recursion depth limit (Python)

import sys sys.setrecursionlimit(10**6) # default is 1000 — bump for deep trees

9. max(0, ...) and min(...) with empty iterables

max([]) # ValueError max([], default=0) # 0 — use default

10. Python integer division floors negatively (off-by-one in mid)

When computing mid in binary search:

mid = (lo + hi) // 2 # for ints in any range — Python is fine mid = lo + (hi - lo) // 2 # avoids potential overflow in C/Java/C++

TL;DR ✅

  • Big O hides constants and lower-order terms. Only the dominant term matters as n grows.
  • For interviews, target O(n) or O(n log n) on n ≤ 10⁶. O(n²) is acceptable for n ≤ 10⁴.
  • Memorize the operation cost tables (Section 3). You'll be asked.
  • Python list.pop(0) is O(n), use deque for FIFO.
  • Strings are immutable in Python — never += in a loop, build a list and join.
  • C++ default priority_queue is max-heap; Python default heapq is min-heap.
  • Recursion adds O(depth) stack space. Deep recursion → consider iterative.
  • Sketch the recursion tree for non-master-theorem cases (depth × width × work).

Once these are second nature, every other file in this folder will read fast. Move on to 01-arrays-hashing.md.

// 1 view

main
UTF-8·typescript