Arrays & Hashing
The bread and butter — arrays, strings, hashmaps, frequency counting, prefix sums.
01 — Arrays & Hashing
Neetcode Week 1. Half of all interview problems are dressed-up array/hash problems. Internalizing this category is the single biggest leverage point for the first month.
Table of Contents
- Arrays — the underlying memory
- Strings — what they are and aren't
- Hash tables — the killer DS
- Hash sets vs hash maps
- Frequency counting pattern
- Prefix sums
- Difference arrays (range updates)
- Walked-through problems
- Pitfalls
- Practice list
1. Arrays — the underlying memory
An array is a contiguous block of memory of fixed-size elements indexed from 0.
Index: 0 1 2 3 4 5
┌────┬────┬────┬────┬────┬────┐
Memory: │ 7 │ 3 │ 9 │ 2 │ 4 │ 8 │
└────┴────┴────┴────┴────┴────┘
↑
Base address `p`. Element i is at `p + i * sizeof(T)`.
Because elements are equal-sized and contiguous:
arr[i]is O(1) — just arithmetic on the pointer.- Iteration is cache-friendly (CPU prefetches the next chunk).
- Insert/delete in the middle is O(n) because everything must shift.
Static vs dynamic
| Static array | Dynamic array (vector / list) |
|---|---|
C int arr[10] | C++ vector, Python list, Java ArrayList |
| Fixed size at compile time | Grows at runtime by reallocation |
| No append cost | append is O(1)* amortized (Section 4 of foundations) |
In interviews you almost always use the dynamic version. Just remember append/push_back is amortized constant.
Python list isn't a "real" array
Python list is a dynamic array of pointers to Python objects, not a packed buffer. So:
arr[i]is still O(1), but slightly slower than C arrays.list[int]doesn't pack ints contiguously. For real packed arrays usearray.arrayornumpy.
This rarely matters in interviews — just know it for the "why is Python slow?" question.
2. Strings
A string is conceptually an array of characters, but with critical differences:
Python strings: immutable
s = "hello" s[0] = 'H' # TypeError! s = "H" + s[1:] # creates new string each time — O(n)
This is why building strings in a loop with += is O(n²) — each += copies the whole string. Always:
parts = [] for c in stuff: parts.append(c) result = ''.join(parts) # O(n) total
C++ strings: mutable
string s = "hello"; s[0] = 'H'; // OK: now "Hello" s += 'X'; // amortized O(1), like vector
Common ops
# Python s.upper(), s.lower() s.split(' '), s.split() # split() handles any whitespace '-'.join(['a', 'b']) # 'a-b' s.find('ll') # index, -1 if not found s.count('l') ord('a') - ord('a') # 0 → useful for char-to-bucket-index
// C++ s.length(); s.substr(start, len); s.find("ll"); // string::npos if not found to_string(42); stoi("42"); char c = 'a' + 3; // 'd'
Char-to-index trick (lowercase letters)
freq = [0] * 26 for c in s: freq[ord(c) - ord('a')] += 1
Cleaner than a hashmap for "only lowercase a-z" problems and slightly faster.
3. Hash tables
A hash table stores key-value pairs by:
- Computing
hash(key) % capacityto pick a bucket - Storing the entry in that bucket (chained list or open addressing)
keys buckets (mod capacity = 8)
┌───────┐
"cat" ─┤hash() ├──→ 5 ─→ ┌───┐ ┌─5─┐
│ │ │ 0 │ │"cat":3 → "tom":1│
"dog" ─┤hash() ├──→ 2 ─→ │ 1 │ │ │
│ │ │ 2 │ │"dog":7│
"tom" ─┤hash() ├──→ 5 ─→ │ 3 │ │ │
└───────┘ │ 4 │ │ │
│ 5 │ │ │
│ 6 │ │ │
│ 7 │ │ │
└───┘
Why O(1) average?
If hash function spreads well and load factor (n / capacity) is below ~0.7, each bucket has a tiny constant chain. Lookup, insert, delete are O(1) on average.
Worst case is O(n) when all keys collide into one bucket. In interviews, just say "average O(1), worst case O(n) for hostile inputs / poor hash."
Python dict / set
d = {} d['key'] = 42 d['key'] # 42 'key' in d # True del d['key'] d.get('missing', -1) # -1 (no KeyError) # Iterate for k in d: ... for k, v in d.items(): ... for v in d.values(): ... # defaultdict — auto-default for missing key from collections import defaultdict graph = defaultdict(list) graph[u].append(v) # graph[u] auto-becomes [] if absent
C++ unordered_map / map
unordered_map<string, int> m; m["key"] = 42; m.count("key"); // 0 or 1 m.find("key"); // iterator, m.end() if absent map<int, int> tree; // ordered, O(log n), useful for range queries tree.lower_bound(x); // first >= x tree.upper_bound(x); // first > x
Use
unordered_map(O(1) avg) by default. Only usemap(O(log n) tree-backed) when you need ordering orlower_bound.
4. Hash sets vs hash maps
- Set — only keys, "is this in the collection?"
- Map — key + value, "given the key, what's stored?"
Both are O(1) avg. Choose based on whether you need a value or just membership.
seen = set() seen.add(x) x in seen # O(1) count = {} count[x] = count.get(x, 0) + 1
5. Frequency counting pattern
Trigger: "anagram", "duplicate", "majority element", "unique character", "k most frequent"
The pattern is: build a hashmap of element → count, then scan it.
from collections import Counter # Are two strings anagrams? def is_anagram(s, t): return Counter(s) == Counter(t) # Most common element def majority(nums): c = Counter(nums) return c.most_common(1)[0][0] # Top K frequent def top_k(nums, k): c = Counter(nums) return [item for item, _ in c.most_common(k)] # O(n log k) with heap internally
Bucket sort variant (when k is small relative to n)
def top_k_frequent(nums, k): count = Counter(nums) buckets = [[] for _ in range(len(nums) + 1)] # bucket[i] = elems with freq i for num, cnt in count.items(): buckets[cnt].append(num) result = [] for i in range(len(buckets) - 1, -1, -1): # highest freq first for num in buckets[i]: result.append(num) if len(result) == k: return result
This is O(n) instead of O(n log k) — a clean optimization when frequencies are bounded by n.
6. Prefix sums
Trigger: "sum of subarray", "range sum query", "find subarray with sum = k"
The prefix-sum array P lets you compute any range sum in O(1) after O(n) preprocessing:
arr: [3, 1, 4, 1, 5, 9]
P: [0, 3, 4, 8, 9, 14, 23]
↑
P[0] = 0 (empty prefix)
P[i] = arr[0] + arr[1] + ... + arr[i-1]
sum(arr[l..r]) = P[r+1] - P[l]
↑
sum of arr[2..4] = 4+1+5 = 10
= P[5] - P[2] = 14 - 4 = 10 ✓
Build it
def prefix_sum(arr): P = [0] * (len(arr) + 1) for i, x in enumerate(arr): P[i+1] = P[i] + x return P
Subarray sum = k (Hashmap + prefix sum)
Classic problem: find number of subarrays whose sum is exactly
k.
def subarray_sum_equals_k(nums, k): count = 0 prefix = 0 seen = {0: 1} # prefix value 0 has been seen once (empty prefix) for x in nums: prefix += x # we want to know: is there an earlier prefix p' such that prefix - p' = k? # i.e. p' = prefix - k. Look it up. if prefix - k in seen: count += seen[prefix - k] seen[prefix] = seen.get(prefix, 0) + 1 return count
Line by line:
seen = {0: 1}— there's one empty prefix with sum 0; lets us count subarrays starting at index 0.prefix += x— running prefix sum.if prefix - k in seen— we want subarray sum =k. Sum from any earlier indexj+1..iequalsprefix[i] - prefix[j], so look forprefix - kamong earlier prefixes.count += seen[prefix - k]— add how many such earlier prefixes there were.- Update
seenfor the next iteration.
Time: O(n). Space: O(n).
7. Difference arrays
Trigger: "apply k range updates
[l, r, +x], then query final array"
Naive: each update is O(r-l+1) → O(n·k) total. Difference array: each update is O(1) → O(n+k) total.
arr: [0, 0, 0, 0, 0]
update [1, 3, +5]:
diff[1] += 5
diff[4] -= 5 ← the index AFTER r
diff: [0, 5, 0, 0, -5]
prefix-sum diff: 0, 5, 5, 5, 0
→ arr = [0, 5, 5, 5, 0] ✓
def apply_updates(n, updates): diff = [0] * (n + 1) for l, r, x in updates: diff[l] += x diff[r + 1] -= x # prefix sum to recover the array arr = [0] * n cur = 0 for i in range(n): cur += diff[i] arr[i] = cur return arr
This is also the foundation of segment-tree-like range-update problems.
8. Walked-through problems
Problem 1: Two Sum (Easy)
Given
numsandtarget, return indices of two numbers that sum totarget.
Brute: O(n²) — try all pairs.
Hash: O(n) — for each element, check if target - x is already seen.
def two_sum(nums, target): seen = {} # value → index for i, x in enumerate(nums): complement = target - x if complement in seen: # found earlier return [seen[complement], i] seen[x] = i # remember this index for future
Why this works: by the time we look for complement, we've already inserted every previous element. So if a valid pair exists, when we hit the second element of the pair, the first is in seen.
Pitfall: insert after the lookup, otherwise nums = [3, 3], target = 6 would match 3 to itself.
Problem 2: Group Anagrams (Medium)
Group strings that are anagrams of each other.
Key insight: anagrams share the same sorted characters or the same frequency tuple.
def group_anagrams(strs): groups = defaultdict(list) for s in strs: key = tuple(sorted(s)) # ("a","b","t") for "bat", "tab" groups[key].append(s) return list(groups.values())
Optimization: use a length-26 tuple of counts instead of sorting. O(n·m) instead of O(n·m log m).
def group_anagrams_fast(strs): groups = defaultdict(list) for s in strs: cnt = [0] * 26 for c in s: cnt[ord(c) - ord('a')] += 1 groups[tuple(cnt)].append(s) return list(groups.values())
Problem 3: Longest Consecutive Sequence (Medium)
Given unsorted
nums, find the length of the longest run of consecutive integers. O(n) required.
[100, 4, 200, 1, 3, 2] → 4 (sequence 1,2,3,4)
Naive: sort → O(n log n). Required: O(n).
Hash trick: put everything in a set. For each number, check if it's the start of a run (i.e., n-1 is not in set). If so, count up.
def longest_consecutive(nums): s = set(nums) longest = 0 for n in s: if n - 1 in s: continue # not the start of a run # n is the start; count up cur = n length = 1 while cur + 1 in s: cur += 1 length += 1 longest = max(longest, length) return longest
Why O(n)? Each number is the inner of while only when it's part of a run we entered from its start. Each element is visited at most twice across the whole algorithm. So total work is O(n), not O(n²).
This "only start runs from boundaries" trick generalizes to many problems.
Problem 4: Product of Array Except Self (Medium)
Return
out[i] = product of all nums[j] where j != i. O(n) time, no division.
nums = [1, 2, 3, 4]
out = [24, 12, 8, 6]
Two-pass technique: for each index, the answer is prefix_product * suffix_product.
def product_except_self(nums): n = len(nums) out = [1] * n # Pass 1: out[i] = product of all nums[0..i-1] prefix = 1 for i in range(n): out[i] = prefix prefix *= nums[i] # Pass 2: multiply by product of all nums[i+1..n-1] suffix = 1 for i in range(n - 1, -1, -1): out[i] *= suffix suffix *= nums[i] return out
Trace nums = [1,2,3,4]:
After pass 1: out = [1, 1, 2, 6] (prefix products)
prefix sequence: 1 → 1·1=1 → 1·2=2 → 2·3=6
After pass 2: from right
i=3: out[3] *= 1 → 6; suffix = 1·4 = 4
i=2: out[2] *= 4 → 8; suffix = 4·3 = 12
i=1: out[1] *= 12 → 12; suffix = 12·2 = 24
i=0: out[0] *= 24 → 24
Final: [24, 12, 8, 6] ✓
9. Pitfalls
Hashing custom objects
In Python, only hashable objects can be dict keys. Lists are unhashable; tuples are. So:
key = tuple(sorted(s)) # OK key = sorted(s) # TypeError: unhashable type: 'list'
For C++, custom types in unordered_map need a custom hash<T> specialization — easier to avoid by using pair<int,int>-of-primitives or string.
Iterating and modifying
Don't modify a dict while iterating:
for k in d: if d[k] == 0: del d[k] # RuntimeError: dictionary changed size during iteration # Fix: for k in list(d): # iterate over a snapshot of keys if d[k] == 0: del d[k]
Counter subtraction surprise
Counter("aabb") - Counter("ab") # Counter({'a':1, 'b':1}) — not negative!
Counter's - only keeps positive counts. Use c1.subtract(c2) for pure subtraction.
Pre-sorting cost
If your "hash" approach involves sorting tuples per element (e.g., group anagrams), state the true cost: O(n · m log m) where m is string length. Often you can replace with bucket counts (O(n·m)).
10. Practice list
In Neetcode order — solve in this sequence.
| # | Problem | Concept | Difficulty |
|---|---|---|---|
| 1 | Two Sum | Hash lookup | Easy |
| 217 | Contains Duplicate | Hash set | Easy |
| 242 | Valid Anagram | Frequency count | Easy |
| 49 | Group Anagrams | Hashmap with tuple key | Medium |
| 347 | Top K Frequent Elements | Bucket sort / heap | Medium |
| 271 | Encode and Decode Strings | String design | Medium |
| 238 | Product of Array Except Self | Prefix/suffix arrays | Medium |
| 36 | Valid Sudoku | Hash set of "row=N val=v" | Medium |
| 128 | Longest Consecutive Sequence | Hash set + boundary scan | Medium |
Bonus advanced practice
- 560 — Subarray Sum Equals K (prefix + hash)
- 525 — Contiguous Array (prefix sum with mapping)
- 380 — Insert Delete GetRandom O(1) (hash + dynamic array)
TL;DR ✅
- Arrays are O(1) random access, O(n) insert-mid. Dynamic arrays' append is O(1)*.
- Python strings are immutable — never
+=in a loop. Build a list, then''.join. - Hash tables = O(1) avg, O(n) worst. The single most-used DS.
- Use Counter / defaultdict liberally; they cut 5-10 lines off most problems.
- Frequency counting solves anagram-class problems.
- Prefix sums solve range-sum-class problems in O(1) per query after O(n) preprocessing.
- Hash + prefix sum solves "subarray with sum = k" in O(n).
- For Top K, bucket sort gives O(n) when frequencies are bounded.
Next: 02-two-pointers.md.
External resources
- NeetCode "Arrays & Hashing" playlist: https://www.youtube.com/playlist?list=PLot-Xpze53lfQmTEztbgdp8ALEoydvnRQ
- VisuAlgo hashing animations: https://visualgo.net/en/hashtable
- Python Counter docs: https://docs.python.org/3/library/collections.html#collections.Counter