01 — Arrays & Hashing

Neetcode Week 1. Half of all interview problems are dressed-up array/hash problems. Internalizing this category is the single biggest leverage point for the first month.

Arrays — the underlying memory
Strings — what they are and aren't
Hash tables — the killer DS
Hash sets vs hash maps
Frequency counting pattern
Prefix sums
Difference arrays (range updates)
Walked-through problems
Pitfalls
Practice list

1. Arrays — the underlying memory

An array is a contiguous block of memory of fixed-size elements indexed from 0.

Index:    0    1    2    3    4    5
        ┌────┬────┬────┬────┬────┬────┐
Memory: │ 7  │ 3  │ 9  │ 2  │ 4  │ 8  │
        └────┴────┴────┴────┴────┴────┘
        ↑
        Base address `p`. Element i is at `p + i * sizeof(T)`.

Because elements are equal-sized and contiguous:

arr[i] is O(1) — just arithmetic on the pointer.
Iteration is cache-friendly (CPU prefetches the next chunk).
Insert/delete in the middle is O(n) because everything must shift.

Static vs dynamic

Static array	Dynamic array (vector / list)
C `int arr[10]`	C++ `vector`, Python `list`, Java `ArrayList`
Fixed size at compile time	Grows at runtime by reallocation
No append cost	`append` is `O(1)*` amortized (Section 4 of foundations)

In interviews you almost always use the dynamic version. Just remember append/push_back is amortized constant.

Python list isn't a "real" array

Python list is a dynamic array of pointers to Python objects, not a packed buffer. So:

arr[i] is still O(1), but slightly slower than C arrays.
list[int] doesn't pack ints contiguously. For real packed arrays use array.array or numpy.

This rarely matters in interviews — just know it for the "why is Python slow?" question.

2. Strings

A string is conceptually an array of characters, but with critical differences:

Python strings: immutable

s = "hello"
s[0] = 'H'      # TypeError!
s = "H" + s[1:] # creates new string each time — O(n)

This is why building strings in a loop with += is O(n²) — each += copies the whole string. Always:

parts = []
for c in stuff:
    parts.append(c)
result = ''.join(parts)   # O(n) total

C++ strings: mutable

string s = "hello";
s[0] = 'H';     // OK: now "Hello"
s += 'X';       // amortized O(1), like vector

Common ops

# Python
s.upper(), s.lower()
s.split(' '), s.split()                 # split() handles any whitespace
'-'.join(['a', 'b'])                    # 'a-b'
s.find('ll')                            # index, -1 if not found
s.count('l')
ord('a') - ord('a')                     # 0  → useful for char-to-bucket-index

// C++
s.length();
s.substr(start, len);
s.find("ll");                            // string::npos if not found
to_string(42); stoi("42");
char c = 'a' + 3;                        // 'd'

Char-to-index trick (lowercase letters)

freq = [0] * 26
for c in s:
    freq[ord(c) - ord('a')] += 1

Cleaner than a hashmap for "only lowercase a-z" problems and slightly faster.

3. Hash tables

A hash table stores key-value pairs by:

Computing hash(key) % capacity to pick a bucket
Storing the entry in that bucket (chained list or open addressing)

            keys                        buckets (mod capacity = 8)
          ┌───────┐
   "cat" ─┤hash() ├──→ 5 ─→ ┌───┐    ┌─5─┐
          │       │         │ 0 │    │"cat":3 → "tom":1│
   "dog" ─┤hash() ├──→ 2 ─→ │ 1 │    │   │
          │       │         │ 2 │    │"dog":7│
   "tom" ─┤hash() ├──→ 5 ─→ │ 3 │    │   │
          └───────┘         │ 4 │    │   │
                            │ 5 │    │   │
                            │ 6 │    │   │
                            │ 7 │    │   │
                            └───┘

Why O(1) average?

If hash function spreads well and load factor (n / capacity) is below ~0.7, each bucket has a tiny constant chain. Lookup, insert, delete are O(1) on average.

Worst case is O(n) when all keys collide into one bucket. In interviews, just say "average O(1), worst case O(n) for hostile inputs / poor hash."

Python dict / set

d = {}
d['key'] = 42
d['key']                     # 42
'key' in d                   # True
del d['key']
d.get('missing', -1)         # -1 (no KeyError)

# Iterate
for k in d: ...
for k, v in d.items(): ...
for v in d.values(): ...

# defaultdict — auto-default for missing key
from collections import defaultdict
graph = defaultdict(list)
graph[u].append(v)           # graph[u] auto-becomes [] if absent

C++ unordered_map / map

unordered_map<string, int> m;
m["key"] = 42;
m.count("key");              // 0 or 1
m.find("key");               // iterator, m.end() if absent

map<int, int> tree;          // ordered, O(log n), useful for range queries
tree.lower_bound(x);         // first >= x
tree.upper_bound(x);         // first > x

Use unordered_map (O(1) avg) by default. Only use map (O(log n) tree-backed) when you need ordering or lower_bound.

4. Hash sets vs hash maps

Set — only keys, "is this in the collection?"
Map — key + value, "given the key, what's stored?"

Both are O(1) avg. Choose based on whether you need a value or just membership.

seen = set()
seen.add(x)
x in seen               # O(1)

count = {}
count[x] = count.get(x, 0) + 1

5. Frequency counting pattern

Trigger: "anagram", "duplicate", "majority element", "unique character", "k most frequent"

The pattern is: build a hashmap of element → count, then scan it.

from collections import Counter

# Are two strings anagrams?
def is_anagram(s, t):
    return Counter(s) == Counter(t)

# Most common element
def majority(nums):
    c = Counter(nums)
    return c.most_common(1)[0][0]

# Top K frequent
def top_k(nums, k):
    c = Counter(nums)
    return [item for item, _ in c.most_common(k)]   # O(n log k) with heap internally

Bucket sort variant (when k is small relative to n)

def top_k_frequent(nums, k):
    count = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]    # bucket[i] = elems with freq i
    for num, cnt in count.items():
        buckets[cnt].append(num)
    result = []
    for i in range(len(buckets) - 1, -1, -1):       # highest freq first
        for num in buckets[i]:
            result.append(num)
            if len(result) == k: return result

This is O(n) instead of O(n log k) — a clean optimization when frequencies are bounded by n.

6. Prefix sums

Trigger: "sum of subarray", "range sum query", "find subarray with sum = k"

The prefix-sum array P lets you compute any range sum in O(1) after O(n) preprocessing:

arr:    [3,  1,  4,  1,  5,  9]
P:      [0,  3,  4,  8,  9, 14, 23]
        ↑
        P[0] = 0 (empty prefix)
        P[i] = arr[0] + arr[1] + ... + arr[i-1]

sum(arr[l..r]) = P[r+1] - P[l]
                       ↑
   sum of arr[2..4] = 4+1+5 = 10
                    = P[5] - P[2] = 14 - 4 = 10 ✓

Build it

def prefix_sum(arr):
    P = [0] * (len(arr) + 1)
    for i, x in enumerate(arr):
        P[i+1] = P[i] + x
    return P

Subarray sum = k (Hashmap + prefix sum)

Classic problem: find number of subarrays whose sum is exactly k.

def subarray_sum_equals_k(nums, k):
    count = 0
    prefix = 0
    seen = {0: 1}                  # prefix value 0 has been seen once (empty prefix)
    for x in nums:
        prefix += x
        # we want to know: is there an earlier prefix p' such that prefix - p' = k?
        # i.e. p' = prefix - k. Look it up.
        if prefix - k in seen:
            count += seen[prefix - k]
        seen[prefix] = seen.get(prefix, 0) + 1
    return count

Line by line:

seen = {0: 1} — there's one empty prefix with sum 0; lets us count subarrays starting at index 0.
prefix += x — running prefix sum.
if prefix - k in seen — we want subarray sum = k. Sum from any earlier index j+1..i equals prefix[i] - prefix[j], so look for prefix - k among earlier prefixes.
count += seen[prefix - k] — add how many such earlier prefixes there were.
Update seen for the next iteration.

Time: O(n). Space: O(n).

7. Difference arrays

Trigger: "apply k range updates [l, r, +x], then query final array"

Naive: each update is O(r-l+1) → O(n·k) total. Difference array: each update is O(1) → O(n+k) total.

arr:  [0, 0, 0, 0, 0]
update [1, 3, +5]:
diff[1] += 5
diff[4] -= 5     ← the index AFTER r

diff: [0, 5, 0, 0, -5]
prefix-sum diff:  0, 5, 5, 5, 0
                  → arr = [0, 5, 5, 5, 0]  ✓

def apply_updates(n, updates):
    diff = [0] * (n + 1)
    for l, r, x in updates:
        diff[l] += x
        diff[r + 1] -= x
    # prefix sum to recover the array
    arr = [0] * n
    cur = 0
    for i in range(n):
        cur += diff[i]
        arr[i] = cur
    return arr

This is also the foundation of segment-tree-like range-update problems.

8. Walked-through problems

Problem 1: Two Sum (Easy)

Given nums and target, return indices of two numbers that sum to target.

Brute: O(n²) — try all pairs. Hash: O(n) — for each element, check if target - x is already seen.

def two_sum(nums, target):
    seen = {}                          # value → index
    for i, x in enumerate(nums):
        complement = target - x
        if complement in seen:         # found earlier
            return [seen[complement], i]
        seen[x] = i                    # remember this index for future

Why this works: by the time we look for complement, we've already inserted every previous element. So if a valid pair exists, when we hit the second element of the pair, the first is in seen.

Pitfall: insert after the lookup, otherwise nums = [3, 3], target = 6 would match 3 to itself.

Problem 2: Group Anagrams (Medium)

Group strings that are anagrams of each other.

Key insight: anagrams share the same sorted characters or the same frequency tuple.

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))         # ("a","b","t") for "bat", "tab"
        groups[key].append(s)
    return list(groups.values())

Optimization: use a length-26 tuple of counts instead of sorting. O(n·m) instead of O(n·m log m).

def group_anagrams_fast(strs):
    groups = defaultdict(list)
    for s in strs:
        cnt = [0] * 26
        for c in s:
            cnt[ord(c) - ord('a')] += 1
        groups[tuple(cnt)].append(s)
    return list(groups.values())

Problem 3: Longest Consecutive Sequence (Medium)

Given unsorted nums, find the length of the longest run of consecutive integers. O(n) required.

[100, 4, 200, 1, 3, 2]   →   4   (sequence 1,2,3,4)

Naive: sort → O(n log n). Required: O(n).

Hash trick: put everything in a set. For each number, check if it's the start of a run (i.e., n-1 is not in set). If so, count up.

def longest_consecutive(nums):
    s = set(nums)
    longest = 0
    for n in s:
        if n - 1 in s:
            continue              # not the start of a run
        # n is the start; count up
        cur = n
        length = 1
        while cur + 1 in s:
            cur += 1
            length += 1
        longest = max(longest, length)
    return longest

Why O(n)? Each number is the inner of while only when it's part of a run we entered from its start. Each element is visited at most twice across the whole algorithm. So total work is O(n), not O(n²).

This "only start runs from boundaries" trick generalizes to many problems.

Problem 4: Product of Array Except Self (Medium)

Return out[i] = product of all nums[j] where j != i. O(n) time, no division.

nums = [1, 2, 3, 4]
out  = [24, 12, 8, 6]

Two-pass technique: for each index, the answer is prefix_product * suffix_product.

def product_except_self(nums):
    n = len(nums)
    out = [1] * n
    
    # Pass 1: out[i] = product of all nums[0..i-1]
    prefix = 1
    for i in range(n):
        out[i] = prefix
        prefix *= nums[i]
    
    # Pass 2: multiply by product of all nums[i+1..n-1]
    suffix = 1
    for i in range(n - 1, -1, -1):
        out[i] *= suffix
        suffix *= nums[i]
    
    return out

Trace nums = [1,2,3,4]:

After pass 1: out = [1, 1, 2, 6]    (prefix products)
              prefix sequence: 1 → 1·1=1 → 1·2=2 → 2·3=6
After pass 2: from right
  i=3: out[3] *= 1  → 6;   suffix = 1·4 = 4
  i=2: out[2] *= 4  → 8;   suffix = 4·3 = 12
  i=1: out[1] *= 12 → 12;  suffix = 12·2 = 24
  i=0: out[0] *= 24 → 24
Final: [24, 12, 8, 6]  ✓

9. Pitfalls

Hashing custom objects

In Python, only hashable objects can be dict keys. Lists are unhashable; tuples are. So:

key = tuple(sorted(s))    # OK
key = sorted(s)           # TypeError: unhashable type: 'list'

For C++, custom types in unordered_map need a custom hash<T> specialization — easier to avoid by using pair<int,int>-of-primitives or string.

Iterating and modifying

Don't modify a dict while iterating:

for k in d:
    if d[k] == 0: del d[k]    # RuntimeError: dictionary changed size during iteration
# Fix:
for k in list(d):              # iterate over a snapshot of keys
    if d[k] == 0: del d[k]

Counter subtraction surprise

Counter("aabb") - Counter("ab")  # Counter({'a':1, 'b':1}) — not negative!

Counter's - only keeps positive counts. Use c1.subtract(c2) for pure subtraction.

Pre-sorting cost

If your "hash" approach involves sorting tuples per element (e.g., group anagrams), state the true cost: O(n · m log m) where m is string length. Often you can replace with bucket counts (O(n·m)).

10. Practice list

In Neetcode order — solve in this sequence.

#	Problem	Concept	Difficulty
1	Two Sum	Hash lookup	Easy
217	Contains Duplicate	Hash set	Easy
242	Valid Anagram	Frequency count	Easy
49	Group Anagrams	Hashmap with tuple key	Medium
347	Top K Frequent Elements	Bucket sort / heap	Medium
271	Encode and Decode Strings	String design	Medium
238	Product of Array Except Self	Prefix/suffix arrays	Medium
36	Valid Sudoku	Hash set of "row=N val=v"	Medium
128	Longest Consecutive Sequence	Hash set + boundary scan	Medium

Bonus advanced practice

560 — Subarray Sum Equals K (prefix + hash)
525 — Contiguous Array (prefix sum with mapping)
380 — Insert Delete GetRandom O(1) (hash + dynamic array)

TL;DR ✅

Arrays are O(1) random access, O(n) insert-mid. Dynamic arrays' append is O(1)*.
Python strings are immutable — never += in a loop. Build a list, then ''.join.
Hash tables = O(1) avg, O(n) worst. The single most-used DS.
Use Counter / defaultdict liberally; they cut 5-10 lines off most problems.
Frequency counting solves anagram-class problems.
Prefix sums solve range-sum-class problems in O(1) per query after O(n) preprocessing.
Hash + prefix sum solves "subarray with sum = k" in O(n).
For Top K, bucket sort gives O(n) when frequencies are bounded.

Next: 02-two-pointers.md.

External resources

NeetCode "Arrays & Hashing" playlist: https://www.youtube.com/playlist?list=PLot-Xpze53lfQmTEztbgdp8ALEoydvnRQ
VisuAlgo hashing animations: https://visualgo.net/en/hashtable
Python Counter docs: https://docs.python.org/3/library/collections.html#collections.Counter