Arrays & Hashing
Arrays & Hashing
The foundation every other pattern builds on. If you can't solve array/hashmap problems fluently, you'll struggle with two pointers, sliding window, and DP. This pattern appears in 30-40% of FAANG phone screens.
Why This Pattern Matters
Interview Reality
Arrays & Hashing is not just one pattern — it's the toolbox you reach for inside every other pattern. Two Pointers uses arrays. Sliding Window uses hash maps for constraint tracking. DP uses arrays for memoization. Master this first, and everything else becomes easier.
The core insight: trading space for time. Almost every O(n^2) brute force involving "find something in the array" can be reduced to O(n) by pre-storing values in a HashMap.
Core Concepts
Array Internals
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
block-beta
columns 8
block:header:8
A["Array: int[] nums = {4, 2, 7, 1, 9}"]
end
space:8
I0["idx 0"] I1["idx 1"] I2["idx 2"] I3["idx 3"] I4["idx 4"] space space space
V0["4"] V1["2"] V2["7"] V3["1"] V4["9"] space space space
block:mem:8
M["Contiguous memory: base_addr + (index × 4 bytes) = O(1) access"]
end Key properties:
- Contiguous memory — elements stored sequentially, enabling O(1) random access via pointer arithmetic
- Fixed size (in Java,
int[]) — once allocated, cannot grow. UseArrayListfor dynamic sizing (amortized O(1) append via doubling) - Cache-friendly — sequential access patterns are fast because the CPU prefetcher loads adjacent memory into L1 cache
HashMap Internals (Java)
This Gets Asked Directly
Amazon and Google have asked "explain how HashMap works internally" as a standalone question. Know the numbers.
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
subgraph HashMap["HashMap<K, V> — Java 8+"]
direction TB
PUT["put(key, value)"] --> HASH["hash = key.hashCode() ^ (hash >>> 16)"]
HASH --> BUCKET["index = hash & (capacity - 1)"]
BUCKET --> CHECK{Bucket empty?}
CHECK -->|Yes| INSERT["Store as Node"]
CHECK -->|No| CHAIN{Linked list length?}
CHAIN -->|"< 8 nodes"| LINKED["Append to linked list — O(n) worst"]
CHAIN -->|">= 8 nodes"| TREE["Treeify to Red-Black Tree — O(log n)"]
end
style HashMap fill:#f8f9fa,stroke:#333
style TREE fill:#FEE2E2,stroke:#DC2626
style INSERT fill:#DCFCE7,stroke:#16A34A | Property | Value | Why It Matters |
|---|---|---|
| Default capacity | 16 | Power of 2 for fast modulo via & |
| Load factor | 0.75 | Triggers resize at 75% full (12 entries for cap 16) |
| Resize | 2x capacity | Rehashes all entries — O(n) one-time cost |
| Treeify threshold | 8 nodes in one bucket | Converts linked list to Red-Black tree |
| Untreeify threshold | 6 nodes | Converts back to linked list on removal |
HashSet for Dedup
HashSet<T> is internally just a HashMap<T, Object> where all values are a dummy constant (PRESENT). Use it when you only care about existence, not key-value mapping.
// O(n) dedup check — "does this element already exist?"
Set<Integer> seen = new HashSet<>();
for (int num : nums) {
if (!seen.add(num)) {
// duplicate found
}
}
Prefix Sum
The prefix sum converts any "sum of subarray [i, j]" query from O(n) to O(1) after an O(n) precomputation.
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
subgraph Original["nums[]"]
A0["2"] --> A1["4"] --> A2["1"] --> A3["3"] --> A4["5"]
end
subgraph Prefix["prefix[]"]
P0["0"] --> P1["2"] --> P2["6"] --> P3["7"] --> P4["10"] --> P5["15"]
end
subgraph Query["Sum of nums[1..3] = prefix[4] - prefix[1] = 10 - 2 = 8"]
Q["sum(i, j) = prefix[j+1] - prefix[i]"]
end
Original --> Prefix --> Query
style Query fill:#DBEAFE,stroke:#2563EB Formula: sum(i, j) = prefix[j + 1] - prefix[i] where prefix[0] = 0 and prefix[k] = prefix[k-1] + nums[k-1].
Subarray vs Subsequence vs Subset
These terms get confused constantly in interviews. Know the difference.
| Term | Contiguous? | Order Preserved? | Example from [1,2,3,4] |
|---|---|---|---|
| Subarray | Yes | Yes | [2,3,4] |
| Subsequence | No | Yes | [1,3,4] |
| Subset | No | No | {1,4,2} |
Interview tip: When a problem says "subarray," you can use sliding window or prefix sum. When it says "subsequence," you often need DP. When it says "subset," think backtracking or bitmask.
Pattern Recognition
The 30-Second Decision Framework
Read the problem constraints and keywords, then pick the approach. This table covers 90% of array/hash problems.
| Signal in Problem | Think... | Example Problem |
|---|---|---|
| "Find a pair/triplet that sums to X" | HashMap (complement lookup) or Two Pointers on sorted | Two Sum, 3Sum |
| "Count/frequency of elements" | HashMap frequency counter | Top K Frequent Elements |
| "Group elements by some property" | HashMap with computed key | Group Anagrams |
| "Check if duplicate/seen before" | HashSet | Contains Duplicate |
| "Sum of subarray equals K" | Prefix sum + HashMap | Subarray Sum Equals K |
| "Product of all except self" | Prefix/suffix product arrays | Product of Array Except Self |
| "Longest consecutive sequence" | HashSet + boundary detection | Longest Consecutive Sequence |
| "In-place operation, O(1) space" | Swap/partition or use array as its own hash | First Missing Positive |
| "Constraints: n <= 10^6" | Must be O(n) or O(n log n) | Rules out O(n^2) brute force |
| "Contiguous subarray with max/min sum" | Kadane's algorithm | Maximum Subarray |
Reusable Java Templates
/**
* Pattern: Count occurrences of each element.
* Use when: top-K, majority element, anagram checks.
*/
public Map<Integer, Integer> frequencyCount(int[] nums) {
Map<Integer, Integer> freq = new HashMap<>();
for (int num : nums) {
freq.merge(num, 1, Integer::sum);
}
return freq;
}
// Alternative with getOrDefault:
// freq.put(num, freq.getOrDefault(num, 0) + 1);
/**
* Pattern: Store value->index in first pass, query in second.
* Use when: finding pairs, checking complements.
* Optimization: single-pass variant checks BEFORE inserting.
*/
public int[] twoPassLookup(int[] nums, int target) {
Map<Integer, Integer> valToIndex = new HashMap<>();
// Single-pass variant (preferred for Two Sum):
for (int i = 0; i < nums.length; i++) {
int complement = target - nums[i];
if (valToIndex.containsKey(complement)) {
return new int[]{valToIndex.get(complement), i};
}
valToIndex.put(nums[i], i);
}
return new int[]{};
}
/**
* Pattern: Precompute cumulative sums for O(1) range queries.
* Use when: subarray sum problems, range sum queries.
*/
public int[] buildPrefixSum(int[] nums) {
int[] prefix = new int[nums.length + 1];
for (int i = 0; i < nums.length; i++) {
prefix[i + 1] = prefix[i] + nums[i];
}
return prefix;
// rangeSum(i, j) = prefix[j + 1] - prefix[i]
}
/**
* Prefix sum + HashMap: count subarrays with sum == k
*/
public int subarraySum(int[] nums, int k) {
Map<Integer, Integer> prefixCount = new HashMap<>();
prefixCount.put(0, 1); // empty prefix
int sum = 0, count = 0;
for (int num : nums) {
sum += num;
count += prefixCount.getOrDefault(sum - k, 0);
prefixCount.merge(sum, 1, Integer::sum);
}
return count;
}
/**
* Pattern: Group elements by a computed canonical key.
* Use when: grouping anagrams, categorizing strings.
*/
public List<List<String>> groupByKey(String[] strs) {
Map<String, List<String>> groups = new HashMap<>();
for (String s : strs) {
// Compute canonical key (sorted chars for anagrams)
char[] chars = s.toCharArray();
Arrays.sort(chars);
String key = new String(chars);
groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
}
return new ArrayList<>(groups.values());
}
// Alternative key: frequency array as string
// "eat" -> "1a0b0c0d1e..." (avoids O(k log k) sort)
private String charCountKey(String s) {
int[] count = new int[26];
for (char c : s.toCharArray()) count[c - 'a']++;
return Arrays.toString(count);
}
Solved Walkthroughs
1. Two Sum (LC #1)
Interview Context
Asked at: Every company | Frequency: The most-asked problem in history | Time: 10-15 minutes (warm-up)
Problem: Given array nums and integer target, return indices of two numbers that sum to target. Exactly one solution exists.
Thought Process
Dead end — Brute Force O(n^2):
// Check every pair — works but too slow for n = 10^4+
for (int i = 0; i < nums.length; i++)
for (int j = i + 1; j < nums.length; j++)
if (nums[i] + nums[j] == target)
return new int[]{i, j};
The HashMap Insight: For each element nums[i], I need target - nums[i] to exist somewhere else. Instead of scanning the whole array, store what I've already seen in a HashMap. Each lookup is O(1).
Solution
public int[] twoSum(int[] nums, int target) {
Map<Integer, Integer> seen = new HashMap<>(); // value -> index
for (int i = 0; i < nums.length; i++) {
int complement = target - nums[i];
if (seen.containsKey(complement)) {
return new int[]{seen.get(complement), i};
}
seen.put(nums[i], i);
}
throw new IllegalArgumentException("No solution");
}
Complexity: Time O(n), Space O(n)
Common mistake: Putting the element in the map BEFORE checking the complement. This causes the element to match with itself (e.g., target=6, nums=[3,3] would incorrectly return index 0 twice).
2. Subarray Sum Equals K (LC #560)
Interview Context
Asked at: Google, Facebook, Amazon | Level: Medium | Key insight: Prefix sum + HashMap
Problem: Given array nums and integer k, return the total number of contiguous subarrays whose sum equals k.
Why This Is Tricky
- Negative numbers exist, so sliding window does NOT work (window sum is not monotonic)
- Brute force: try all O(n^2) subarrays, compute each sum in O(n) = O(n^3). Or precompute prefix sums for O(n^2). Neither is good enough.
The Insight: Prefix Sum + HashMap
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
subgraph Key Insight
direction TB
A["If prefix[j] - prefix[i] == k"] --> B["Then subarray (i, j] sums to k"]
B --> C["For each j, count how many i's have prefix[i] == prefix[j] - k"]
C --> D["Store prefix sum frequencies in a HashMap"]
end
style A fill:#DCFCE7,stroke:#16A34A
style D fill:#DBEAFE,stroke:#2563EB Visual example: nums = [1, 2, 3, -1, 1], k = 3
| Index | nums[i] | Running Sum | Need (sum - k) | Found in Map? | Count |
|---|---|---|---|---|---|
| — | — | 0 | — | {0:1} (base) | 0 |
| 0 | 1 | 1 | 1 - 3 = -2 | No | 0 |
| 1 | 2 | 3 | 3 - 3 = 0 | Yes (1 time) | 1 |
| 2 | 3 | 6 | 6 - 3 = 3 | Yes (1 time) | 2 |
| 3 | -1 | 5 | 5 - 3 = 2 | No | 2 |
| 4 | 1 | 6 | 6 - 3 = 3 | Yes (1 time) | 3 |
Answer: 3 subarrays — [1,2], [3], [2,3,-1,1]
Solution
public int subarraySum(int[] nums, int k) {
// Key: prefix sum value. Value: how many times we've seen it.
Map<Integer, Integer> prefixCount = new HashMap<>();
prefixCount.put(0, 1); // empty subarray has sum 0
int runningSum = 0;
int count = 0;
for (int num : nums) {
runningSum += num;
// How many previous prefixes give us exactly k?
count += prefixCount.getOrDefault(runningSum - k, 0);
prefixCount.merge(runningSum, 1, Integer::sum);
}
return count;
}
Complexity: Time O(n), Space O(n)
Why prefixCount.put(0, 1)? Without it, you'd miss subarrays starting at index 0. If runningSum == k at some point, you need to count that as one valid subarray — which requires a "prefix sum of 0" to exist.
3. Group Anagrams (LC #49)
Interview Context
Asked at: Amazon, Facebook, Bloomberg | Level: Medium | Key insight: Canonical key for grouping
Problem: Given an array of strings, group anagrams together. Return the groups in any order.
Example: ["eat","tea","tan","ate","nat","bat"] → [["eat","tea","ate"],["tan","nat"],["bat"]]
The Key Insight
Two strings are anagrams if and only if they have the same characters in the same frequencies. So we need a canonical form — a "fingerprint" — that is identical for all anagrams.
Option A: Sort each string. "eat" → "aet", "tea" → "aet". Same key = same group. Cost: O(k log k) per string where k = string length.
Option B: Character frequency count as key. "eat" → [1,0,0,0,1,0,...,1,0,0]. Cost: O(k) per string, but key is a 26-element array (constant overhead for short strings).
Solution (Sorted Key — Cleaner for Interviews)
public List<List<String>> groupAnagrams(String[] strs) {
Map<String, List<String>> groups = new HashMap<>();
for (String s : strs) {
char[] chars = s.toCharArray();
Arrays.sort(chars);
String key = new String(chars);
groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
}
return new ArrayList<>(groups.values());
}
Complexity: Time O(n * k log k) where n = number of strings, k = max string length. Space O(n * k).
Follow-up the interviewer might ask: "Can you do it without sorting?" — use the frequency count key:
private String buildKey(String s) {
int[] freq = new int[26];
for (char c : s.toCharArray()) freq[c - 'a']++;
// "1#0#0#...#1#0#1" — use delimiter to avoid collisions
StringBuilder sb = new StringBuilder();
for (int f : freq) sb.append(f).append('#');
return sb.toString();
}
This gives O(n * k) time but in practice the constant factor makes it roughly equal for typical interview inputs.
Common Mistakes
| Mistake | Why It's Wrong | Fix |
|---|---|---|
Using == to compare strings as HashMap keys | Compares references, not content. Two equal strings from different operations won't match. | Always use .equals() or let HashMap handle it (it uses .hashCode() + .equals() internally — works correctly for String) |
Forgetting prefixCount.put(0, 1) in prefix sum problems | Misses all subarrays starting at index 0 that sum to k | Always initialize with the empty prefix |
| Modifying the array while iterating | ConcurrentModificationException or skipped elements | Use a separate result structure or iterate by index |
Using int[] as a HashMap key | Arrays use identity-based hashCode(), not content-based | Convert to String with Arrays.toString() or use List<Integer> |
| Assuming HashMap is O(1) for worst case | Pathological hash collisions → O(n) per op (Java 8 mitigates with treeification) | Mention amortized O(1); for interview purposes, state the assumption |
| Off-by-one in prefix sum indexing | prefix[i] represents sum of first i elements, NOT element at index i | Use prefix[j+1] - prefix[i] for range [i, j] |
| Not handling empty/null inputs | NullPointerException on first access | Add guard clause at the start; clarify constraints with interviewer |
Practice Problems
| # | Problem | Pattern | Difficulty | Key Insight |
|---|---|---|---|---|
| 1 | Two Sum | HashMap complement | Easy | Store value→index, check complement before inserting |
| 217 | Contains Duplicate | HashSet | Easy | Set.add() returns false if element exists |
| 242 | Valid Anagram | Frequency count | Easy | 26-int array comparison, or sort both strings |
| 49 | Group Anagrams | Group by key | Medium | Sorted string (or freq array) as canonical key |
| 347 | Top K Frequent Elements | Freq map + bucket sort | Medium | Bucket sort by frequency gives O(n) |
| 238 | Product of Array Except Self | Prefix/suffix product | Medium | Left pass × right pass, no division needed |
| 128 | Longest Consecutive Sequence | HashSet boundary | Medium | Only start counting from sequence boundary (num-1 not in set) |
| 560 | Subarray Sum Equals K | Prefix sum + HashMap | Medium | Count prefix sums, look up (currentSum - k) |
| 53 | Maximum Subarray | Kadane's algorithm | Medium | Reset running sum when it goes negative |
| 56 | Merge Intervals | Sort + linear scan | Medium | Sort by start, merge overlapping |
| 41 | First Missing Positive | Array-as-hashmap | Hard | Place num at index num-1 (cyclic sort) |
| 525 | Contiguous Array | Prefix sum + HashMap | Medium | Treat 0 as -1, find longest subarray with sum 0 |
| 380 | Insert Delete GetRandom O(1) | HashMap + ArrayList | Medium | HashMap for O(1) lookup, swap-to-end for O(1) delete |
| 659 | Encode and Decode Strings | Length prefix encoding | Medium | Store length + delimiter before each string |
| 36 | Valid Sudoku | HashSet per row/col/box | Medium | Use "row-col-box" encoded keys in sets |
Interview Tips
What Interviewers Are Actually Evaluating
1. Do you immediately reach for a HashMap? The #1 signal of a prepared candidate is recognizing when O(n^2) can become O(n) with a hash map. State this transition explicitly: "Brute force is O(n^2) because for each element I search the rest. I can precompute lookups in a HashMap to make it O(n)."
2. Do you handle edge cases without being prompted? Empty arrays, single-element arrays, negative numbers, integer overflow on sums. Mention these proactively.
3. Do you know the time/space tradeoff? Always state: "I'm using O(n) extra space to achieve O(n) time." This shows you understand the tradeoff is intentional.
4. Can you optimize further when prompted? Common follow-ups: "Can you do it in O(1) space?" (usually means use the array itself as storage, or two-pointer on sorted array). "Can you do it in one pass?" (single-pass HashMap techniques).
5. Do you communicate while coding? Narrate your choices: why you picked HashMap over TreeMap, why you iterate forward not backward, why you check before inserting. Silence is the enemy.
Frequently Asked Follow-Up Questions
- "What if the array is sorted?" → Two pointers O(n) time, O(1) space
- "What if there are duplicates?" → Use frequency map instead of set
- "What about integer overflow?" → Use
longfor running sums - "Can you do it without extra space?" → Usually means modify array in-place (cyclic sort, index marking)
- "What if the array is too large for memory?" → External sort or streaming with bounded HashMap