4 min read By Vamsi Karuturi · Senior Backend Engineer at Salesforce

SQL — The Complete Interview Guide

Q: Find the Nth highest salary without LIMIT

SQL-- Using DENSE_RANK (handles ties correctly) SELECT DISTINCT salary FROM ( SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rnk FROM employees ) ranked WHERE rnk = 3; -- 3rd highest -- Using correlated subquery SELECT DISTINCT salary FROM employees e1 WHERE 3 - 1 = ( SELECT COUNT(DISTINCT salary) FROM employees e2 WHERE e2.salary > e1.salary );

Q: Find and delete duplicate rows (keep one)

SQL-- Find duplicates SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1; -- Delete duplicates (PostgreSQL — keep lowest ID) WITH duplicates AS ( SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) as rn FROM users ) DELETE FROM users WHERE id IN (SELECT id FROM duplicates WHERE rn > 1); -- MySQL alternative (doesn't support DELETE with CTE easily) DELETE u1 FROM users u1 INNER JOIN users u2 ON u1.email = u2.email AND u1.id > u2.id;

Q: Employees earning more than their manager

SQLSELECT e.name AS employee, e.salary, m.name AS manager, m.salary AS mgr_salary FROM employees e JOIN employees m ON e.manager_id = m.id WHERE e.salary > m.salary;

Q: Running total of sales by month

SQLSELECT DATE_TRUNC('month', order_date) as month, SUM(total) as monthly_sales, SUM(SUM(total)) OVER (ORDER BY DATE_TRUNC('month', order_date)) as running_total FROM orders GROUP BY DATE_TRUNC('month', order_date) ORDER BY month;

Q: Find departments where average salary exceeds company average

SQLSELECT department, AVG(salary) as dept_avg FROM employees GROUP BY department HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

Q: Consecutive login days (gaps and islands)

SQLWITH login_groups AS ( SELECT user_id, login_date, login_date - ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY login_date ) * INTERVAL '1 day' AS grp FROM logins ) SELECT user_id, MIN(login_date) as streak_start, MAX(login_date) as streak_end, COUNT(*) as consecutive_days FROM login_groups GROUP BY user_id, grp HAVING COUNT(*) >= 7; -- users with 7+ day streaks

Q: Top 3 products per category by revenue

SQLWITH ranked AS ( SELECT category, product_name, SUM(revenue) as total_revenue, ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(revenue) DESC) as rn FROM sales GROUP BY category, product_name ) SELECT category, product_name, total_revenue FROM ranked WHERE rn <= 3;

Q: Year-over-year growth percentage

SQLWITH yearly AS ( SELECT EXTRACT(YEAR FROM order_date) as year, SUM(total) as revenue FROM orders GROUP BY 1 ) SELECT year, revenue, LAG(revenue) OVER (ORDER BY year) as prev_year, ROUND((revenue - LAG(revenue) OVER (ORDER BY year)) / LAG(revenue) OVER (ORDER BY year) * 100, 1) as yoy_growth_pct FROM yearly;

Q: Pivot table: monthly sales by product

SQL-- PostgreSQL with FILTER SELECT product, SUM(amount) FILTER (WHERE EXTRACT(MONTH FROM sale_date) = 1) as jan, SUM(amount) FILTER (WHERE EXTRACT(MONTH FROM sale_date) = 2) as feb, SUM(amount) FILTER (WHERE EXTRACT(MONTH FROM sale_date) = 3) as mar FROM sales WHERE EXTRACT(YEAR FROM sale_date) = 2025 GROUP BY product; -- MySQL with CASE SELECT product, SUM(CASE WHEN MONTH(sale_date) = 1 THEN amount END) as jan, SUM(CASE WHEN MONTH(sale_date) = 2 THEN amount END) as feb, SUM(CASE WHEN MONTH(sal

Q: Median salary per department

SQL-- PostgreSQL (has PERCENTILE_CONT) SELECT department, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary FROM employees GROUP BY department; -- MySQL (workaround with window functions) WITH ranked AS ( SELECT department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) as rn, COUNT(*) OVER (PARTITION BY department) as cnt FROM employees ) SELECT department, AVG(salary) as median_salary FROM ranked WHERE rn IN (FLOOR((cnt + 1) / 2.0), CEIL((cnt + 1) / 2.0)

SQL is the one skill every backend engineer uses daily but few master. Senior interviews test whether you can write efficient queries, optimize slow ones, and explain why your index strategy works — not just recall syntax.

SQL Statement Categories

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph DDL["DDL (Data Definition)"]
        CREATE
        ALTER
        DROP
        TRUNCATE
    end
    subgraph DML["DML (Data Manipulation)"]
        SELECT
        INSERT
        UPDATE
        DELETE
    end
    subgraph DCL["DCL (Data Control)"]
        GRANT
        REVOKE
    end
    subgraph TCL["TCL (Transaction Control)"]
        COMMIT
        ROLLBACK
        SAVEPOINT
    end

Category	Commands	Rollback?
DDL	CREATE, ALTER, DROP, TRUNCATE	No (auto-commit)
DML	SELECT, INSERT, UPDATE, DELETE	Yes
DCL	GRANT, REVOKE	No
TCL	COMMIT, ROLLBACK, SAVEPOINT	Controls rollback

ACID Properties

Every relational database guarantees ACID for transactions:

Property	Meaning	What Breaks Without It
Atomicity	All operations in a transaction succeed or all fail	Partial transfer: money debited but not credited
Consistency	Transaction moves DB from one valid state to another	Violated constraints, impossible data states
Isolation	Concurrent transactions don't interfere	Dirty reads, phantom rows, lost updates
Durability	Committed data survives crashes	Data loss after power failure

SQL

-- Atomicity example: bank transfer
BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 500 WHERE id = 1;
    UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;
-- If either UPDATE fails, both are rolled back

Isolation Levels

Level	Dirty Read	Non-Repeatable Read	Phantom Read	Performance
READ UNCOMMITTED	Yes	Yes	Yes	Fastest
READ COMMITTED	No	Yes	Yes	Default (PostgreSQL, Oracle)
REPEATABLE READ	No	No	Yes	Default (MySQL InnoDB)
SERIALIZABLE	No	No	No	Slowest

SQL

-- Set isolation level
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
    SELECT balance FROM accounts WHERE id = 1; -- always returns same value within txn
COMMIT;

Joins — Visual Guide

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
    subgraph "JOIN Types"
        INNER["INNER JOIN<br/>Only matching rows"]
        LEFT["LEFT JOIN<br/>All left + matching right"]
        RIGHT["RIGHT JOIN<br/>All right + matching left"]
        FULL["FULL OUTER JOIN<br/>All rows from both"]
        CROSS["CROSS JOIN<br/>Cartesian product"]
        SELF["SELF JOIN<br/>Table joined to itself"]
    end

Sample Tables:

Text Only

┌─────────────────────────────┐     ┌──────────────────────────────┐
│         users               │     │          orders              │
├────┬─────────┬──────────────┤     ├──────────┬───────────┬───────┤
│ id │ name    │ email        │     │ order_id │ product   │user_id│
├────┼─────────┼──────────────┤     ├──────────┼───────────┼───────┤
│ 1  │ Alice   │ alice@g.com  │     │ 101      │ Laptop    │ 1     │
│ 2  │ Bob     │ bob@g.com    │     │ 102      │ Phone     │ 1     │
│ 3  │ Charlie │ charlie@g.com│     │ 103      │ TV        │ 2     │
└────┴─────────┴──────────────┘     │ 104      │ Tablet    │ 4     │
                                    └──────────┴───────────┴───────┘

INNER JOIN

Returns only rows where there's a match in both tables.

SQL

SELECT u.name, o.product
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

Text Only

Result:                          Venn Diagram:
┌───────┬─────────┐            ┌─────────────────────────┐
│ name  │ product │            │  users  ┌──────┐ orders │
├───────┼─────────┤            │         │██████│        │
│ Alice │ Laptop  │            │         │██████│        │
│ Alice │ Phone   │            │         └──────┘        │
│ Bob   │ TV      │            └─────────────────────────┘
└───────┴─────────┘             Only the intersection

LEFT JOIN

Returns all rows from the left table + matching rows from right (NULL if no match).

SQL

SELECT u.name, o.product
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

Text Only

Result:                          Venn Diagram:
┌─────────┬─────────┐          ┌─────────────────────────┐
│ name    │ product │          │  ┌──────────────┐ orders │
├─────────┼─────────┤          │  │██████████████│        │
│ Alice   │ Laptop  │          │  │████users█████│        │
│ Alice   │ Phone   │          │  │██████████████│        │
│ Bob     │ TV      │          │  └──────────────┘        │
│ Charlie │ NULL    │          └─────────────────────────┘
└─────────┴─────────┘           All left + matching right

RIGHT JOIN

Returns all rows from the right table + matching rows from left.

SQL

SELECT u.name, o.product
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;

Text Only

Result:
┌───────┬─────────┐
│ name  │ product │
├───────┼─────────┤
│ Alice │ Laptop  │
│ Alice │ Phone   │
│ Bob   │ TV      │
│ NULL  │ Tablet  │  ← user_id=4 doesn't exist in users
└───────┴─────────┘

FULL OUTER JOIN

Returns all rows from both tables (NULL where no match exists).

SQL

SELECT u.name, o.product
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;

Text Only

Result:
┌─────────┬─────────┐
│ name    │ product │
├─────────┼─────────┤
│ Alice   │ Laptop  │
│ Alice   │ Phone   │
│ Bob     │ TV      │
│ Charlie │ NULL    │  ← no orders
│ NULL    │ Tablet  │  ← no matching user
└─────────┴─────────┘

CROSS JOIN

Returns every combination of rows (Cartesian product). Rarely used intentionally.

SQL

SELECT u.name, o.product
FROM users u CROSS JOIN orders o;
-- Result: 3 users × 4 orders = 12 rows

SELF JOIN

A table joined to itself — used for hierarchical data.

SQL

-- Find each employee's manager
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Window Functions

Window functions perform calculations across a set of rows related to the current row without collapsing them (unlike GROUP BY).

SQL

-- Syntax
function_name() OVER (
    PARTITION BY column     -- groups (like GROUP BY, but keeps rows)
    ORDER BY column         -- sort within partition
    ROWS/RANGE frame       -- which rows to include
)

ROW_NUMBER, RANK, DENSE_RANK

SQL

SELECT name, department, salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
    RANK()       OVER (PARTITION BY department ORDER BY salary DESC) as rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;

Text Only

┌───────┬────────┬────────┬─────────┬──────┬────────────┐
│ name  │ dept   │ salary │ row_num │ rank │ dense_rank │
├───────┼────────┼────────┼─────────┼──────┼────────────┤
│ Alice │ Eng    │ 150000 │ 1       │ 1    │ 1          │
│ Bob   │ Eng    │ 150000 │ 2       │ 1    │ 1          │  ← tie
│ Carol │ Eng    │ 130000 │ 3       │ 3    │ 2          │  ← rank skips, dense doesn't
│ Dave  │ Sales  │ 120000 │ 1       │ 1    │ 1          │
└───────┴────────┴────────┴─────────┴──────┴────────────┘

Function	Ties	Gaps
ROW_NUMBER	Arbitrary order for ties	No gaps (1,2,3,4)
RANK	Same rank for ties	Gaps after ties (1,1,3,4)
DENSE_RANK	Same rank for ties	No gaps (1,1,2,3)

LAG / LEAD

Access previous or next row values without self-join:

SQL

SELECT date, revenue,
    LAG(revenue, 1) OVER (ORDER BY date) as prev_day,
    revenue - LAG(revenue, 1) OVER (ORDER BY date) as daily_change,
    LEAD(revenue, 1) OVER (ORDER BY date) as next_day
FROM daily_sales;

Running Total / Moving Average

SQL

SELECT date, amount,
    SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) as running_total,
    AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7d
FROM transactions;

NTILE — Percentile Buckets

SQL

-- Divide employees into salary quartiles
SELECT name, salary,
    NTILE(4) OVER (ORDER BY salary) as quartile
FROM employees;

Subqueries & CTEs

Subquery Types

Type	Location	Returns
Scalar	SELECT, WHERE	Single value
Row	WHERE	Single row
Table	FROM	Result set
Correlated	WHERE	References outer query

SQL

-- Scalar subquery: employees earning above average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated subquery: employees earning above their department average
SELECT name, salary, department
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = e.department  -- references outer query
);

-- EXISTS: more efficient than IN for large subquery results
SELECT name FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders WHERE customer_id = c.id AND total > 1000
);

Common Table Expressions (CTEs)

SQL

-- Readable alternative to nested subqueries
WITH high_value_orders AS (
    SELECT customer_id, SUM(total) as total_spent
    FROM orders
    WHERE order_date >= '2025-01-01'
    GROUP BY customer_id
    HAVING SUM(total) > 10000
),
customer_details AS (
    SELECT c.id, c.name, c.tier
    FROM customers c
    JOIN high_value_orders h ON c.id = h.customer_id
)
SELECT * FROM customer_details WHERE tier != 'ENTERPRISE';

Recursive CTE — Hierarchical Data

SQL

-- Org chart: find all reports under a manager
WITH RECURSIVE org_tree AS (
    -- Base case
    SELECT id, name, manager_id, 1 as depth
    FROM employees WHERE id = 1  -- CEO

    UNION ALL

    -- Recursive case
    SELECT e.id, e.name, e.manager_id, ot.depth + 1
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.id
    WHERE ot.depth < 10  -- safety limit
)
SELECT * FROM org_tree ORDER BY depth, name;

Indexing Deep Dive

How Indexes Work — B+ Tree

Text Only

                        ┌───────────────┐
                        │   [30 | 60]   │          ← Root node
                        └───┬───┬───┬───┘
                           ╱    │    ╲
              ┌────────────┐ ┌────────────┐ ┌────────────┐
              │ [10|20|30] │ │ [40|50|60] │ │ [70|80|90] │  ← Internal nodes
              └──┬──┬──┬───┘ └──┬──┬──┬───┘ └──┬──┬──┬───┘
                ╱   │   ╲      ╱   │   ╲      ╱   │   ╲
            ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐
            │10│→│20│→│30│→│40│→│50│→│60│→│70│→│80│→│90│  ← Leaf nodes (data)
            └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘
                        Linked list for range scans →→→

Why B+ Tree?

Balanced: every leaf is at the same depth → O(log n) lookups
Leaf nodes linked: efficient range scans (BETWEEN, ORDER BY)
High fan-out: few disk reads to reach any record (3-4 levels for millions of rows)

Clustered vs Non-Clustered Index

	Clustered Index	Non-Clustered Index
Physical order	Rows stored in index order	Separate structure pointing to rows
Per table	Only ONE (usually Primary Key)	Many allowed
Leaf nodes	Contain actual row data	Contain pointers to row data
Best for	Range queries, sequential access	Point lookups, covering queries
Analogy	Book pages in order (Table of Contents IS the book)	Back-of-book index (pointer to page numbers)

SQL

-- Clustered index (implicit on PRIMARY KEY in most DBs)
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,  -- clustered index on id
    customer_id INT,
    total DECIMAL(10,2),
    created_at TIMESTAMP
);

-- Non-clustered indexes
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_created ON orders(created_at);

Composite Index & the Leftmost Prefix Rule

SQL

CREATE INDEX idx_composite ON orders(customer_id, created_at, status);

-- ✅ Uses index (leftmost prefix)
WHERE customer_id = 42
WHERE customer_id = 42 AND created_at > '2025-01-01'
WHERE customer_id = 42 AND created_at > '2025-01-01' AND status = 'SHIPPED'

-- ❌ Cannot use index (skips leftmost column)
WHERE created_at > '2025-01-01'
WHERE status = 'SHIPPED'
WHERE created_at > '2025-01-01' AND status = 'SHIPPED'

Covering Index

When an index contains ALL columns needed for a query — no table lookup required:

SQL

CREATE INDEX idx_covering ON orders(customer_id, created_at, total);

-- Index-only scan (no table access needed)
SELECT customer_id, created_at, total
FROM orders
WHERE customer_id = 42 AND created_at > '2025-01-01';

When NOT to Index

Scenario	Why
Small tables (<1000 rows)	Full scan is faster than index lookup
Columns with low cardinality	Boolean, status with 3 values — index scan hits most rows anyway
Write-heavy tables	Every INSERT/UPDATE must update all indexes
Columns rarely in WHERE/JOIN	Index takes space but never gets used

Query Optimization

Reading EXPLAIN Plans

SQL

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id), SUM(o.total)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active' AND o.created_at > '2025-01-01'
GROUP BY u.name
ORDER BY SUM(o.total) DESC
LIMIT 100;

Key things to look for:

EXPLAIN Term	Meaning	Good/Bad
Seq Scan	Reading every row	Bad on large tables
Index Scan	Using index to find rows	Good
Index Only Scan	All data from index	Best
Bitmap Index Scan	Combines multiple indexes	Good for OR conditions
Hash Join	Hash table for joins	Good for large joins
Nested Loop	Loop per row	Good for small result sets
Sort	Sorting in memory/disk	Check if index can avoid
Rows (estimated vs actual)		Large gap = stale statistics

Optimization Checklist

SQL

-- BEFORE: 30 seconds
SELECT u.name, COUNT(o.id), SUM(o.total)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2025-01-01'
AND u.status = 'active'
GROUP BY u.name
ORDER BY SUM(o.total) DESC
LIMIT 100;

-- Problems:
-- 1. LEFT JOIN + WHERE on right table = effectively INNER JOIN (confusing optimizer)
-- 2. No index on orders(user_id, created_at)
-- 3. No index on users(status)
-- 4. GROUP BY on name (not unique) instead of PK

-- AFTER: 50ms
-- Step 1: Add indexes
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
CREATE INDEX idx_users_status ON users(status) INCLUDE (name);

-- Step 2: Fix the query
SELECT u.name, COUNT(o.id), SUM(o.total)
FROM users u
INNER JOIN orders o ON u.id = o.user_id  -- LEFT→INNER (WHERE already filters NULLs)
WHERE u.status = 'active'
AND o.created_at > '2025-01-01'
GROUP BY u.id, u.name  -- PK in GROUP BY helps optimizer
ORDER BY SUM(o.total) DESC
LIMIT 100;

Common Optimization Patterns

Problem	Solution
`SELECT *`	Select only needed columns
`WHERE function(column)`	Rewrite so column is bare: `WHERE column > DATE_SUB(NOW(), INTERVAL 7 DAY)`
`LIKE '%search%'`	Full-text index or search engine (Elasticsearch)
`OR` conditions	UNION ALL or bitmap indexes
Correlated subquery	Rewrite as JOIN
`ORDER BY` + `LIMIT` without index	Add index matching ORDER BY
N+1 queries (ORM)	JOIN or batch fetch

Normalization

Normal Forms — Quick Reference

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
    UNF["Unnormalized<br/>(repeating groups)"]
    NF1["1NF: Atomic values<br/> + unique rows"]
    NF2["2NF: No partial<br/>dependencies"]
    NF3["3NF: No transitive<br/>dependencies"]
    BCNF["BCNF: Every determinant<br/>is a candidate key"]

    UNF --> NF1 --> NF2 --> NF3 --> BCNF

Form	Rule	Violation Example
1NF	Atomic values, unique rows	`skills = "Java, Python"` in one cell
2NF	No partial dependency on composite key	`{student_id, course_id} → teacher_name` (depends only on course_id)
3NF	No transitive dependency	`employee → department → department_location`
BCNF	Every determinant is a candidate key	Edge case of 3NF with overlapping keys

SQL

-- Violation of 2NF:
-- order_items(order_id, product_id, product_name, quantity)
-- product_name depends only on product_id, not the full key

-- Fix: separate table
-- order_items(order_id, product_id, quantity)
-- products(product_id, product_name)

When to denormalize:

Read-heavy workloads where JOINs are expensive
Reporting/analytics (star schema)
Caching computed values (materialized views)
Microservice boundaries (each service owns its data)

Transactions & Locking

Lock Types

Lock	Compatibility	Use Case
Shared (S)	Multiple readers OK	SELECT ... FOR SHARE
Exclusive (X)	Blocks all others	UPDATE, DELETE, SELECT FOR UPDATE
Row-level	Only locks affected rows	InnoDB default
Table-level	Locks entire table	MyISAM, DDL operations

Deadlock Example & Prevention

SQL

-- Transaction 1:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- locks row 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- waits for row 2

-- Transaction 2 (concurrent):
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- locks row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- waits for row 1
-- DEADLOCK!

Prevention: Always acquire locks in consistent order (by primary key).

SQL

-- Fix: both transactions lock lower ID first
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- always lock 1 first
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- then lock 2
COMMIT;

Advanced Patterns

Pagination — Offset vs Cursor

SQL

-- OFFSET pagination (simple but slow for deep pages)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Problem: DB must scan and discard 10,000 rows

-- CURSOR pagination (fast regardless of page depth)
SELECT * FROM posts
WHERE created_at < '2025-03-15T10:30:00'  -- cursor from last item
ORDER BY created_at DESC
LIMIT 20;
-- Only reads 20 rows using index on created_at

Upsert (INSERT or UPDATE)

SQL

-- PostgreSQL
INSERT INTO metrics (key, value, updated_at)
VALUES ('page_views', 1, NOW())
ON CONFLICT (key)
DO UPDATE SET value = metrics.value + 1, updated_at = NOW();

-- MySQL
INSERT INTO metrics (key, value, updated_at)
VALUES ('page_views', 1, NOW())
ON DUPLICATE KEY UPDATE value = value + 1, updated_at = NOW();

Soft Delete

SQL

-- Never actually delete — mark as deleted
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL;

-- "Delete" a user
UPDATE users SET deleted_at = NOW() WHERE id = 42;

-- All queries filter deleted rows
SELECT * FROM users WHERE deleted_at IS NULL;

-- Partial index for performance (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE deleted_at IS NULL;

Optimistic Locking (Version Column)

SQL

-- Read current version
SELECT id, name, version FROM products WHERE id = 1;
-- Returns: {id: 1, name: "Widget", version: 5}

-- Update only if version matches (no one else changed it)
UPDATE products
SET name = 'Super Widget', version = version + 1
WHERE id = 1 AND version = 5;

-- If 0 rows affected: someone else modified it → retry or error

SQL vs NoSQL — When to Choose What

Factor	SQL (Relational)	NoSQL
Schema	Fixed, enforced	Flexible, schema-on-read
Scaling	Vertical (bigger machine)	Horizontal (more machines)
Transactions	Full ACID	Usually eventual consistency
Query power	Complex JOINs, aggregations	Simple key-value or document queries
Best for	Financial, relational data	High throughput, unstructured data
Examples	PostgreSQL, MySQL, Oracle	MongoDB, Cassandra, DynamoDB, Redis

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
    A{What's your data like?} --> B{Highly relational?}
    B -->|Yes| C[SQL: PostgreSQL/MySQL]
    B -->|No| D{Need transactions?}
    D -->|Yes| C
    D -->|No| E{Access pattern?}
    E -->|Key-Value| F[Redis / DynamoDB]
    E -->|Document| G[MongoDB]
    E -->|Wide Column| H[Cassandra / HBase]
    E -->|Graph| I[Neo4j]

PostgreSQL vs MySQL

Feature	PostgreSQL	MySQL
SQL compliance	Full	Partial
JSON support	Native JSONB (indexable)	JSON type (limited)
Window functions	Full support	Full (since 8.0)
CTEs	Recursive + materialized	Recursive (since 8.0)
Concurrency	MVCC (no read locks)	MVCC (InnoDB)
Extensions	PostGIS, pg_trgm, hstore	Limited
Replication	Logical + streaming	Binary log
Best for	Complex queries, analytics	Web apps, read-heavy

Delete vs Truncate vs Drop

	DELETE	TRUNCATE	DROP
Type	DML	DDL	DDL
What it removes	Rows (with WHERE)	All rows	Entire table + structure
Rollback?	Yes	No (auto-commit)	No (auto-commit)
Triggers fire?	Yes	No	No
Speed	Slow (row by row)	Fast (deallocates pages)	Instant
Identity reset?	No	Yes	N/A
WHERE clause?	Yes	No	No

Interview Scenarios

Find the Nth highest salary without LIMIT

SQL

-- Using DENSE_RANK (handles ties correctly)
SELECT DISTINCT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rnk
    FROM employees
) ranked
WHERE rnk = 3;  -- 3rd highest

-- Using correlated subquery
SELECT DISTINCT salary
FROM employees e1
WHERE 3 - 1 = (
    SELECT COUNT(DISTINCT salary)
    FROM employees e2
    WHERE e2.salary > e1.salary
);

Find and delete duplicate rows (keep one)

SQL

-- Find duplicates
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;

-- Delete duplicates (PostgreSQL — keep lowest ID)
WITH duplicates AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) as rn
    FROM users
)
DELETE FROM users WHERE id IN (SELECT id FROM duplicates WHERE rn > 1);

-- MySQL alternative (doesn't support DELETE with CTE easily)
DELETE u1 FROM users u1
INNER JOIN users u2
ON u1.email = u2.email AND u1.id > u2.id;

Employees earning more than their manager

SQL

SELECT e.name AS employee, e.salary, m.name AS manager, m.salary AS mgr_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Running total of sales by month

SQL

SELECT
    DATE_TRUNC('month', order_date) as month,
    SUM(total) as monthly_sales,
    SUM(SUM(total)) OVER (ORDER BY DATE_TRUNC('month', order_date)) as running_total
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Find departments where average salary exceeds company average

SQL

SELECT department, AVG(salary) as dept_avg
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

Consecutive login days (gaps and islands)

SQL

WITH login_groups AS (
    SELECT user_id, login_date,
        login_date - ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY login_date
        ) * INTERVAL '1 day' AS grp
    FROM logins
)
SELECT user_id, MIN(login_date) as streak_start,
       MAX(login_date) as streak_end,
       COUNT(*) as consecutive_days
FROM login_groups
GROUP BY user_id, grp
HAVING COUNT(*) >= 7;  -- users with 7+ day streaks

Top 3 products per category by revenue

SQL

WITH ranked AS (
    SELECT category, product_name, SUM(revenue) as total_revenue,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(revenue) DESC) as rn
    FROM sales
    GROUP BY category, product_name
)
SELECT category, product_name, total_revenue
FROM ranked WHERE rn <= 3;

Year-over-year growth percentage

SQL

WITH yearly AS (
    SELECT EXTRACT(YEAR FROM order_date) as year,
           SUM(total) as revenue
    FROM orders GROUP BY 1
)
SELECT year, revenue,
    LAG(revenue) OVER (ORDER BY year) as prev_year,
    ROUND((revenue - LAG(revenue) OVER (ORDER BY year))
        / LAG(revenue) OVER (ORDER BY year) * 100, 1) as yoy_growth_pct
FROM yearly;

Pivot table: monthly sales by product

SQL

-- PostgreSQL with FILTER
SELECT product,
    SUM(amount) FILTER (WHERE EXTRACT(MONTH FROM sale_date) = 1) as jan,
    SUM(amount) FILTER (WHERE EXTRACT(MONTH FROM sale_date) = 2) as feb,
    SUM(amount) FILTER (WHERE EXTRACT(MONTH FROM sale_date) = 3) as mar
FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2025
GROUP BY product;

-- MySQL with CASE
SELECT product,
    SUM(CASE WHEN MONTH(sale_date) = 1 THEN amount END) as jan,
    SUM(CASE WHEN MONTH(sale_date) = 2 THEN amount END) as feb,
    SUM(CASE WHEN MONTH(sale_date) = 3 THEN amount END) as mar
FROM sales
WHERE YEAR(sale_date) = 2025
GROUP BY product;

Median salary per department

SQL

-- PostgreSQL (has PERCENTILE_CONT)
SELECT department,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary
FROM employees
GROUP BY department;

-- MySQL (workaround with window functions)
WITH ranked AS (
    SELECT department, salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) as rn,
        COUNT(*) OVER (PARTITION BY department) as cnt
    FROM employees
)
SELECT department, AVG(salary) as median_salary
FROM ranked
WHERE rn IN (FLOOR((cnt + 1) / 2.0), CEIL((cnt + 1) / 2.0))
GROUP BY department;

Frequently Asked Questions

What are the most important SQL concepts for senior engineer interviews?

Senior interviews focus on: window functions (ROW_NUMBER, RANK, LAG/LEAD), query optimization with EXPLAIN plans, indexing strategy (composite indexes, covering indexes, B+ tree internals), transaction isolation levels, deadlock prevention, and scenario questions involving CTEs, recursive queries, and performance tuning. Writing correct JOINs is baseline — optimization is what separates senior from mid-level.

How do you optimize a slow SQL query step by step?

Run EXPLAIN ANALYZE to see the actual execution plan. 2. Look for sequential scans on large tables — add indexes. 3. Check if WHERE conditions can use existing indexes (leftmost prefix rule). 4. Replace correlated subqueries with JOINs. 5. Ensure statistics are up to date (ANALYZE table). 6. Consider covering indexes to avoid table lookups. 7. For pagination, use cursor-based instead of OFFSET. 8. Monitor with pg_stat_statements or slow query log.

When should you use a CTE vs a subquery vs a temporary table?

CTE: Best for readability, recursive queries, and when the same result is referenced multiple times in one query. Subquery: Fine for simple one-off filtering in WHERE/FROM. Temporary table: When you need to index intermediate results, or when the intermediate result is used across multiple statements in a transaction. CTEs in PostgreSQL 12+ are automatically inlined (optimized), so performance is usually equivalent to subqueries.

SQL — The Complete Interview Guide

SQL Statement Categories

ACID Properties

Isolation Levels

Joins — Visual Guide

INNER JOIN

LEFT JOIN

RIGHT JOIN

FULL OUTER JOIN

CROSS JOIN

SELF JOIN

Window Functions

ROW_NUMBER, RANK, DENSE_RANK

LAG / LEAD

Running Total / Moving Average

NTILE — Percentile Buckets

Subqueries & CTEs

Subquery Types

Common Table Expressions (CTEs)

Recursive CTE — Hierarchical Data

Indexing Deep Dive

How Indexes Work — B+ Tree

Clustered vs Non-Clustered Index

Composite Index & the Leftmost Prefix Rule

Covering Index

When NOT to Index

Query Optimization

Reading EXPLAIN Plans

Optimization Checklist

Common Optimization Patterns

Normalization

Normal Forms — Quick Reference

Transactions & Locking

Lock Types

Deadlock Example & Prevention

Advanced Patterns

Pagination — Offset vs Cursor

Upsert (INSERT or UPDATE)

Soft Delete

Optimistic Locking (Version Column)

SQL vs NoSQL — When to Choose What

PostgreSQL vs MySQL

Delete vs Truncate vs Drop

Interview Scenarios

Frequently Asked Questions

5-Minute System Design — Weekly