2 min read By Vamsi Karuturi · Senior Backend Engineer at Salesforce

Unique ID Generation in Distributed Systems

Q: Why not just use auto-increment across all services?

Answer: Auto-increment requires a single authority (one database sequence). In distributed systems this creates a bottleneck and single point of failure. Even with multi-master replication (odd/even IDs), it doesn't scale beyond a handful of nodes and creates coordination overhead. Snowflake-style IDs are generated locally with zero network calls.

Q: How does Snowflake guarantee uniqueness without coordination?

Answer: The 10-bit machine ID is assigned once at startup (via config, ZooKeeper, or K8s ordinal). Within a single machine, the combination of timestamp + sequence is unique because: same millisecond uses incrementing sequence (0-4095), different milliseconds reset the sequence. Since no two machines share a machine ID, and no single machine generates more than 4096 IDs per millisecond, global uniqueness is guaranteed mathematically.

Q: What's the problem with UUID v4 as a primary key in PostgreSQL?

Answer: UUID v4 is random — inserts scatter across the B-tree index instead of appending at the end. This causes: Page splits — existing pages must be split to accommodate random inserts Cache misses — no locality, random pages pulled from disk Write amplification — more I/O per insert Index bloat — fragmented B-tree wastes space Solution: Use time-sorted IDs (ULID, UUID v7, Snowflake) that insert sequentially.

Generate 10,000+ unique, sortable, non-colliding IDs per second across hundreds of servers — without any coordination.

Real-World Analogy

Think of license plates. Each state (server) has its own prefix (machine ID), the year (timestamp) provides ordering, and the sequential number ensures uniqueness within that state. No two states need to "call each other" to avoid duplicates — the system is coordination-free by design.

Why Auto-Increment Doesn't Work

Problem	Why
Single point of failure	One DB generates all IDs → bottleneck
Scalability	Can't distribute across multiple servers
Predictable	Competitors can guess your user count, order volume
No time-ordering	IDs don't indicate when something was created
Cross-service	Different services can't generate IDs independently

Approaches Compared

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph Approaches["ID Generation Strategies"]
        UUID["UUID v4<br/>(random)"]
        SNOW["Snowflake<br/>(Twitter, 2010)"]
        TSID["TSID<br/>(modern Snowflake)"]
        ULID["ULID<br/>(sortable UUID)"]
        DB["DB Sequences<br/>(with ranges)"]
    end

    style SNOW fill:#E8F5E9,stroke:#2E7D32,color:#000
    style TSID fill:#E3F2FD,stroke:#1565C0,color:#000

Strategy	Length	Sortable?	Coordination-Free?	Collision Risk	Throughput
UUID v4	128 bit	No	Yes	~negligible	Unlimited
Snowflake	64 bit	Yes (time)	Yes	Zero (by design)	4096/ms/machine
TSID	64 bit	Yes (time)	Yes	Zero	4096/ms/machine
ULID	128 bit	Yes (time)	Yes	~negligible	Unlimited
DB Range	Variable	Yes	No (needs DB)	Zero	Batch-limited

Twitter Snowflake (The Gold Standard)

Bit Layout (64 bits total)

Text Only

┌────────────────────────────────────────────────────────────────────────┐
│ 0 │ Timestamp (41 bits)          │ Machine ID (10 bits) │ Sequence (12 bits) │
│   │ milliseconds since epoch     │ 0-1023 machines      │ 0-4095 per ms      │
└────────────────────────────────────────────────────────────────────────┘
  1         41 bits                      10 bits              12 bits

Component	Bits	Range	Purpose
Sign bit	1	Always 0	Keeps ID positive
Timestamp	41	~69 years from epoch	Time-ordering
Machine ID	10	1024 machines	No coordination needed
Sequence	12	4096 per millisecond	Uniqueness within same ms

Implementation

Java

public class SnowflakeIdGenerator {
    private static final long EPOCH = 1704067200000L;  // 2024-01-01 00:00:00 UTC
    private static final long MACHINE_ID_BITS = 10;
    private static final long SEQUENCE_BITS = 12;
    private static final long MAX_MACHINE_ID = (1L << MACHINE_ID_BITS) - 1;  // 1023
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;       // 4095

    private final long machineId;
    private long lastTimestamp = -1;
    private long sequence = 0;

    public SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("Machine ID must be 0-" + MAX_MACHINE_ID);
        }
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = System.currentTimeMillis();

        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted in this millisecond — wait for next ms
                timestamp = waitNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }

        if (timestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards!");
        }

        lastTimestamp = timestamp;

        return ((timestamp - EPOCH) << (MACHINE_ID_BITS + SEQUENCE_BITS))
             | (machineId << SEQUENCE_BITS)
             | sequence;
    }

    private long waitNextMillis(long lastTs) {
        long ts = System.currentTimeMillis();
        while (ts <= lastTs) {
            ts = System.currentTimeMillis();
        }
        return ts;
    }
}

Properties

Java

// Generated IDs are:
// 1. Unique — machineId prevents collisions across servers
// 2. Roughly time-sorted — later IDs have higher timestamps
// 3. 64-bit — fits in a long, efficient for indexing
// 4. Decentralized — no coordination between machines

// Example IDs (decimal):
// 7189234567890001  (server 1, time T)
// 7189234567890002  (server 1, time T, next sequence)
// 7189234572340001  (server 2, time T+5ms)

ULID — Universally Unique Lexicographically Sortable Identifier

Text Only

01ARZ3NDEKTSV4RRFFQ69G5FAV
└──────────┘└────────────────┘
 Timestamp     Randomness
  (48 bits)    (80 bits)
  10 chars     16 chars

Java

// Using ulid-creator library
import com.github.f4b6a3.ulid.UlidCreator;

String ulid = UlidCreator.getMonotonicUlid().toString();
// "01HXYZ3NDEKTSV4RRFFQ69G5FAV"
// Lexicographically sortable — string comparison = time ordering!

ULID Advantage	Explanation
String-sortable	`"01H..." < "01J..."` → alphabetical = chronological
128 bits	Same size as UUID, drop-in replacement
URL-safe	Base32 encoding, no special characters
Monotonic option	Guarantees ordering within same millisecond

UUID — When to (Still) Use It

Java

// UUID v4 — random, no ordering
UUID id = UUID.randomUUID();  // "550e8400-e29b-41d4-a716-446655440000"

// UUID v7 (Java 21+ draft) — time-ordered, random tail
// Not yet in standard Java, but coming

Use UUID When	Use Snowflake/TSID When
No ordering needed	Need time-sorted IDs
Low throughput (< 1K/s)	High throughput (> 10K/s)
Don't care about size (128 bits)	Need compact 64-bit IDs
No machine coordination possible	Can assign machine IDs
Database supports UUID type well	Need B-tree friendly ordering

UUID v4 and Database Performance

Random UUIDs as primary keys cause B-tree fragmentation in PostgreSQL/MySQL. Each insert goes to a random page → no locality → slow writes at scale. Time-sorted IDs (Snowflake, ULID, UUID v7) insert sequentially → B-tree stays compact.

Machine ID Assignment

Strategy	How	Trade-off
Manual configuration	Env var or config file per server	Simple but error-prone
ZooKeeper / etcd	Lease-based sequential assignment	Reliable but adds dependency
Database sequence	Each server claims a range	Simple, requires DB at startup
Kubernetes pod ordinal	StatefulSet pod index	Perfect for K8s deployments
MAC address hash	Hash last 10 bits of MAC	No coordination but collision possible

Clock Skew Problem

What Happens When the Clock Goes Backwards?

NTP corrections can move the system clock backwards. If currentTime < lastTimestamp, Snowflake would generate duplicate IDs or throw an error.

Solutions:

Refuse to generate — throw exception, let caller retry (Twitter's approach)
Wait — spin until clock catches up (works for small skews)
Use logical clock — increment a counter regardless of wall clock
Hybrid — tolerate small skews (< 5ms), alert on large ones

Choosing the Right Strategy

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
    Q1{"Need 64-bit<br/>(DB-friendly)?"}
    Q1 -->|"Yes"| Q2{"Need strict<br/>time ordering?"}
    Q1 -->|"No (128 OK)"| Q3{"Need string<br/>sortable?"}
    Q2 -->|"Yes"| SNOW["Snowflake / TSID"]
    Q2 -->|"No"| SNOW
    Q3 -->|"Yes"| ULID["ULID"]
    Q3 -->|"No"| UUID["UUID v4"]

    style SNOW fill:#E8F5E9,stroke:#2E7D32,color:#000
    style ULID fill:#E3F2FD,stroke:#1565C0,color:#000
    style UUID fill:#FEF3C7,stroke:#D97706,color:#000

Interview Questions

Why not just use auto-increment across all services?

Answer: Auto-increment requires a single authority (one database sequence). In distributed systems this creates a bottleneck and single point of failure. Even with multi-master replication (odd/even IDs), it doesn't scale beyond a handful of nodes and creates coordination overhead. Snowflake-style IDs are generated locally with zero network calls.

How does Snowflake guarantee uniqueness without coordination?

Answer: The 10-bit machine ID is assigned once at startup (via config, ZooKeeper, or K8s ordinal). Within a single machine, the combination of timestamp + sequence is unique because: same millisecond uses incrementing sequence (0-4095), different milliseconds reset the sequence. Since no two machines share a machine ID, and no single machine generates more than 4096 IDs per millisecond, global uniqueness is guaranteed mathematically.

What's the problem with UUID v4 as a primary key in PostgreSQL?

Answer: UUID v4 is random — inserts scatter across the B-tree index instead of appending at the end. This causes:

Page splits — existing pages must be split to accommodate random inserts
Cache misses — no locality, random pages pulled from disk
Write amplification — more I/O per insert
Index bloat — fragmented B-tree wastes space

Solution: Use time-sorted IDs (ULID, UUID v7, Snowflake) that insert sequentially.

Unique ID Generation in Distributed Systems

Why Auto-Increment Doesn't Work

Approaches Compared

Twitter Snowflake (The Gold Standard)

Bit Layout (64 bits total)

Implementation

Properties

ULID — Universally Unique Lexicographically Sortable Identifier

UUID — When to (Still) Use It

Machine ID Assignment

Clock Skew Problem

Choosing the Right Strategy

Interview Questions

5-Minute System Design — Weekly