UUIDs as Primary Keys

UUIDs (Universally Unique Identifiers) are 128-bit identifiers designed to be globally unique without requiring central coordination. They offer compelling advantages for distributed systems and client-server architectures, but their implementation details profoundly affect database performance.

The UUID Appeal

UUIDs solve a fundamental problem in distributed systems: how to generate unique identifiers without coordination. Unlike auto-incrementing integers that require database-managed sequences, UUIDs can be generated client-side before any server interaction.

This enables optimistic application design—clients can create entities with permanent identifiers immediately, then persist them asynchronously. No waiting for database round-trips to get an ID. No temporary client-side IDs that must be reconciled with server-assigned ones. The identifier exists from the moment of creation.
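
As a minimal sketch of that pattern in Python (the `Order` class and its fields are hypothetical, chosen only for illustration), the identifier can be assigned when the object is constructed, and related records can reference it before anything is persisted:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical entity; the id exists before any database call."""
    item: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


order = Order(item="book")

# A child record can reference the parent immediately,
# without waiting for a server-assigned key.
order_line = {"order_id": order.id, "sku": "B-1"}
```

Persisting `order` and `order_line` can then happen later, in any order the application's consistency rules allow.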

For systems respecting the fallacies of distributed computing, this matters. Network latency isn’t zero. The network isn’t always reliable. UUIDs let applications proceed despite these realities.

The collision probability is astronomically low: generating a billion random (version 4) UUIDs per second for about 86 years yields roughly a 50% chance of a single collision. For practical purposes, UUIDs are unique.
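
The estimate can be checked with the standard birthday approximation. This is a rough sketch that assumes 122 uniformly random bits per identifier, as in UUIDv4; the function name is illustrative:

```python
import math

RANDOM_BITS = 122  # random bits in a version 4 UUID


def ids_for_collision_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate count of random IDs at which the collision probability reaches p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


n = ids_for_collision_probability(0.5)
seconds = n / 1e9                      # at one billion IDs per second
years = seconds / (365.25 * 24 * 3600)
print(f"{n:.3e} IDs, about {years:.0f} years")
```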

The UUIDv4 Problem: Random Chaos

UUIDv4 generates identifiers from 122 bits of randomness (remaining bits encode version and variant). This randomness—the source of global uniqueness—creates severe database performance issues.

The indexing problem: Database indexes, particularly B-trees, keep keys in sorted order. Each new UUIDv4 lands on an effectively random leaf page of the index. This causes frequent page splits, leaves pages partially filled, and touches pages that are unlikely to still be in cache.

Imagine inserting items into a sorted array at random positions. Every insertion requires shifting existing elements, fragmenting memory, defeating cache locality. Database indexes experience a page-level version of this pathology with random UUIDs.
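
The analogy can be made concrete with a small simulation. This is only a sketch of the insertion pattern, not a model of a real B-tree: it inserts keys into a sorted Python list and counts how many land at the end. The time-ordered keys use an increasing counter in the high bits as a stand-in for a timestamp prefix.

```python
import bisect
import random


def fraction_appended(keys):
    """Insert keys into a sorted list; return the share that landed at the end."""
    ordered = []
    appended = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos == len(ordered):
            appended += 1
        ordered.insert(pos, key)
    return appended / len(keys)


rng = random.Random(42)

# Fully random 128-bit keys, similar in spirit to UUIDv4.
random_keys = [rng.getrandbits(128) for _ in range(5000)]

# Increasing high bits plus random low bits, similar in spirit to
# time-prefixed identifiers.
ordered_keys = [(i << 80) | rng.getrandbits(80) for i in range(5000)]

random_share = fraction_appended(random_keys)
ordered_share = fraction_appended(ordered_keys)
print(f"random: {random_share:.3%} appended, time-ordered: {ordered_share:.0%} appended")
```

With random keys almost every insert falls somewhere in the interior; with time-prefixed keys every insert is an append.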

The performance impact compounds at scale:

Index size grows: Poor space utilization from fragmentation
Write throughput drops: Constant tree restructuring overhead
Cache efficiency degrades: Random access patterns defeat the database’s page cache as well as CPU caches

This tension embodies a fundamental tradeoff: the same randomness that enables distributed uniqueness destroys sequential ordering that databases need for efficiency. It’s a direct conflict between distributed system requirements and database performance characteristics.

UUIDv7: Time-Ordered Uniqueness

UUIDv7 resolves the indexing problem through temporal ordering while preserving distributed generation:

Structure:

First 48 bits: Unix timestamp in milliseconds
Remaining 80 bits: 4-bit version identifier, 2-bit variant, and 74 bits of randomness

UUIDs generated in temporal proximity sort sequentially. Database insertions follow append-mostly patterns rather than random insertion. The index tree grows at its edges instead of reorganizing its interior.
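
A minimal sketch of that layout, assuming the RFC 9562 field order (48-bit timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The name `uuid7_sketch` is illustrative, and the function does nothing to keep IDs ordered within a single millisecond; a maintained library is the better choice in production.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a version 7 UUID from a millisecond timestamp and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                  # top 12 bits
    rand_b = rand & ((1 << 62) - 1)      # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80   # timestamp in the top 48 bits
    value |= 0x7 << 76                          # version field
    value |= rand_a << 64
    value |= 0b10 << 62                         # variant field
    value |= rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant bits, comparing the UUIDs as integers, bytes, or canonical hex strings gives the same time ordering.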

Performance characteristics compared to UUIDv4:

Metric	UUIDv4	UUIDv7	Improvement
Index Size	389MB	301MB	23% smaller
Write Throughput	183K/s	266K/s	45% faster
Insert Time	54.62s	37.53s	31% faster

These aren’t marginal improvements—they’re the difference between random and sequential insertion patterns. UUIDv7 approaches the efficiency of auto-incrementing integers while maintaining distributed generation capability.

The sequential nature also improves read performance. Range queries by creation time scan contiguous index regions. Related entities created together cluster spatially in the index.

The Tradeoffs

UUIDv7’s temporal ordering introduces new considerations:

Information disclosure: The embedded timestamp reveals entity creation times. For public-facing identifiers, this might expose business metrics (user signup rates, order volumes). Whether this matters depends on your security model and what identifiers are exposed.
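
To see how little effort the disclosure takes, here is a sketch that recovers the creation time from a version 7 UUID. It assumes the timestamp sits in the top 48 bits; the example value is constructed by hand for a fixed timestamp rather than taken from a real system.

```python
import uuid
from datetime import datetime, timezone


def uuid7_created_at(value: uuid.UUID) -> datetime:
    """Read the embedded millisecond timestamp from a version 7 UUID."""
    unix_ms = value.int >> 80
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


# Hand-built example: timestamp 1_700_000_000_000 ms, version 7,
# RFC variant, all random bits zero.
example = uuid.UUID(int=(1_700_000_000_000 << 80) | (0x7 << 76) | (0b10 << 62))

print(example, uuid7_created_at(example).isoformat())
```

Anyone holding the identifier can do the same, which is why the exposure matters for identifiers that appear in URLs or API responses.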

Reduced randomness: The 48 timestamp bits are not random, so UUIDs generated within the same millisecond are distinguished by at most 74 random bits, compared with 122 in UUIDv4. For typical workloads this doesn’t threaten uniqueness, because a collision requires two generators to draw the same random value in the same millisecond. High-throughput systems should still be aware of the constraint, particularly if their library spends some of those bits on a counter.
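
A back-of-the-envelope check of the same-millisecond risk, assuming all 74 non-fixed bits are uniformly random (implementations that reserve bits for a counter will differ). The function name and the 10,000-per-millisecond rate are illustrative:

```python
import math

RANDOM_BITS_PER_MS = 74  # assumed: every non-timestamp, non-fixed bit is random


def same_ms_collision_probability(n: int, bits: int = RANDOM_BITS_PER_MS) -> float:
    """Birthday approximation for n IDs sharing one millisecond timestamp."""
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))


p = same_ms_collision_probability(10_000)
print(f"collision probability within one millisecond: {p:.2e}")
```

Even at that rate the per-millisecond probability stays many orders of magnitude below anything a typical workload would notice.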

Time dependency: UUIDv7 assumes reasonably synchronized clocks across systems. Clock skew can produce out-of-order UUIDs if different servers generate identifiers with different time references. This matters less for single-writer systems, but distributed multi-writer scenarios need consideration.

These tradeoffs are acceptable for most applications. The performance gains from sequential insertion far outweigh the minor information leakage of creation timestamps. But they’re worth understanding—no design is free of tradeoffs.

Implementation Considerations

When UUIDs make sense:

Distributed data generation across multiple services
Client-side entity creation before persistence
Data synchronization across systems
Avoiding centralized ID generation bottlenecks
Public APIs where sequential IDs expose business metrics

When auto-increment remains better:

Single-database systems with no distribution needs
Extreme write performance requirements (integers still faster)
Space-constrained scenarios (UUIDs are 128 bits vs 32/64 for integers)
Sequential ordering matters semantically, not just for performance

Database support: Most modern databases can store UUIDs compactly. PostgreSQL has a native uuid type. MySQL has no dedicated UUID type, but 8.0+ provides UUID_TO_BIN() and BIN_TO_UUID() for storing them in BINARY(16) columns. Verify how your database stores UUIDs before committing: some older systems store them as 36-character strings, which more than doubles key size and hurts index performance.
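
The size difference between the two representations is easy to see from Python's standard `uuid` module:

```python
import uuid

u = uuid.uuid4()

as_text = str(u)    # canonical hyphenated hex form
as_bytes = u.bytes  # raw 128-bit value

print(len(as_text), len(as_bytes))

# The binary form round-trips without loss.
restored = uuid.UUID(bytes=as_bytes)
```

The text form takes 36 characters against 16 bytes for the binary form, and that overhead is repeated in every index and foreign key that carries the identifier.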

Index strategy: Even with UUIDv7, consider clustered vs non-clustered indexes. A clustered index on a UUID primary key stores the table rows themselves in key order, so every out-of-order insert moves row data, not just index entries. It is sometimes better to cluster on another sequential field (like an insertion timestamp or sequence) and use the UUID as a non-clustered unique index.
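
One way to sketch that arrangement is with SQLite through Python's `sqlite3` module. In SQLite an `INTEGER PRIMARY KEY` is the key the table itself is stored by, and a `UNIQUE` column gets its own separate index. The table and column names here are hypothetical, and other databases express clustering differently.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        seq  INTEGER PRIMARY KEY,   -- table rows are stored in this order
        id   BLOB NOT NULL UNIQUE,  -- 16-byte UUID in a separate unique index
        body TEXT NOT NULL
    )
    """
)

event_id = uuid.uuid4()
conn.execute(
    "INSERT INTO events (id, body) VALUES (?, ?)",
    (event_id.bytes, "created"),
)

# Lookups by UUID go through the unique index, then to the row.
row = conn.execute(
    "SELECT seq, body FROM events WHERE id = ?",
    (event_id.bytes,),
).fetchone()
print(row)
```

Row storage stays append-only regardless of the UUID version, while the UUID remains the externally visible identifier. The cost is an extra index and an extra lookup step for reads by UUID.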

Related Patterns

UUIDv7’s time-ordered approach mirrors other sequential-yet-distributed identifier strategies:

Snowflake IDs (Twitter): 64-bit IDs with timestamp + worker ID + sequence. More compact than UUIDs but requires coordination to give each worker a unique ID.

ULID (Universally Unique Lexicographically Sortable ID): 128-bit like UUID but with different encoding. Base32 encoded for URL safety and human readability.

KSUID (K-Sortable Unique IDentifier): 160-bit with a second-precision timestamp and a 128-bit random payload. Trades size and timestamp precision for more randomness.

All share the insight: hybrid approaches combining time ordering with randomness can satisfy both distributed uniqueness and database efficiency. The specific tradeoffs differ (size, precision, coordination), but the pattern recurs.
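
The same pattern in its most compact form is a Snowflake-style ID. The sketch below assumes the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; the function names are illustrative, and real deployments also need a custom epoch and a scheme for assigning worker IDs.

```python
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12


def pack_snowflake(ms_since_epoch: int, worker_id: int, sequence: int) -> int:
    """Pack the three fields into one 63-bit integer, timestamp first."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (ms_since_epoch << (WORKER_BITS + SEQUENCE_BITS))
        | (worker_id << SEQUENCE_BITS)
        | sequence
    )


def unpack_snowflake(value: int) -> tuple:
    """Split an ID back into (ms_since_epoch, worker_id, sequence)."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (value >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    ms_since_epoch = value >> (WORKER_BITS + SEQUENCE_BITS)
    return ms_since_epoch, worker_id, sequence


first = pack_snowflake(1_000, worker_id=3, sequence=0)
second = pack_snowflake(1_001, worker_id=1, sequence=0)
print(first < second, unpack_snowflake(first))
```

Uniqueness here comes from the worker ID and sequence rather than from randomness, which is why this scheme needs coordination that UUIDv7 avoids.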

The evolution from UUIDv4 to UUIDv7 shows the value of designing abstractions that align with underlying system constraints rather than fighting them. Distributed uniqueness is valuable, but not at the cost of defeating every database optimization.

Practical Guidance

For new systems requiring distributed identifier generation: default to UUIDv7. The performance characteristics approach auto-increment while preserving distributed generation. The timestamp disclosure is rarely a security issue in practice.

For existing UUIDv4 systems experiencing performance problems: consider migration paths. Dual-column strategies (keep old UUID for compatibility, add new UUIDv7 for indexing) can bridge the transition. Or accept the performance cost if write volume is manageable.

For read-heavy workloads or small datasets: UUIDv4’s problems may not manifest. Index size and write performance matter most at scale. Measure before optimizing.

The deeper lesson: identifier design isn’t trivial. What seems like an implementation detail—how you generate unique numbers—cascades through your entire system’s performance and operational characteristics. Choose thoughtfully, understanding the tradeoffs.
