Database Selection Cheat Sheet for System Design
Quick Decision Tree
Need sub-millisecond reads? → Key-Value (Redis)
Need ACID transactions? → Relational (PostgreSQL)
Need flexible schema? → Document (MongoDB)
Need relationship traversal? → Graph (Neo4j)
Need massive write throughput? → Wide Column (Cassandra)
Need full-text search? → Search Engine (Elasticsearch)
Need multiple paradigms? → Multimodel (ArangoDB)
Trigger Keywords → Database Type
| When You Hear… | Consider… |
|---|---|
| “caching”, “session store”, “real-time leaderboard” | Redis / Key-Value |
| ”financial”, “transactions”, “consistency”, “ACID” | PostgreSQL / RDBMS |
| ”flexible schema”, “varying attributes”, “JSON” | MongoDB / Document |
| ”connections”, “recommendations”, “friends-of-friends” | Neo4j / Graph |
| ”time-series”, “IoT”, “billions of writes” | Cassandra / Wide Column |
| ”search”, “typo tolerance”, “ranking”, “full-text” | Elasticsearch / Search |
Database Quick Reference
Key-Value (Redis, Memcached)
- Use When: Caching, session storage, rate limiting, pub/sub, real-time leaderboards
- Avoid When: Complex queries needed, data relationships required, persistence is critical
- Real Example: Instagram caches user feeds—serves billions of requests in <100ms
| Pros | Cons |
|---|---|
| Sub-millisecond latency | Memory-bound |
| Simple API | No complex queries |
| High throughput | Volatile without config |
Wide Column (Cassandra, HBase)
- Use When: Time-series data, event logging, high write throughput, multi-region replication
- Avoid When: Need joins, need strong consistency, highly normalized data
- Real Example: Netflix stores viewing history—billions of writes daily, no single point of failure
| Pros | Cons |
|---|---|
| Horizontal scale | No joins |
| High write throughput | Eventual consistency |
| Multi-region replication | Denormalized data required |
Document (MongoDB, CouchDB)
- Use When: Varying document structures, content management, product catalogs, user profiles
- Avoid When: Highly relational data, frequent cross-document updates, need joins
- Real Example: Uber stores trips—each ride type (UberX, Eats, Pool) has different attributes
| Pros | Cons |
|---|---|
| Flexible schema | Data duplication |
| Fast reads | Complex writes for updates |
| Natural JSON fit | No native joins |
Relational (PostgreSQL, MySQL, CockroachDB)
- Use When: Financial transactions, e-commerce orders, booking systems, any ACID requirement
- Avoid When: Massive horizontal scale needed, schema changes constantly, unstructured data
- Real Example: Stripe processes payments—every penny must be accounted for with atomic transactions
| Pros | Cons |
|---|---|
| ACID compliance | Harder to scale horizontally |
| Data integrity | Rigid schema |
| Complex queries | Schema migrations painful |
Graph (Neo4j, Amazon Neptune)
- Use When: Social networks, fraud detection, recommendations, knowledge graphs, network topology
- Avoid When: Simple tabular data, few relationships, team unfamiliar with graph concepts
- Real Example: LinkedIn powers “People You May Know”—traverses millions of connections in milliseconds
| Pros | Cons |
|---|---|
| Fast relationship traversal | Steeper learning curve |
| Eliminates expensive joins | Specialized query language |
| Intuitive for connected data | Overkill for simple data |
Search Engine (Elasticsearch, Solr)
- Use When: Full-text search, log aggregation, typo tolerance, relevance ranking, autocomplete
- Avoid When: Simple exact-match queries, tight budget, primary transactional store needed
- Real Example: GitHub searches billions of lines of code with fuzzy matching and relevance scoring
| Pros | Cons |
|---|---|
| Powerful text search | Expensive at scale |
| Fuzzy matching | High resource consumption |
| Relevance scoring | Not a primary datastore |
Multimodel (ArangoDB, FaunaDB)
- Use When: Need documents + graphs together, want to reduce infra complexity, evolving requirements
- Avoid When: Need peak performance for one paradigm, single model suffices
- Real Example: E-commerce with product catalog (documents) + recommendations (graph) in one DB
| Pros | Cons |
|---|---|
| Multiple paradigms | Jack of all trades |
| Reduced complexity | Lower specialized performance |
| Single query language | Steeper learning curve |
Common System Design Scenarios
| System | Primary DB | Supporting DB | Why |
|---|---|---|---|
| Social Media Feed | PostgreSQL | Redis (cache) | ACID for posts, cache for feed |
| E-commerce | PostgreSQL | Elasticsearch | Transactions + product search |
| Chat Application | Cassandra | Redis (presence) | High write volume, real-time |
| Recommendation Engine | Neo4j | Redis (cache) | Relationship traversal |
| Analytics Platform | Cassandra | Elasticsearch | Time-series + search/aggregation |
| Content Platform | MongoDB | Elasticsearch | Flexible content + search |
Interview Tips
- Always justify your choice — “I’d use X because we need Y”
- Mention trade-offs — No database is perfect
- Consider hybrid approaches — Most systems use 2-3 databases
- Think about scale — “At 1M users… at 100M users…”
- Ask clarifying questions — Consistency requirements? Read vs write heavy?
The Golden Rule Start with PostgreSQL as your default. Only deviate when you have a specific requirement it can't handle well (massive scale, graph traversal, flexible schema, caching layer).
Linked Map of Contexts