Database Selection Cheat Sheet for System Design

Quick Decision Tree

Need sub-millisecond reads? → Key-Value (Redis)
Need ACID transactions? → Relational (PostgreSQL)
Need flexible schema? → Document (MongoDB)
Need relationship traversal? → Graph (Neo4j)
Need massive write throughput? → Wide Column (Cassandra)
Need full-text search? → Search Engine (Elasticsearch)
Need multiple paradigms? → Multimodel (ArangoDB)

Trigger Keywords → Database Type

When You Hear…	Consider…
“caching”, “session store”, “real-time leaderboard”	Redis / Key-Value
”financial”, “transactions”, “consistency”, “ACID”	PostgreSQL / RDBMS
”flexible schema”, “varying attributes”, “JSON”	MongoDB / Document
”connections”, “recommendations”, “friends-of-friends”	Neo4j / Graph
”time-series”, “IoT”, “billions of writes”	Cassandra / Wide Column
”search”, “typo tolerance”, “ranking”, “full-text”	Elasticsearch / Search

Database Quick Reference

Key-Value (Redis, Memcached)

Use When: Caching, session storage, rate limiting, pub/sub, real-time leaderboards
Avoid When: Complex queries needed, data relationships required, persistence is critical
Real Example: Instagram caches user feeds—serves billions of requests in <100ms

Pros	Cons
Sub-millisecond latency	Memory-bound
Simple API	No complex queries
High throughput	Volatile without config

Wide Column (Cassandra, HBase)

Use When: Time-series data, event logging, high write throughput, multi-region replication
Avoid When: Need joins, need strong consistency, highly normalized data
Real Example: Netflix stores viewing history—billions of writes daily, no single point of failure

Pros	Cons
Horizontal scale	No joins
High write throughput	Eventual consistency
Multi-region replication	Denormalized data required

Document (MongoDB, CouchDB)

Use When: Varying document structures, content management, product catalogs, user profiles
Avoid When: Highly relational data, frequent cross-document updates, need joins
Real Example: Uber stores trips—each ride type (UberX, Eats, Pool) has different attributes

Pros	Cons
Flexible schema	Data duplication
Fast reads	Complex writes for updates
Natural JSON fit	No native joins

Relational (PostgreSQL, MySQL, CockroachDB)

Use When: Financial transactions, e-commerce orders, booking systems, any ACID requirement
Avoid When: Massive horizontal scale needed, schema changes constantly, unstructured data
Real Example: Stripe processes payments—every penny must be accounted for with atomic transactions

Pros	Cons
ACID compliance	Harder to scale horizontally
Data integrity	Rigid schema
Complex queries	Schema migrations painful

Graph (Neo4j, Amazon Neptune)

Use When: Social networks, fraud detection, recommendations, knowledge graphs, network topology
Avoid When: Simple tabular data, few relationships, team unfamiliar with graph concepts
Real Example: LinkedIn powers “People You May Know”—traverses millions of connections in milliseconds

Pros	Cons
Fast relationship traversal	Steeper learning curve
Eliminates expensive joins	Specialized query language
Intuitive for connected data	Overkill for simple data

Search Engine (Elasticsearch, Solr)

Use When: Full-text search, log aggregation, typo tolerance, relevance ranking, autocomplete
Avoid When: Simple exact-match queries, tight budget, primary transactional store needed
Real Example: GitHub searches billions of lines of code with fuzzy matching and relevance scoring

Pros	Cons
Powerful text search	Expensive at scale
Fuzzy matching	High resource consumption
Relevance scoring	Not a primary datastore

Multimodel (ArangoDB, FaunaDB)

Use When: Need documents + graphs together, want to reduce infra complexity, evolving requirements
Avoid When: Need peak performance for one paradigm, single model suffices
Real Example: E-commerce with product catalog (documents) + recommendations (graph) in one DB

Pros	Cons
Multiple paradigms	Jack of all trades
Reduced complexity	Lower specialized performance
Single query language	Steeper learning curve

Common System Design Scenarios

System	Primary DB	Supporting DB	Why
Social Media Feed	PostgreSQL	Redis (cache)	ACID for posts, cache for feed
E-commerce	PostgreSQL	Elasticsearch	Transactions + product search
Chat Application	Cassandra	Redis (presence)	High write volume, real-time
Recommendation Engine	Neo4j	Redis (cache)	Relationship traversal
Analytics Platform	Cassandra	Elasticsearch	Time-series + search/aggregation
Content Platform	MongoDB	Elasticsearch	Flexible content + search

Interview Tips

Always justify your choice — “I’d use X because we need Y”
Mention trade-offs — No database is perfect
Consider hybrid approaches — Most systems use 2-3 databases
Think about scale — “At 1M users… at 100M users…”
Ask clarifying questions — Consistency requirements? Read vs write heavy?

The Golden Rule Start with PostgreSQL as your default. Only deviate when you have a specific requirement it can't handle well (massive scale, graph traversal, flexible schema, caching layer).

Linked Map of Contexts

Computer Science, System Design

Pensieve

Recent Notes

Explorer

What Database to Use When?

Table of Contents

Database Selection Cheat Sheet for System Design

Quick Decision Tree

Trigger Keywords → Database Type

Database Quick Reference

Key-Value (Redis, Memcached)

Wide Column (Cassandra, HBase)

Document (MongoDB, CouchDB)

Relational (PostgreSQL, MySQL, CockroachDB)

Graph (Neo4j, Amazon Neptune)

Search Engine (Elasticsearch, Solr)

Multimodel (ArangoDB, FaunaDB)

Common System Design Scenarios

Interview Tips

Backlinks

Graph View

Table of Contents