Database Selection Cheat Sheet for System Design

Quick Decision Tree

Need sub-millisecond reads? → Key-Value (Redis)
Need ACID transactions? → Relational (PostgreSQL)
Need flexible schema? → Document (MongoDB)
Need relationship traversal? → Graph (Neo4j)
Need massive write throughput? → Wide Column (Cassandra)
Need full-text search? → Search Engine (Elasticsearch)
Need multiple paradigms? → Multimodel (ArangoDB)

Trigger Keywords → Database Type

When You Hear…Consider…
“caching”, “session store”, “real-time leaderboard”Redis / Key-Value
”financial”, “transactions”, “consistency”, “ACID”PostgreSQL / RDBMS
”flexible schema”, “varying attributes”, “JSON”MongoDB / Document
”connections”, “recommendations”, “friends-of-friends”Neo4j / Graph
”time-series”, “IoT”, “billions of writes”Cassandra / Wide Column
”search”, “typo tolerance”, “ranking”, “full-text”Elasticsearch / Search

Database Quick Reference

Key-Value (Redis, Memcached)

  • Use When: Caching, session storage, rate limiting, pub/sub, real-time leaderboards
  • Avoid When: Complex queries needed, data relationships required, persistence is critical
  • Real Example: Instagram caches user feeds—serves billions of requests in <100ms
ProsCons
Sub-millisecond latencyMemory-bound
Simple APINo complex queries
High throughputVolatile without config

Wide Column (Cassandra, HBase)

  • Use When: Time-series data, event logging, high write throughput, multi-region replication
  • Avoid When: Need joins, need strong consistency, highly normalized data
  • Real Example: Netflix stores viewing history—billions of writes daily, no single point of failure
ProsCons
Horizontal scaleNo joins
High write throughputEventual consistency
Multi-region replicationDenormalized data required

Document (MongoDB, CouchDB)

  • Use When: Varying document structures, content management, product catalogs, user profiles
  • Avoid When: Highly relational data, frequent cross-document updates, need joins
  • Real Example: Uber stores trips—each ride type (UberX, Eats, Pool) has different attributes
ProsCons
Flexible schemaData duplication
Fast readsComplex writes for updates
Natural JSON fitNo native joins

Relational (PostgreSQL, MySQL, CockroachDB)

  • Use When: Financial transactions, e-commerce orders, booking systems, any ACID requirement
  • Avoid When: Massive horizontal scale needed, schema changes constantly, unstructured data
  • Real Example: Stripe processes payments—every penny must be accounted for with atomic transactions
ProsCons
ACID complianceHarder to scale horizontally
Data integrityRigid schema
Complex queriesSchema migrations painful

Graph (Neo4j, Amazon Neptune)

  • Use When: Social networks, fraud detection, recommendations, knowledge graphs, network topology
  • Avoid When: Simple tabular data, few relationships, team unfamiliar with graph concepts
  • Real Example: LinkedIn powers “People You May Know”—traverses millions of connections in milliseconds
ProsCons
Fast relationship traversalSteeper learning curve
Eliminates expensive joinsSpecialized query language
Intuitive for connected dataOverkill for simple data

Search Engine (Elasticsearch, Solr)

  • Use When: Full-text search, log aggregation, typo tolerance, relevance ranking, autocomplete
  • Avoid When: Simple exact-match queries, tight budget, primary transactional store needed
  • Real Example: GitHub searches billions of lines of code with fuzzy matching and relevance scoring
ProsCons
Powerful text searchExpensive at scale
Fuzzy matchingHigh resource consumption
Relevance scoringNot a primary datastore

Multimodel (ArangoDB, FaunaDB)

  • Use When: Need documents + graphs together, want to reduce infra complexity, evolving requirements
  • Avoid When: Need peak performance for one paradigm, single model suffices
  • Real Example: E-commerce with product catalog (documents) + recommendations (graph) in one DB
ProsCons
Multiple paradigmsJack of all trades
Reduced complexityLower specialized performance
Single query languageSteeper learning curve

Common System Design Scenarios

SystemPrimary DBSupporting DBWhy
Social Media FeedPostgreSQLRedis (cache)ACID for posts, cache for feed
E-commercePostgreSQLElasticsearchTransactions + product search
Chat ApplicationCassandraRedis (presence)High write volume, real-time
Recommendation EngineNeo4jRedis (cache)Relationship traversal
Analytics PlatformCassandraElasticsearchTime-series + search/aggregation
Content PlatformMongoDBElasticsearchFlexible content + search

Interview Tips

  1. Always justify your choice — “I’d use X because we need Y”
  2. Mention trade-offs — No database is perfect
  3. Consider hybrid approaches — Most systems use 2-3 databases
  4. Think about scale — “At 1M users… at 100M users…”
  5. Ask clarifying questions — Consistency requirements? Read vs write heavy?

The Golden Rule Start with PostgreSQL as your default. Only deviate when you have a specific requirement it can't handle well (massive scale, graph traversal, flexible schema, caching layer).


Linked Map of Contexts