In a system design interview, you will often have to do back-of-the-envelope estimation to approximate the capacity or performance requirements of a complex system.
Power of Two
Although data volumes in distributed systems can be enormous, the calculations reduce to simple powers of two. A byte is a sequence of 8 bits, and an ASCII character uses one byte of memory (8 bits).
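The power-of-two data units can be sketched in a few lines; the unit names and sizes below are the standard binary conventions (2^10 per step), not anything specific to this document:

```python
# Common power-of-two data units used in capacity estimation.
# Each unit is 2^10 (1,024) times the previous one.
UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}

for name, size in UNITS.items():
    # size.bit_length() - 1 recovers the exponent, since each size is a power of two
    print(f"1 {name} = 2^{size.bit_length() - 1} bytes = {size:,} bytes")
```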
Latency Numbers a Developer Should Know
| Operation name | Time |
|---|---|
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 100 ns |
| Main memory reference | 100 ns |
| Compress 1K bytes with Zippy | 10,000 ns = 10 µs |
| Send 2K bytes over 1 Gbps network | 20,000 ns = 20 µs |
| Read 1 MB sequentially from memory | 250,000 ns = 250 µs |
| Round trip within the same datacenter | 500,000 ns = 500 µs |
| Disk seek | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from the network | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from disk | 30,000,000 ns = 30 ms |
| Send packet CA (California) → Netherlands → CA | 150,000,000 ns = 150 ms |
From these numbers, we can draw a few conclusions:
- memory is fast but the disk is slow
- avoid disk seeks if possible
- simple compression algorithms are fast
- compress data before sending it over the internet if possible
- data centers are usually in different regions and it takes time to send data between them
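To get a feel for these magnitudes, we can extrapolate the per-MB figures from the table to a 1 GB sequential read. The per-MB times below are taken directly from the latency table; treating them as linear is a rough assumption:

```python
# Rough sequential-read time for 1 GB, scaled linearly from the
# per-MB numbers in the latency table above (an approximation).
MS_PER_MB = {"memory": 0.25, "network (1 Gbps)": 10, "disk": 30}

MB_PER_GB = 1024
for medium, ms in MS_PER_MB.items():
    total_s = MB_PER_GB * ms / 1000
    print(f"Read 1 GB sequentially from {medium}: ~{total_s:.2f} s")
```

This makes the first takeaway concrete: the same gigabyte costs roughly a quarter of a second from memory but about half a minute from disk.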
Availability Numbers
High availability is the ability of a system to remain continuously operational for a desirably long period of time, usually measured as a percentage of uptime.
A Service Level Agreement (SLA) is an agreement between a service provider and a customer that formally defines the level of uptime the provider guarantees.
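The availability percentage maps directly to an allowed downtime budget. A minimal sketch, using standard "nines" targets (these example percentages are illustrative, not from the text above):

```python
# Allowed downtime per year for common availability targets ("the nines").
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

for availability in (99.0, 99.9, 99.99, 99.999):
    downtime_s = SECONDS_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% availability -> ~{downtime_s / 60:,.1f} min downtime/year")
```

For example, 99.9% ("three nines") allows roughly 8.8 hours of downtime per year, while 99.99% allows under an hour.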
Example Estimation Problem
- 300 million monthly active users
- 50% of users use Twitter daily
- Users post 2 tweets per day on average
- 10% of tweets contain media
- Data is stored for 5 years
Estimate Key Metrics for the System:
- Daily Active Users
- QPS of Tweets
- Peak QPS
- Media Storage + Bandwidth Transfer Rate
DAU = 300,000,000 * 50% = ~150,000,000
QPS of tweets = (150,000,000 * 2 tweets) / 24 hours / 3,600 seconds ≈ 3,500
Peak QPS = 2 * QPS ≈ 7,000
Media storage per day = 150,000,000 * 2 tweets * 10% * 1 MB = 30 TB/day
5-year media storage = 30 TB/day * 365 * 5 ≈ 55 PB
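The same estimate can be checked programmatically; this reproduces the arithmetic above using decimal (power-of-ten) unit conversions for simplicity:

```python
# Back-of-envelope check of the Twitter estimates above.
MAU = 300_000_000
dau = MAU * 0.50                       # 50% of users are daily active
tweets_per_day = dau * 2               # 2 tweets/user/day
qps = tweets_per_day / 86_400          # seconds per day
peak_qps = qps * 2                     # assumed 2x peak factor
media_tweets = tweets_per_day * 0.10   # 10% of tweets contain media
storage_per_day_tb = media_tweets * 1 / 1_000_000   # 1 MB each, in TB (decimal)
five_year_pb = storage_per_day_tb * 365 * 5 / 1_000

print(f"DAU: {dau:,.0f}")
print(f"Avg QPS: {qps:,.0f}, peak QPS: ~{peak_qps:,.0f}")
print(f"Media: {storage_per_day_tb:.0f} TB/day, ~{five_year_pb:.2f} PB over 5 years")
```

The exact average is ~3,472 QPS and the 5-year total is ~54.75 PB; in an interview, rounding to ~3,500 QPS and ~55 PB is expected.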
Commonly Asked Estimation Metrics
| Metric | What it is | Typical units | How to estimate (back-of-envelope) | Why it matters |
|---|---|---|---|---|
| MAU / DAU | Monthly / daily active users (unique users) | users | Given or assume; DAU often 10–30% of MAU (varies by product) | Baseline demand; everything scales from active users |
| Concurrent users | Users active at the same time | users | concurrency ≈ DAU * (avg session length / seconds per day) | Drives real-time capacity and peak load assumptions |
| Actions per user per day | How many “things” a user does daily (posts, refreshes, searches) | actions/user/day | Product-specific; pick a plausible number and state it | Converts users → total requests/events |
| Requests per action | How many backend requests one action triggers | req/action | Often 1–10+ depending on fanout (feed, timeline) | Helps avoid undercounting real backend load |
| QPS (avg) | Average queries/requests per second | req/s | avg QPS = (DAU * actions/user/day * req/action) / 86,400 | Core throughput input for app, cache, DB sizing |
| Read QPS / Write QPS | Split of reads vs writes | req/s | Apply read:write ratio (e.g., 10:1, 100:1 for feeds) | Reads often dominate; writes drive storage + consistency needs |
| Peak QPS | Highest expected QPS during spikes | req/s | peak QPS = avg QPS * peak factor (commonly 3–10x) | Determines “must not fall over” capacity |
| Peak factor | Ratio of peak to average load | multiplier | If unknown, choose 3x (steady) to 10x (spiky) and justify | Makes your design resilient to diurnal + bursty traffic |
| Tail latency (p95/p99) | Response time at 95th/99th percentile | ms | Choose targets per endpoint (e.g., p99 < 200–500ms) | Interviewers care about p99; drives caching + async + indexing |
| SLA / SLO | Availability/latency reliability target | % / ms | e.g., 99.9% monthly, p99 < 300ms | Determines redundancy, multi-AZ, retries, queues, failover |
| Error rate budget | Allowed failure rate | % | Derived from SLO (e.g., 0.1% for 99.9%) | Guides rollout strategy, monitoring, graceful degradation |
| Request size | Bytes in request (headers + body) | bytes | Assume typical payload (e.g., 1–5KB) | Affects bandwidth + load balancer + gateway throughput |
| Response size | Bytes returned per request | bytes | Often larger than request; feeds can be 10–200KB | Dominates bandwidth and egress cost |
| Bandwidth / Throughput | Network data per second | bytes/s, MB/s, Gbps | bandwidth ≈ QPS * avg response size (plus request size) | Sizing network, load balancers, CDN, cost |
| Egress | Data leaving your system (often billed) | GB/month | egress ≈ bandwidth integrated over time | Big cost lever; pushes you to cache/CDN/compress |
| Events/sec (streaming) | Produced events for logs/telemetry/activity | events/s | events/s ≈ actions/s * events per action | Sizing Kafka/NATS/PubSub, consumers, storage |
| Storage per record | Size of one stored item (row/object) | bytes/record | Estimate schema fields + overhead + indexes | Foundation for total storage and index sizing |
| Write volume per day | New data written each day | GB/day | GB/day = writes/day * bytes/write (include metadata) | Drives retention, partitioning, compaction, backups |
| Total storage | Stored data over retention window | GB/TB/PB | total = (GB/day * retention days) * replication factor | Determines DB choice, sharding, cost |
| Retention period | How long data is kept | days/months/years | Product/legal requirement (e.g., 30d logs, forever posts) | Massive impact on storage and compliance |
| Replication factor | Copies stored for durability | multiplier | Common: 2–3x across AZs; more for multi-region | Multiplies storage and write cost; improves availability |
| Index size / overhead | Extra storage for query indexes | % or GB | rule of thumb: 20–200% of raw data depending on indexes | Indexes often dominate DB storage + write amplification |
| Cache hit rate | Fraction of requests served from cache | % | Assume 70–99% depending on workload + TTL | Directly reduces DB QPS and improves latency |
| Cache miss rate | Fraction that falls through to DB | % | miss = 1 - hit | Determines DB load under caching |
| Cache size | Memory needed to hold hot set | GB | hot set ≈ (hot keys * bytes/value) + overhead | Prevents evictions; impacts cost and latency |
| Cache TTL | How long cached entries live | seconds/minutes | Set based on freshness requirements | Balances staleness vs load reduction |
| CPU / Memory per request | Compute cost per request | ms CPU, MB | Use benchmarks or rough assumptions | Converts QPS → server count and instance sizing |
| Server capacity | How many QPS one server can handle | req/s/server | capacity from benchmarks; otherwise assume and state | Critical for “number of servers” estimate |
| Number of app servers | Count of servers to meet peak | servers | servers ≈ peak QPS / capacity per server * headroom | Shows you can translate load → fleet size |
| Headroom | Extra capacity buffer | % | Commonly 20–50% (autoscaling, failures, deploys) | Prevents collapse during spikes, deploys, AZ loss |
| DB IOPS | Disk operations per second needed | ops/s | Depends on reads/writes and access pattern; cite as risk | Often the hidden bottleneck for DB performance |
| Partition/shard count | How many splits of data | shards | shards ≈ total load/storage / per-shard limits | Enables horizontal scaling; impacts joins/transactions |
| Fanout | One write causes many reads/writes | multiplier | e.g., post → push to N followers (fanout-on-write) | Can explode write volume; changes architecture choice |
| Queue throughput | Jobs processed per second | jobs/s | jobs/s ≈ produced events/s (steady-state) | Sizes workers; avoids lag/backlog growth |
| Backlog / Lag | How far behind async processing is | seconds/minutes | lag grows if consume rate < produce rate | Reliability + user experience; triggers scaling |
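Several of these metrics chain together: users → actions → requests → QPS → servers. A minimal sketch of that chain, where every input number is an illustrative assumption (not from the table) and the formulas follow the "How to estimate" column:

```python
# End-to-end sketch chaining metrics from the table above.
# All inputs are illustrative assumptions for a hypothetical product.
dau = 10_000_000           # daily active users
actions_per_user = 20      # actions/user/day
req_per_action = 2         # backend requests per action
peak_factor = 5            # peak-to-average ratio (spiky traffic)
server_capacity_qps = 1_000  # req/s one app server handles (from benchmarks)
headroom = 1.3             # 30% capacity buffer

avg_qps = dau * actions_per_user * req_per_action / 86_400
peak_qps = avg_qps * peak_factor
servers = peak_qps / server_capacity_qps * headroom

print(f"Avg QPS: ~{avg_qps:,.0f}, peak QPS: ~{peak_qps:,.0f}")
print(f"App servers needed at peak (with headroom): ~{servers:.0f}")
```

The point of the exercise is not the specific numbers but stating each assumption out loud and showing how a change in any one input (say, a 10x peak factor) propagates to fleet size.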