In a system design interview, you will often have to do back-of-the-envelope estimation to approximate the capacity or performance requirements of a complex system.
Power of Two
Although data volumes in distributed systems can be enormous, the calculations reduce to simple powers of two. A byte is a sequence of 8 bits, and an ASCII character uses one byte of memory (8 bits).
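The power-of-two data units can be sketched in a few lines; the unit names and sizes below are the standard binary conventions (2^10 per step), not anything specific to this document:

```python
# Common power-of-two data units used in capacity estimation.
# Each unit is 2^10 (1,024) times the previous one.
UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}

for name, size in UNITS.items():
    # size.bit_length() - 1 recovers the exponent, since each size is a power of two
    print(f"1 {name} = 2^{size.bit_length() - 1} bytes = {size:,} bytes")
```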
Latency Numbers a Developer Should Know
| Operation name | Time |
|---|---|
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 100 ns |
| Main memory reference | 100 ns |
| Compress 1K bytes with Zippy | 10,000 ns = 10 µs |
| Send 2K bytes over 1 Gbps network | 20,000 ns = 20 µs |
| Read 1 MB sequentially from memory | 250,000 ns = 250 µs |
| Round trip within the same datacenter | 500,000 ns = 500 µs |
| Disk seek | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from the network | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from disk | 30,000,000 ns = 30 ms |
| Send packet CA (California) → Netherlands → CA | 150,000,000 ns = 150 ms |
From these numbers, we can draw a few conclusions:
- memory is fast but the disk is slow
- avoid disk seeks if possible
- simple compression algorithms are fast
- compress data before sending it over the internet if possible
- data centers are usually in different regions and it takes time to send data between them
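To get a feel for these magnitudes, we can extrapolate the per-MB figures from the table to a 1 GB sequential read. The per-MB times below are taken directly from the latency table; treating them as linear is a rough assumption:

```python
# Rough sequential-read time for 1 GB, scaled linearly from the
# per-MB numbers in the latency table above (an approximation).
MS_PER_MB = {"memory": 0.25, "network (1 Gbps)": 10, "disk": 30}

MB_PER_GB = 1024
for medium, ms in MS_PER_MB.items():
    total_s = MB_PER_GB * ms / 1000
    print(f"Read 1 GB sequentially from {medium}: ~{total_s:.2f} s")
```

This makes the first takeaway concrete: the same gigabyte costs roughly a quarter of a second from memory but about half a minute from disk.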
Availability Numbers
High availability is the ability of a system to remain continuously operational for a desirably long period of time, usually measured as a percentage of uptime.
A Service Level Agreement (SLA) is an agreement between a service provider and a customer that formally defines the level of uptime the provider guarantees.
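The availability percentage maps directly to an allowed downtime budget. A minimal sketch, using standard "nines" targets (these example percentages are illustrative, not from the text above):

```python
# Allowed downtime per year for common availability targets ("the nines").
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

for availability in (99.0, 99.9, 99.99, 99.999):
    downtime_s = SECONDS_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% availability -> ~{downtime_s / 60:,.1f} min downtime/year")
```

For example, 99.9% ("three nines") allows roughly 8.8 hours of downtime per year, while 99.99% allows under an hour.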
Example Estimation Problem
- 300 million monthly active users
- 50% of users use Twitter daily
- Users post 2 tweets per day on average
- 10% of tweets contain media
- Data is stored for 5 years
Estimate Key Metrics for the System:
- Daily Active Users
- QPS of Tweets
- Peak QPS
- Media Storage + Bandwidth Transfer Rate
DAU = 300,000,000 * 50% = ~150,000,000
QPS of tweets = (150,000,000 * 2 tweets) / 24 hours / 3,600 seconds ≈ 3,500
Peak QPS = 2 * QPS ≈ 7,000
Media storage per day = 150,000,000 * 2 tweets * 10% * 1 MB = 30 TB/day
5-year media storage = 30 TB/day * 365 * 5 ≈ 55 PB
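The same estimate can be checked programmatically; this reproduces the arithmetic above using decimal (power-of-ten) unit conversions for simplicity:

```python
# Back-of-envelope check of the Twitter estimates above.
MAU = 300_000_000
dau = MAU * 0.50                       # 50% of users are daily active
tweets_per_day = dau * 2               # 2 tweets/user/day
qps = tweets_per_day / 86_400          # seconds per day
peak_qps = qps * 2                     # assumed 2x peak factor
media_tweets = tweets_per_day * 0.10   # 10% of tweets contain media
storage_per_day_tb = media_tweets * 1 / 1_000_000   # 1 MB each, in TB (decimal)
five_year_pb = storage_per_day_tb * 365 * 5 / 1_000

print(f"DAU: {dau:,.0f}")
print(f"Avg QPS: {qps:,.0f}, peak QPS: ~{peak_qps:,.0f}")
print(f"Media: {storage_per_day_tb:.0f} TB/day, ~{five_year_pb:.2f} PB over 5 years")
```

The exact average is ~3,472 QPS and the 5-year total is ~54.75 PB; in an interview, rounding to ~3,500 QPS and ~55 PB is expected.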
Commonly Asked Estimation Metrics
| Metric | What it is | Typical units | How to estimate (back-of-envelope) | Why it matters |
|---|---|---|---|---|
| MAU / DAU | Monthly / daily active users (unique users) | users | Given or assume; DAU often 10–30% of MAU (varies by product) | Baseline demand; everything scales from active users |
| Concurrent users | Users active at the same time | users | concurrency ≈ DAU * (avg session length / seconds per day) | Drives real-time capacity and peak load assumptions |
| Actions per user per day | How many “things” a user does daily (posts, refreshes, searches) | actions/user/day | Product-specific; pick a plausible number and state it | Converts users → total requests/events |
| Requests per action | How many backend requests one action triggers | req/action | Often 1–10+ depending on fanout (feed, timeline) | Helps avoid undercounting real backend load |
| QPS (avg) | Average queries/requests per second | req/s | avg QPS = (DAU * actions/user/day * req/action) / 86,400 | Core throughput input for app, cache, DB sizing |
| Read QPS / Write QPS | Split of reads vs writes | req/s | Apply read:write ratio (e.g., 10:1, 100:1 for feeds) | Reads often dominate; writes drive storage + consistency needs |
| Peak QPS | Highest expected QPS during spikes | req/s | peak QPS = avg QPS * peak factor (commonly 3–10x) | Determines “must not fall over” capacity |
| Peak factor | Ratio of peak to average load | multiplier | If unknown, choose 3x (steady) to 10x (spiky) and justify | Makes your design resilient to diurnal + bursty traffic |
| Tail latency (p95/p99) | Response time at 95th/99th percentile | ms | Choose targets per endpoint (e.g., p99 < 200–500ms) | Interviewers care about p99; drives caching + async + indexing |
| SLA / SLO | Availability/latency reliability target | % / ms | e.g., 99.9% monthly, p99 < 300ms | Determines redundancy, multi-AZ, retries, queues, failover |
| Error rate budget | Allowed failure rate | % | Derived from SLO (e.g., 0.1% for 99.9%) | Guides rollout strategy, monitoring, graceful degradation |
| Request size | Bytes in request (headers + body) | bytes | Assume typical payload (e.g., 1–5KB) | Affects bandwidth + load balancer + gateway throughput |
| Response size | Bytes returned per request | bytes | Often larger than request; feeds can be 10–200KB | Dominates bandwidth and egress cost |
| Bandwidth / Throughput | Network data per second | bytes/s, MB/s, Gbps | bandwidth ≈ QPS * avg response size (plus request size) | Sizing network, load balancers, CDN, cost |
| Egress | Data leaving your system (often billed) | GB/month | egress ≈ bandwidth integrated over time | Big cost lever; pushes you to cache/CDN/compress |
| Events/sec (streaming) | Produced events for logs/telemetry/activity | events/s | events/s ≈ actions/s * events per action | Sizing Kafka/NATS/PubSub, consumers, storage |
| Storage per record | Size of one stored item (row/object) | bytes/record | Estimate schema fields + overhead + indexes | Foundation for total storage and index sizing |
| Write volume per day | New data written each day | GB/day | GB/day = writes/day * bytes/write (include metadata) | Drives retention, partitioning, compaction, backups |
| Total storage | Stored data over retention window | GB/TB/PB | total = (GB/day * retention days) * replication factor | Determines DB choice, sharding, cost |
| Retention period | How long data is kept | days/months/years | Product/legal requirement (e.g., 30d logs, forever posts) | Massive impact on storage and compliance |
| Replication factor | Copies stored for durability | multiplier | Common: 2–3x across AZs; more for multi-region | Multiplies storage and write cost; improves availability |
| Index size / overhead | Extra storage for query indexes | % or GB | rule of thumb: 20–200% of raw data depending on indexes | Indexes often dominate DB storage + write amplification |
| Cache hit rate | Fraction of requests served from cache | % | Assume 70–99% depending on workload + TTL | Directly reduces DB QPS and improves latency |
| Cache miss rate | Fraction that falls through to DB | % | miss = 1 - hit | Determines DB load under caching |
| Cache size | Memory needed to hold hot set | GB | hot set ≈ (hot keys * bytes/value) + overhead | Prevents evictions; impacts cost and latency |
| Cache TTL | How long cached entries live | seconds/minutes | Set based on freshness requirements | Balances staleness vs load reduction |
| CPU / Memory per request | Compute cost per request | ms CPU, MB | Use benchmarks or rough assumptions | Converts QPS → server count and instance sizing |
| Server capacity | How many QPS one server can handle | req/s/server | capacity from benchmarks; otherwise assume and state | Critical for “number of servers” estimate |
| Number of app servers | Count of servers to meet peak | servers | servers ≈ peak QPS / capacity per server * headroom | Shows you can translate load → fleet size |
| Headroom | Extra capacity buffer | % | Commonly 20–50% (autoscaling, failures, deploys) | Prevents collapse during spikes, deploys, AZ loss |
| DB IOPS | Disk operations per second needed | ops/s | Depends on reads/writes and access pattern; cite as risk | Often the hidden bottleneck for DB performance |
| Partition/shard count | How many splits of data | shards | shards ≈ total load/storage / per-shard limits | Enables horizontal scaling; impacts joins/transactions |
| Fanout | One write causes many reads/writes | multiplier | e.g., post → push to N followers (fanout-on-write) | Can explode write volume; changes architecture choice |
| Queue throughput | Jobs processed per second | jobs/s | jobs/s ≈ produced events/s (steady-state) | Sizes workers; avoids lag/backlog growth |
| Backlog / Lag | How far behind async processing is | seconds/minutes | lag grows if consume rate < produce rate | Reliability + user experience; triggers scaling |
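Several of these metrics chain together: users → actions → requests → QPS → servers. A minimal sketch of that chain, where every input number is an illustrative assumption (not from the table) and the formulas follow the "How to estimate" column:

```python
# End-to-end sketch chaining metrics from the table above.
# All inputs are illustrative assumptions for a hypothetical product.
dau = 10_000_000           # daily active users
actions_per_user = 20      # actions/user/day
req_per_action = 2         # backend requests per action
peak_factor = 5            # peak-to-average ratio (spiky traffic)
server_capacity_qps = 1_000  # req/s one app server handles (from benchmarks)
headroom = 1.3             # 30% capacity buffer

avg_qps = dau * actions_per_user * req_per_action / 86_400
peak_qps = avg_qps * peak_factor
servers = peak_qps / server_capacity_qps * headroom

print(f"Avg QPS: ~{avg_qps:,.0f}, peak QPS: ~{peak_qps:,.0f}")
print(f"App servers needed at peak (with headroom): ~{servers:.0f}")
```

The point of the exercise is not the specific numbers but stating each assumption out loud and showing how a change in any one input (say, a 10x peak factor) propagates to fleet size.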