In a system design interview, you will often have to do back-of-the-envelope estimation of the capacity or performance requirements of an entire complex system.

Power of Two

Although data volumes in distributed systems can be enormous, the calculations involved are simple once you know the powers of two. A byte is a sequence of 8 bits, and an ASCII character uses one byte of memory (8 bits). From there, 2^10 bytes ≈ 1 KB, 2^20 ≈ 1 MB, 2^30 ≈ 1 GB, 2^40 ≈ 1 TB, and 2^50 ≈ 1 PB.
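A minimal Python sketch printing those units (the unit names follow the usual 2^10-based convention):

```python
# A byte is 8 bits; capacity estimates lean on powers of two.
# Convenient approximation: 2^10 = 1,024 ≈ 10^3, so 1 KB ≈ one thousand bytes.
UNITS = {10: "KB", 20: "MB", 30: "GB", 40: "TB", 50: "PB"}

for power, name in UNITS.items():
    print(f"2^{power} = {2 ** power:>20,} bytes ≈ 1 {name}")
```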

Latency Numbers a Developer Should Know

| Operation name | Time |
| --- | --- |
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 100 ns |
| Main memory reference | 100 ns |
| Compress 1K bytes with Zippy | 10,000 ns = 10 µs |
| Send 2K bytes over 1 Gbps network | 20,000 ns = 20 µs |
| Read 1 MB sequentially from memory | 250,000 ns = 250 µs |
| Round trip within the same datacenter | 500,000 ns = 500 µs |
| Disk seek | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from the network | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from disk | 30,000,000 ns = 30 ms |
| Send packet CA (California) → Netherlands → CA | 150,000,000 ns = 150 ms |

From these numbers, we can conclude that:

  • memory is fast, but disk is slow
  • avoid disk seeks if possible
  • simple compression algorithms are fast
  • compress data before sending it over the internet if possible
  • data centers are usually in different regions, and it takes time to send data between them
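A quick sanity check on those takeaways, with the latency values hard-coded from the table above:

```python
# Latencies from the table above, in nanoseconds.
MAIN_MEMORY_REF_NS = 100
DISK_SEEK_NS = 10_000_000
READ_1MB_MEMORY_NS = 250_000
READ_1MB_DISK_NS = 30_000_000

# One disk seek costs as much as 100,000 main-memory references.
print(DISK_SEEK_NS // MAIN_MEMORY_REF_NS)      # 100000

# Reading 1 MB sequentially from disk is ~120x slower than from memory.
print(READ_1MB_DISK_NS // READ_1MB_MEMORY_NS)  # 120
```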

Availability Numbers

High availability is the ability of a system to remain continuously operational for a desirably long period of time. It is usually measured as a percentage.

A Service Level Agreement (SLA) is an agreement between a service provider and a customer that formally defines the level of uptime the provider guarantees.
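To connect the percentage to something concrete, here is a small sketch that converts an availability target into allowed downtime per year (assuming a 365-day year):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def downtime_seconds_per_year(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_seconds_per_year(pct) / 3600
    print(f"{pct}% availability -> {hours:.2f} hours of downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten: 99.9% allows roughly 8.76 hours per year, while 99.999% allows only about 5 minutes.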

Example Estimation Problem

  • 300 million monthly active users.
  • 50% of users use Twitter daily.
  • Users post 2 tweets per day on average.
  • 10% of tweets contain media.
  • Data is stored for 5 years.

Estimate Key Metrics for the System:

  • Daily Active Users
  • QPS of Tweets
  • Peak QPS
  • Media Storage + Bandwidth Transfer Rate

DAU = 300,000,000 * 50% = 150,000,000
QPS = (150,000,000 * 2 tweets) / (24 hrs * 3600 seconds) ≈ 3,500 QPS
Peak QPS = QPS * 2 ≈ 7,000 QPS
Media Storage / Day = (DAU * 2 tweets * 10%) * 1 MB = 30 TB/day
5-Year Media Storage = 30 TB/day * 365 * 5 ≈ 55 PB
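The same arithmetic as a runnable sketch (the inputs mirror the bullets above; 1 MB per media file is the assumed average size):

```python
MAU = 300_000_000
DAILY_ACTIVE_RATIO = 0.5
TWEETS_PER_USER_PER_DAY = 2
MEDIA_RATIO = 0.10           # 10% of tweets contain media
MEDIA_SIZE_MB = 1            # assumed average media size
SECONDS_PER_DAY = 24 * 3600

dau = int(MAU * DAILY_ACTIVE_RATIO)                # 150,000,000
tweets_per_day = dau * TWEETS_PER_USER_PER_DAY     # 300,000,000
avg_qps = tweets_per_day / SECONDS_PER_DAY         # ~3,472
peak_qps = avg_qps * 2                             # ~6,944
media_tb_per_day = tweets_per_day * MEDIA_RATIO * MEDIA_SIZE_MB / 1_000_000
five_year_media_pb = media_tb_per_day * 365 * 5 / 1_000

print(f"DAU: {dau:,}")
print(f"Avg QPS: {avg_qps:,.0f}, Peak QPS: {peak_qps:,.0f}")
print(f"Media: {media_tb_per_day:.0f} TB/day, {five_year_media_pb:.2f} PB over 5 years")
```

Rounding 54.75 PB up gives the ~55 PB figure quoted above.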

Commonly Asked Estimation Metrics

| Metric | What it is | Typical units | How to estimate (back-of-envelope) | Why it matters |
| --- | --- | --- | --- | --- |
| MAU / DAU | Monthly / daily active users (unique users) | users | Given or assume; DAU often 10–30% of MAU (varies by product) | Baseline demand; everything scales from active users |
| Concurrent users | Users active at the same time | users | concurrency ≈ DAU * (avg session length / seconds per day) | Drives real-time capacity and peak load assumptions |
| Actions per user per day | How many “things” a user does daily (posts, refreshes, searches) | actions/user/day | Product-specific; pick a plausible number and state it | Converts users → total requests/events |
| Requests per action | How many backend requests one action triggers | req/action | Often 1–10+ depending on fanout (feed, timeline) | Helps avoid undercounting real backend load |
| QPS (avg) | Average queries/requests per second | req/s | avg QPS = (DAU * actions/user/day * req/action) / 86,400 | Core throughput input for app, cache, DB sizing |
| Read QPS / Write QPS | Split of reads vs writes | req/s | Apply read:write ratio (e.g., 10:1, 100:1 for feeds) | Reads often dominate; writes drive storage + consistency needs |
| Peak QPS | Highest expected QPS during spikes | req/s | peak QPS = avg QPS * peak factor (commonly 3–10x) | Determines “must not fall over” capacity |
| Peak factor | Ratio of peak to average load | multiplier | If unknown, choose 3x (steady) to 10x (spiky) and justify | Makes your design resilient to diurnal + bursty traffic |
| Tail latency (p95/p99) | Response time at 95th/99th percentile | ms | Choose targets per endpoint (e.g., p99 < 200–500ms) | Interviewers care about p99; drives caching + async + indexing |
| SLA / SLO | Availability/latency reliability target | % / ms | e.g., 99.9% monthly, p99 < 300ms | Determines redundancy, multi-AZ, retries, queues, failover |
| Error rate budget | Allowed failure rate | % | Derived from SLO (e.g., 0.1% for 99.9%) | Guides rollout strategy, monitoring, graceful degradation |
| Request size | Bytes in request (headers + body) | bytes | Assume typical payload (e.g., 1–5KB) | Affects bandwidth + load balancer + gateway throughput |
| Response size | Bytes returned per request | bytes | Often larger than request; feeds can be 10–200KB | Dominates bandwidth and egress cost |
| Bandwidth / Throughput | Network data per second | bytes/s, MB/s, Gbps | bandwidth ≈ QPS * avg response size (plus request size) | Sizing network, load balancers, CDN, cost |
| Egress | Data leaving your system (often billed) | GB/month | egress ≈ bandwidth integrated over time | Big cost lever; pushes you to cache/CDN/compress |
| Events/sec (streaming) | Produced events for logs/telemetry/activity | events/s | events/s ≈ actions/s * events per action | Sizing Kafka/NATS/PubSub, consumers, storage |
| Storage per record | Size of one stored item (row/object) | bytes/record | Estimate schema fields + overhead + indexes | Foundation for total storage and index sizing |
| Write volume per day | New data written each day | GB/day | GB/day = writes/day * bytes/write (include metadata) | Drives retention, partitioning, compaction, backups |
| Total storage | Stored data over retention window | GB/TB/PB | total = (GB/day * retention days) * replication factor | Determines DB choice, sharding, cost |
| Retention period | How long data is kept | days/months/years | Product/legal requirement (e.g., 30d logs, forever posts) | Massive impact on storage and compliance |
| Replication factor | Copies stored for durability | multiplier | Common: 2–3x across AZs; more for multi-region | Multiplies storage and write cost; improves availability |
| Index size / overhead | Extra storage for query indexes | % or GB | Rule of thumb: 20–200% of raw data depending on indexes | Indexes often dominate DB storage + write amplification |
| Cache hit rate | Fraction of requests served from cache | % | Assume 70–99% depending on workload + TTL | Directly reduces DB QPS and improves latency |
| Cache miss rate | Fraction that falls through to DB | % | miss = 1 - hit | Determines DB load under caching |
| Cache size | Memory needed to hold hot set | GB | hot set ≈ (hot keys * bytes/value) + overhead | Prevents evictions; impacts cost and latency |
| Cache TTL | How long cached entries live | seconds/minutes | Set based on freshness requirements | Balances staleness vs load reduction |
| CPU / Memory per request | Compute cost per request | ms CPU, MB | Use benchmarks or rough assumptions | Converts QPS → server count and instance sizing |
| Server capacity | How many QPS one server can handle | req/s/server | Capacity from benchmarks; otherwise assume and state | Critical for “number of servers” estimate |
| Number of app servers | Count of servers to meet peak | servers | servers ≈ peak QPS / capacity per server * headroom | Shows you can translate load → fleet size |
| Headroom | Extra capacity buffer | % | Commonly 20–50% (autoscaling, failures, deploys) | Prevents collapse during spikes, deploys, AZ loss |
| DB IOPS | Disk operations per second needed | ops/s | Depends on reads/writes and access pattern; cite as risk | Often the hidden bottleneck for DB performance |
| Partition/shard count | How many splits of data | shards | shards ≈ total load/storage / per-shard limits | Enables horizontal scaling; impacts joins/transactions |
| Fanout | One write causes many reads/writes | multiplier | e.g., post → push to N followers (fanout-on-write) | Can explode write volume; changes architecture choice |
| Queue throughput | Jobs processed per second | jobs/s | jobs/s ≈ produced events/s (steady-state) | Sizes workers; avoids lag/backlog growth |
| Backlog / Lag | How far behind async processing is | seconds/minutes | Lag grows if consume rate < produce rate | Reliability + user experience; triggers scaling |
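To show how several of these rows compose, here is a sketch under purely invented assumptions (10M DAU, 20 actions/user/day, 3 requests per action, 5x peak factor, 95% cache hit rate, 500 req/s per server, 30% headroom):

```python
import math

# All inputs below are illustrative assumptions, not real benchmarks.
dau = 10_000_000
actions_per_user_per_day = 20
requests_per_action = 3
peak_factor = 5
cache_hit_rate = 0.95
server_capacity_qps = 500   # req/s one server handles (assumed benchmark)
headroom = 1.3              # 30% capacity buffer

avg_qps = dau * actions_per_user_per_day * requests_per_action / 86_400
peak_qps = avg_qps * peak_factor
db_qps = peak_qps * (1 - cache_hit_rate)   # load falling through the cache
servers = math.ceil(peak_qps / server_capacity_qps * headroom)

print(f"avg QPS ≈ {avg_qps:,.0f}")    # ~6,944
print(f"peak QPS ≈ {peak_qps:,.0f}")  # ~34,722
print(f"DB QPS ≈ {db_qps:,.0f}")      # ~1,736
print(f"app servers ≈ {servers}")
```

The chain (users → actions → QPS → peak → cache miss load → server count) is exactly the path an interviewer expects you to walk, with each assumption stated out loud.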
