System Design Layers
Answers to non-functional requirement (NFR) questions tell you what the system needs. Layers are what you draw. Start with the core 4, then add layers based on what the interviewer tells you.
The Core 4 — Always Draw These
Every system design diagram starts here. Add more only when NFRs demand it.
| Layer | What It Does | Example Tech |
|---|---|---|
| Client | Browser, mobile app, or external service making requests | Web, iOS, Android |
| Load Balancer | Distributes traffic across app servers. Single entry point. Handles health checks. | AWS ALB, Nginx |
| App Server | Business logic. Stateless so you can add more horizontally. | Node, Java, Go |
| Database | Persistent storage. Default to relational unless there's a clear reason not to. | PostgreSQL, MySQL |
[Client]
↓
[Load Balancer]
↓
[App Server(s)]
↓
[Database]
Optional Layers — Added by NFRs
Add only when an NFR demands it. Each one solves a specific problem but adds latency, cost, and operational complexity. Use the "Add When..." column to decide during requirements gathering.
| Layer | What It Does | Add When... |
|---|---|---|
| CDN | Caches static assets at edge nodes close to users. | Global users, latency matters, lots of static assets |
| API Gateway | Handles auth, rate limiting, and routing before requests hit app servers. | Security requirements, multiple client types, public API |
| Cache (Redis) | Stores hot data in memory. Sub-millisecond reads. Reduces DB load. | Read-heavy, latency < 100ms, same data read repeatedly |
| Message Queue | Decouples producers and consumers. Absorbs write bursts. | Write-heavy, async work, fault tolerance needed |
| Worker / Consumer | Processes jobs from the queue asynchronously. | Always paired with a message queue |
| DB Read Replicas | Copies of the primary DB that serve reads. Primary handles writes only. | Read-heavy, high availability |
| Object Storage | Stores files, images, video. Cheap, durable, and effectively unlimited in capacity. | File uploads, media, large blobs |
| Search (Elasticsearch) | Full-text search, fuzzy matching, ranked results. DB LIKE queries don't scale. | Search box, autocomplete on unstructured text, faceted filtering |
| WebSocket / SSE Server | Holds persistent connections to clients and pushes events in real time. Stateful — unlike app servers, you can't freely route any request to any instance. | Real-time features: chat, live notifications, live scores, collaborative editing |
NFR → Layers to Add
Translate the interviewer's requirements directly into layers. Use this to build your diagram from the answers you've gathered.
| NFR Answer | Layers to Add |
|---|---|
| High scale, read-heavy | CDN, Cache (Redis), DB Read Replicas |
| High scale, write-heavy | Message Queue, Workers |
| Low latency (< 100ms) | CDN, Cache on hot path — remove DB from read path |
| High availability | DB Read Replicas + standby, multi-AZ Load Balancer |
| Strong consistency | No cache on write path. Single DB primary. Synchronous replication. |
| Security / rate limiting | API Gateway in front of App Servers |
| File or media storage | Object Storage (S3) |
| Async / background work | Message Queue + Workers |
| Fault tolerance | Message Queue (queue buffers if downstream is down), circuit breaker at App Server |
| Full-text search | Search layer (Elasticsearch / OpenSearch) alongside DB — DB handles writes, search index is updated async |
| Real-time push / live updates | WebSocket Server + Message Broker (Redis Pub/Sub or Kafka) for fan-out |
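The table above is effectively a lookup from requirement to layers. A minimal Python sketch of that lookup follows. The requirement keys (`read_heavy`, `low_latency`, and so on) and the `layers_for` helper are hypothetical names chosen for illustration; only the layer names come from the table.

```python
# Core layers that every diagram starts with.
CORE_LAYERS = ["Client", "Load Balancer", "App Server", "Database"]

# Hypothetical requirement keys mapped to the layers from the table above.
NFR_LAYERS = {
    "read_heavy": ["CDN", "Cache (Redis)", "DB Read Replicas"],
    "write_heavy": ["Message Queue", "Workers"],
    "low_latency": ["CDN", "Cache (Redis)"],
    "high_availability": ["DB Read Replicas", "DB Standby"],
    "security": ["API Gateway"],
    "file_storage": ["Object Storage"],
    "async_work": ["Message Queue", "Workers"],
    "full_text_search": ["Search"],
    "real_time": ["WebSocket Server", "Message Broker"],
}


def layers_for(nfrs):
    """Return the core layers plus every layer triggered by the given NFRs."""
    layers = list(CORE_LAYERS)
    for nfr in nfrs:
        for layer in NFR_LAYERS[nfr]:
            if layer not in layers:  # skip duplicates, preserve order
                layers.append(layer)
    return layers


diagram = layers_for(["read_heavy", "low_latency"])
print(diagram)
```

Overlapping requirements (here, both ask for a CDN and a cache) add each layer once, which mirrors how the diagram is drawn.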
Latency Reference — Cost Per Layer
Single source of truth for per-layer latency costs. Use the Typical column for estimates — Best Case shows the theoretical floor with everything going right.
Network is included: Each component figure (Redis 1ms, DB 5ms, etc.) covers the full round-trip including the internal network hop — App→component→App. The only network cost listed separately is the user-facing leg (Browser↔LB) because that varies by geography and is outside your control.
Scenario breakdowns below use the Typical column. Update this table to recalibrate all scenarios.
                                                ┌─(1ms)──► Redis
Browser ─(20ms)─► CDN ─(1ms)─► LB ─(5ms)─► App ─┼─(5ms)──► DB
                                                ├─(3ms)──► Kafka
                                                └─(50ms)─► S3
| Layer | Best Case | Typical | Notes |
|---|---|---|---|
| Network: user → CDN edge | 5ms | 20ms | Depends on user geography |
| Network: user → origin (no CDN) | 20ms | 60ms | Cross-region can be 150ms+ |
| CDN cache hit | < 1ms | 1ms | After network cost above |
| Load Balancer | < 1ms | 1ms | Pure routing overhead |
| API Gateway (auth + routing) | 3ms | 10ms | Token validation adds most of the cost |
| App Server (simple logic) | 1ms | 5ms | Complex logic or external calls add more |
| Cache / Redis hit | 0.5ms | 1ms | In-memory, same datacenter |
| DB read (indexed query) | 1ms | 5ms | Unindexed or joins: 10–100ms |
| DB write + sync replication | 5ms | 15ms | Waiting for standby to confirm |
| Message Queue publish (Kafka) | 1ms | 3ms | Async — user does NOT wait for consumer |
| Object Storage / S3 read | 10ms | 50ms | First byte. Much slower than DB or cache. |
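Since every scenario total is just a sum of Typical values, the arithmetic can be sketched as a small helper. This is an illustrative sketch, not part of the original notes: the dictionary keys and the `path_latency` function are hypothetical names, and the numbers are copied from the Typical column.

```python
# Typical per-layer latency in milliseconds, copied from the table above.
TYPICAL_MS = {
    "network_cdn": 20,
    "network_origin": 60,
    "cdn_hit": 1,
    "lb": 1,
    "api_gateway": 10,
    "app": 5,
    "redis": 1,
    "db_read": 5,
    "db_write_sync": 15,
    "queue_publish": 3,
    "s3_read": 50,
}


def path_latency(hops):
    """Sum the typical latency of each hop on a request path."""
    return sum(TYPICAL_MS[hop] for hop in hops)


redis_hit = path_latency(["network_origin", "lb", "app", "redis"])
cache_miss = path_latency(["network_origin", "lb", "app", "redis", "db_read"])
sync_write = path_latency(
    ["network_origin", "lb", "api_gateway", "app", "db_write_sync"]
)
print(redis_hit, cache_miss, sync_write)  # 67 72 91
```

Changing one value in `TYPICAL_MS` recalculates every path that uses it, which is the same idea as treating the table as the single source of truth.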
Scenario 1: Read-Heavy
Design Twitter feed, Reddit homepage, YouTube video page.
NFRs that triggered this: high scale, read-heavy, low latency
[Client]
↓
[CDN] ──────────────────→ (cache hit: return static assets / cached response)
↓ (miss)
[Load Balancer]
↓
[App Servers]
↓
[Cache (Redis)] ─────────→ (cache hit: return feed)
↓ (miss)
[DB Primary] + [DB Read Replicas]
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| CDN hit | Network (20) + CDN (1) | ~21ms |
| Redis hit | Network (60) + LB (1) + App (5) + Redis (1) | ~67ms |
| DB read (cache miss) | Network (60) + LB (1) + App (5) + Redis miss (1) + DB (5) | ~72ms |
Key decisions:
- Feed data is precomputed and stored in Redis. App servers read from cache, not DB.
- Writes go to DB primary only. Read replicas absorb the bulk of DB read traffic.
- CDN handles profile images, thumbnails, static JS/CSS.
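The read path in this scenario (check the cache, fall through to the DB only on a miss) can be sketched as follows. This is a minimal illustration, assuming plain dictionaries stand in for Redis and a read replica; `FeedService` and its fields are hypothetical names, and expiry and invalidation are left out.

```python
class FeedService:
    """Reads a feed from cache first and falls back to the database on a miss."""

    def __init__(self, cache, db):
        self.cache = cache  # dict standing in for Redis
        self.db = db        # dict standing in for a DB read replica
        self.db_reads = 0   # counts how often the DB is touched

    def get_feed(self, user_id):
        feed = self.cache.get(user_id)
        if feed is not None:
            return feed  # cache hit: the DB is not touched
        self.db_reads += 1
        feed = self.db.get(user_id, [])
        self.cache[user_id] = feed  # populate the cache for the next reader
        return feed


db = {"alice": ["post-3", "post-2", "post-1"]}
service = FeedService(cache={}, db=db)
first = service.get_feed("alice")   # miss: reads the DB, fills the cache
second = service.get_feed("alice")  # hit: served from the cache
print(first == second, service.db_reads)  # True 1
```

With precomputed feeds, a background job would fill the cache ahead of time, so the miss branch would only run on a cold start.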
Scenario 2: Write-Heavy
Design an analytics pipeline, logging system, IoT sensor ingestion.
NFRs that triggered this: high scale, write-heavy, fault tolerance
[Client / Producers]
↓
[Load Balancer]
↓
[App Servers] ← accept writes, validate, publish to queue
↓
[Message Queue (Kafka / SQS)] ← absorbs bursts, durable buffer
↓
[Workers / Consumers] ← process at own pace, retry on failure
↓
[Database / Data Warehouse]
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| User-visible (async write) | Network (60) + LB (1) + App (5) + Queue (3) | ~69ms |
| Worker processing | Happens after the 200 is returned, so it is not on the user's clock | n/a |
Key decisions:
- App servers never write directly to the DB — they publish to the queue and return 200 immediately.
- Queue decouples ingestion speed from processing speed. If workers are slow, the queue grows, but nothing is dropped as long as it stays within its retention and storage limits.
- Workers can be scaled independently. Failed jobs stay in queue for retry.
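The accept-then-process split described above can be sketched with Python's standard `queue` module standing in for Kafka or SQS. This is a single-process illustration under stated assumptions: `accept_write`, `drain`, `MAX_ATTEMPTS`, and the flaky handler are hypothetical, and a real broker would add durability and delivery guarantees that this sketch does not model.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry limit, for illustration only


def accept_write(q, event):
    """App server path: enqueue and acknowledge without touching the DB."""
    q.put({"event": event, "attempts": 0})
    return 200


def drain(q, store, handler):
    """Worker path: process jobs, re-queueing failures up to MAX_ATTEMPTS."""
    dead = []
    while not q.empty():
        job = q.get()
        try:
            store.append(handler(job["event"]))
        except Exception:
            job["attempts"] += 1
            if job["attempts"] < MAX_ATTEMPTS:
                q.put(job)  # goes back on the queue for a later retry
            else:
                dead.append(job["event"])  # give up after repeated failures
    return dead


calls = {"n": 0}


def flaky_handler(event):
    """Fails on the first call only, to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return event.upper()


q = queue.Queue()
statuses = [accept_write(q, e) for e in ["a", "b"]]
store = []
dead = drain(q, store, flaky_handler)
print(statuses, store, dead)  # [200, 200] ['B', 'A'] []
```

The first attempt at `"a"` fails and is retried after `"b"`, so both events are eventually stored and the producer never sees the failure.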
Scenario 3: Strong Consistency Required
Design a payment system, hotel booking, airline seat reservation.
NFRs that triggered this: strong consistency, durability, security
[Client]
↓
[Load Balancer]
↓
[API Gateway] ← auth, rate limiting, fraud checks
↓
[App Servers]
↓
[DB Primary] ← all reads AND writes go here (no cache on write path)
↓ (synchronous)
[DB Standby] ← hot standby, promoted on failure
↓
[Audit Log] ← append-only record of every transaction
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| Write (sync replication) | Network (60) + LB (1) + API GW (10) + App (5) + DB write + sync replication (15) | ~91ms |
App servers do more work here (auth checks, business rule validation), but network aside, the largest cost is sync replication (15 of the remaining ~31ms). That is the price of RPO = 0.
Key decisions:
- No cache on the write path — stale reads could cause double-booking or double-charging.
- DB standby is synchronous (primary waits for standby to confirm before acking write).
- API Gateway handles rate limiting to prevent abuse at the payment endpoint.
- Audit log is append-only and separate — required for compliance and debugging. Written asynchronously after the DB confirms, so it is not on the user's critical path and does not add to the ~91ms above.
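To illustrate why the check must happen on the primary rather than against a possibly stale cache, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the DB primary. The `bookings` table, the `book_seat` helper, and the seat and customer values are hypothetical; the point is only that the database constraint, not a cached read, decides whether the seat is free.

```python
import sqlite3

# In-memory SQLite database standing in for the DB primary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (seat TEXT PRIMARY KEY, customer TEXT NOT NULL)"
)


def book_seat(conn, seat, customer):
    """Try to book a seat; the primary key rejects a second booking."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (seat, customer) VALUES (?, ?)",
                (seat, customer),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = book_seat(conn, "12A", "alice")
second = book_seat(conn, "12A", "bob")
print(first, second)  # True False
```

Synchronous replication and the audit log are outside this sketch; it only covers the single-primary write check.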
Scenario 4: Low Latency
Design autocomplete/typeahead, live leaderboard, stock price feed.
NFRs that triggered this: latency < 100ms, read-heavy, high scale
[Client]
↓
[CDN / Edge Cache] ──────→ (return if result cached at edge)
↓ (miss)
[Load Balancer]
↓
[App Servers]
↓
[Cache (Redis)] ──────────→ (precomputed results — return immediately)
↓ (cold start / miss only)
[Database]
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| CDN edge hit | Network (20) + CDN (1) | ~21ms |
| Redis hit | Network (60) + LB (1) + App (5) + Redis (1) | ~67ms |
| DB (cold start only) | Network (60) + LB (1) + App (5) + Redis miss (1) + DB (5) | ~72ms |
With a < 100ms target, even a cache miss that falls through to the DB (~72ms) is within budget, but only if the query is indexed. The goal is a 99%+ Redis hit rate so the DB stays off the hot path.
Key decisions:
- DB is off the hot path entirely. Cache must have near-100% hit rate for common queries.
- Results are precomputed and pushed into Redis (e.g., top 10 autocomplete results per prefix).
- CDN caches at the edge for global users — reduces round-trip time before even hitting your servers.
- Any update to results is pushed into Redis asynchronously, not on the request path.
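The precomputation step can be sketched as an offline job that builds a prefix-to-results map and loads it into the cache. This is an illustrative sketch: `build_prefix_index` and the sample counts are hypothetical, a dictionary stands in for Redis, and a production job would likely cap prefix length to bound memory.

```python
from collections import defaultdict

TOP_K = 10  # results kept per prefix, matching the example above


def build_prefix_index(term_counts, top_k=TOP_K):
    """Precompute the top_k most popular terms for every prefix."""
    by_prefix = defaultdict(list)
    # Sort once by popularity (descending), then alphabetically to break ties.
    ranked = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for term, _count in ranked:
        for end in range(1, len(term) + 1):
            bucket = by_prefix[term[:end]]
            if len(bucket) < top_k:
                bucket.append(term)
    return dict(by_prefix)


# Hypothetical search counts; in the design above this job runs offline.
counts = {"car": 50, "cat": 90, "cart": 20, "dog": 70}
cache = build_prefix_index(counts, top_k=2)  # dict stands in for Redis
print(cache["ca"])         # ['cat', 'car']
print(cache.get("x", []))  # []
```

At request time the app server does a single key lookup per keystroke, which is what keeps the DB off the request path.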
Scenario 5: Balanced / General Purpose
Design Uber, Airbnb, a general marketplace.
NFRs that triggered this: moderate scale, availability, latency, some consistency
[Client (Web + Mobile)]
↓
[CDN] ← static assets only
↓
[Load Balancer]
↓
[API Gateway] ← auth, rate limiting
↓
[App Servers]
├──→ [Cache (Redis)] ← hot reads (search results, listings)
├──→ [Message Queue] ← async tasks (email, notifications, billing)
│           ↓
│       [Workers]
↓
[DB Primary] + [DB Read Replicas]
↓
[Object Storage (S3)] ← user photos, listing images
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| Cached read (listing page) | Network (60) + LB (1) + API GW (10) + App (5) + Redis (1) | ~77ms |
| DB read (cache miss) | Network (60) + LB (1) + API GW (10) + App (5) + Redis miss (1) + DB (5) | ~82ms |
| Write (booking) | Network (60) + LB (1) + API GW (10) + App (5) + DB (5) | ~81ms |
| Async (notification) | Network (60) + LB (1) + API GW (10) + App (5) + Queue publish (3); the worker runs after the response is returned | ~79ms |
Key decisions:
- Most features read from cache, write to DB. A few critical paths (booking, payment) skip cache and write directly to primary.
- Message queue handles notifications, confirmation emails, and analytics events — nothing that should block the user's request.
- Object storage for all media. App servers only store the URL reference in the DB.
Scenario 6: Real-Time / Fan-Out
Design a chat system, live notification feed, live scoreboard, collaborative editor.
NFRs that triggered this: real-time push, low latency, high availability
[Client A] [Client B] [Client C]
↕ ↕ ↕ ← persistent WebSocket connections
[Load Balancer] ← sticky routing (session affinity by connection ID)
↓
[WebSocket Servers] ← stateful — each holds thousands of open connections
↑ (all servers subscribe to relevant channels)
[Message Broker (Redis Pub/Sub or Kafka)]
↑
[App Servers] ← receive events, validate, publish to broker
↓
[DB Primary] ← persist messages / events
The fan-out problem: An event (new message, score update) must reach all connected clients, who may be connected to different WebSocket servers.
Client A sends message
→ App Server processes + writes to DB
→ publishes event to broker channel
→ all WebSocket Servers subscribed to that channel receive it
→ each pushes to its connected clients (B, C, etc.)
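The flow above can be sketched in memory to show why the broker is needed when recipients sit on different servers. This is a toy model under stated assumptions: `Broker` and `WebSocketServer` are hypothetical classes, Python lists stand in for client connections, and nothing here uses a real Redis or WebSocket API.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a pub/sub broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> subscribed servers

    def subscribe(self, channel, server):
        self.subscribers[channel].append(server)

    def publish(self, channel, message):
        for server in self.subscribers[channel]:
            server.deliver(channel, message)


class WebSocketServer:
    """Holds client connections and pushes broker events to them."""

    def __init__(self, name):
        self.name = name
        self.clients = defaultdict(list)  # channel -> client inboxes

    def connect(self, channel, inbox):
        self.clients[channel].append(inbox)

    def deliver(self, channel, message):
        for inbox in self.clients[channel]:
            inbox.append(message)


broker = Broker()
ws1, ws2 = WebSocketServer("ws-1"), WebSocketServer("ws-2")
for server in (ws1, ws2):
    broker.subscribe("room-42", server)

inbox_b, inbox_c = [], []
ws1.connect("room-42", inbox_b)  # Client B is connected to ws-1
ws2.connect("room-42", inbox_c)  # Client C is connected to ws-2

# The app server publishes once; the broker reaches every server.
broker.publish("room-42", "hello from A")
print(inbox_b, inbox_c)  # ['hello from A'] ['hello from A']
```

The publisher never needs to know which server holds which client; each WebSocket server only delivers to the connections it owns.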
Latency breakdown:
| Path | Hops | Total |
|---|---|---|
| Sender ack (message accepted) | Network (60) + LB (1) + App (5) + DB (5) + Broker publish (3) | ~74ms |
| Recipient receives push | Broker delivery to subscribed WebSocket servers, then push over the open connection | ~4ms after ack |
Key decisions:
- WebSocket servers are stateful — a client's connection lives on one server. The LB must use sticky routing (session affinity) so reconnects land on the same server, or all servers must subscribe to all channels.
- Redis Pub/Sub for fan-out when durability isn't needed (notifications, live scores). Kafka when you need replay or durability (chat history, audit trail).
- Each WebSocket server can hold ~10K–100K concurrent connections. At 1M concurrent users you need 10–100 servers — plan capacity around connection count, not QPS.
- Heartbeat / ping-pong on each connection to detect stale clients and free resources.
- DB stores message history. WebSocket layer is for delivery only — clients fetch history via a normal REST API on connect.
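The connection-count sizing above is simple division, rounded up. A minimal sketch, where `servers_needed` is a hypothetical helper and the per-server limits are the rough figures quoted in the list:

```python
import math


def servers_needed(concurrent_users, connections_per_server):
    """Round up so every open connection has a server to live on."""
    return math.ceil(concurrent_users / connections_per_server)


low = servers_needed(1_000_000, 100_000)
high = servers_needed(1_000_000, 10_000)
print(low, high)  # 10 100
```

In practice you would likely add headroom on top of this so that a server failure or a reconnect storm does not push the remaining servers past their limit.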