System Design Layers

Answers to non-functional requirement (NFR) questions tell you what the system needs. Layers are what you draw. Start with the core 4, then add layers based on what the interviewer tells you.

The Core 4 — Always Draw These

Every system design diagram starts here. Add more only when NFRs demand it.

Layer	What It Does	Example Tech
Client	Browser, mobile app, or external service making requests	Web, iOS, Android
Load Balancer	Distributes traffic across app servers. Single entry point. Handles health checks.	AWS ALB, Nginx
App Server	Business logic. Stateless so you can add more horizontally.	Node, Java, Go
Database	Persistent storage. Default to relational unless there's a clear reason not to.	PostgreSQL, MySQL

[Client]
   ↓
[Load Balancer]
   ↓
[App Server(s)]
   ↓
[Database]

Optional Layers — Added by NFRs

Add only when an NFR demands it. Each one solves a specific problem but adds latency, cost, and operational complexity. Use the "Add When..." column to decide during requirements gathering.

Layer	What It Does	Add When...
CDN	Caches static assets at edge nodes close to users.	Global users, latency matters, lots of static assets
API Gateway	Handles auth, rate limiting, and routing before requests hit app servers.	Security requirements, multiple client types, public API
Cache (Redis)	Stores hot data in memory. Sub-millisecond reads. Reduces DB load.	Read-heavy, latency < 100ms, same data read repeatedly
Message Queue	Decouples producers and consumers. Absorbs write bursts.	Write-heavy, async work, fault tolerance needed
Worker / Consumer	Processes jobs from the queue asynchronously.	Always paired with a message queue
DB Read Replicas	Copies of the primary DB that serve reads. Primary handles writes only.	Read-heavy, high availability
Object Storage	Stores files, images, video. Cheap, durable, effectively unlimited capacity.	File uploads, media, large blobs
Search (Elasticsearch)	Full-text search, fuzzy matching, ranked results. DB LIKE queries with a leading wildcard can't use a standard index, so they fall back to full scans and don't scale.	Search box, autocomplete on unstructured text, faceted filtering
WebSocket / SSE Server	Holds persistent connections to clients and pushes events in real time. Stateful — unlike app servers, you can't freely route any request to any instance.	Real-time features: chat, live notifications, live scores, collaborative editing

NFR → Layers to Add

Translate the interviewer's requirements directly into layers. Use this to build your diagram from the answers you've gathered.

NFR Answer	Layers to Add
High scale, read-heavy	CDN, Cache (Redis), DB Read Replicas
High scale, write-heavy	Message Queue, Workers
Low latency (< 100ms)	CDN, Cache on hot path — remove DB from read path
High availability	DB Read Replicas + standby, multi-AZ Load Balancer
Strong consistency	No cache on write path. Single DB primary. Synchronous replication.
Security / rate limiting	API Gateway in front of App Servers
File or media storage	Object Storage (S3)
Async / background work	Message Queue + Workers
Fault tolerance	Message Queue (queue buffers if downstream is down), circuit breaker at App Server
Full-text search	Search layer (Elasticsearch / OpenSearch) alongside DB — DB handles writes, search index is updated async
Real-time push / live updates	WebSocket Server + Message Broker (Redis Pub/Sub or Kafka) for fan-out
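
The lookup in the table above can be sketched as a small mapping. This is only an illustration: the dictionary keys (read_heavy, low_latency, and so on) and the function name layers_for are hypothetical labels chosen for this sketch, and only a subset of the table's rows is included.

```python
# Core layers that every diagram starts with (from the Core 4 table).
CORE_LAYERS = ["Client", "Load Balancer", "App Server", "Database"]

# NFR answer -> optional layers, abbreviated from the table above.
# The keys are hypothetical labels chosen for this sketch.
NFR_TO_LAYERS = {
    "read_heavy": ["CDN", "Cache (Redis)", "DB Read Replicas"],
    "write_heavy": ["Message Queue", "Workers"],
    "low_latency": ["CDN", "Cache (Redis)"],
    "security": ["API Gateway"],
    "file_storage": ["Object Storage"],
    "async_work": ["Message Queue", "Workers"],
    "full_text_search": ["Search"],
    "real_time_push": ["WebSocket Server", "Message Broker"],
}


def layers_for(nfr_answers):
    """Return the core layers plus each optional layer once, in first-seen order."""
    layers = list(CORE_LAYERS)
    for answer in nfr_answers:
        for layer in NFR_TO_LAYERS[answer]:
            if layer not in layers:
                layers.append(layer)
    return layers


print(layers_for(["read_heavy", "low_latency", "async_work"]))
```

Overlapping answers (read-heavy and low latency both ask for a CDN and a cache) collapse into one layer each, which mirrors how the diagram is built up during the interview.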

Latency Reference — Cost Per Layer

Single source of truth for per-layer latency costs. Use the Typical column for estimates — Best Case shows the theoretical floor with everything going right.

Network is included: Each component figure (Redis 1ms, DB 5ms, etc.) covers the full round-trip including the internal network hop — App→component→App. The only network cost listed separately is the user-facing leg (Browser↔LB) because that varies by geography and is outside your control.

Scenario breakdowns below use the Typical column. Update this table to recalibrate all scenarios.

                                              ┌─(1ms)──► Redis
Browser ─(20ms)─► CDN ─(1ms)─► LB ─(5ms)─► App ─┼─(5ms)──► DB
                                              ├─(3ms)──► Kafka
                                              └─(50ms)─► S3

Layer	Best Case	Typical	Notes
Network: user → CDN edge	5ms	20ms	Depends on user geography
Network: user → origin (no CDN)	20ms	60ms	Cross-region can be 150ms+
CDN cache hit	< 1ms	1ms	After network cost above
Load Balancer	< 1ms	1ms	Pure routing overhead
API Gateway (auth + routing)	3ms	10ms	Token validation adds most of the cost
App Server (simple logic)	1ms	5ms	Complex logic or external calls add more
Cache / Redis hit	0.5ms	1ms	In-memory, same datacenter
DB read (indexed query)	1ms	5ms	Unindexed or joins: 10–100ms
DB write + sync replication	5ms	15ms	Waiting for standby to confirm
Message Queue publish (Kafka)	1ms	3ms	Async — user does NOT wait for consumer
Object Storage / S3 read	10ms	50ms	First byte. Much slower than DB or cache.
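
Since every scenario total is a sum of Typical values from this table, the arithmetic can be sketched in a few lines. The short key names (net_cdn, net_origin, and so on) and the function path_latency_ms are hypothetical names for this sketch; the numbers are copied from the Typical column.

```python
# Typical per-layer latencies in milliseconds, copied from the table above.
# The dictionary keys are hypothetical short names chosen for this sketch.
TYPICAL_MS = {
    "net_cdn": 20,        # Network: user -> CDN edge
    "net_origin": 60,     # Network: user -> origin (no CDN)
    "cdn_hit": 1,
    "lb": 1,
    "api_gw": 10,
    "app": 5,
    "redis": 1,
    "db_read": 5,
    "db_write_sync": 15,  # DB write + sync replication
    "queue_publish": 3,
    "s3_read": 50,
}


def path_latency_ms(hops):
    """Sum the typical latency of each hop on a request path."""
    return sum(TYPICAL_MS[hop] for hop in hops)


cdn_hit = path_latency_ms(["net_cdn", "cdn_hit"])
redis_hit = path_latency_ms(["net_origin", "lb", "app", "redis"])
db_read = path_latency_ms(["net_origin", "lb", "app", "redis", "db_read"])
print(cdn_hit, redis_hit, db_read)
```

Changing one value in TYPICAL_MS recalculates every path that uses it, which is the same recalibration idea the table is meant to support.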

Scenario 1: Read-Heavy

Design Twitter feed, Reddit homepage, YouTube video page.

NFRs that triggered this: high scale, read-heavy, low latency

[Client]
   ↓
[CDN] ──────────────────→ (cache hit: return static assets / cached response)
   ↓ (miss)
[Load Balancer]
   ↓
[App Servers]
   ↓
[Cache (Redis)] ─────────→ (cache hit: return feed)
   ↓ (miss)
[DB Primary] + [DB Read Replicas]

Latency breakdown:

Path	Hops	Total
CDN hit	Network (20) + CDN (1)	~21ms
Redis hit	Network (60) + LB (1) + App (5) + Redis (1)	~67ms
DB read (cache miss)	Network (60) + LB (1) + App (5) + Redis miss (1) + DB (5)	~72ms

Key decisions:

Feed data is precomputed and stored in Redis. App servers read from cache, not DB.
Writes go to DB primary only. Read replicas absorb the bulk of DB read traffic.
CDN handles profile images, thumbnails, static JS/CSS.
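
The cache-then-DB read path in the diagram can be sketched as a cache-aside lookup. This is a minimal illustration, not a production pattern: plain dictionaries stand in for Redis and a read replica, and the class name FeedReader, the key format feed:<user_id>, and the db_reads counter are assumptions made for this sketch.

```python
class FeedReader:
    """Cache-aside read: try the cache first, fall back to the DB on a miss."""

    def __init__(self, cache, db):
        self.cache = cache   # dict standing in for Redis
        self.db = db         # dict standing in for a DB read replica
        self.db_reads = 0    # counts how often the DB is actually hit

    def get_feed(self, user_id):
        key = f"feed:{user_id}"
        feed = self.cache.get(key)
        if feed is not None:
            return feed                      # cache hit: DB is not touched
        self.db_reads += 1
        feed = self.db.get(user_id, [])      # cache miss: read from the DB
        self.cache[key] = feed               # populate cache for next time
        return feed


reader = FeedReader(cache={}, db={"u1": ["post3", "post2", "post1"]})
reader.get_feed("u1")   # miss, goes to the DB
reader.get_feed("u1")   # hit, served from the cache
print(reader.db_reads)
```

With precomputed feeds the cache would normally be filled ahead of time by a background job, so the miss branch here only covers cold or evicted entries.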

Scenario 2: Write-Heavy

Design an analytics pipeline, logging system, IoT sensor ingestion.

NFRs that triggered this: high scale, write-heavy, fault tolerance

[Client / Producers]
   ↓
[Load Balancer]
   ↓
[App Servers]  ← accept writes, validate, publish to queue
   ↓
[Message Queue (Kafka / SQS)]  ← absorbs bursts, durable buffer
   ↓
[Workers / Consumers]  ← process at own pace, retry on failure
   ↓
[Database / Data Warehouse]

Latency breakdown:

Path	Hops	Total
User-visible (async write)	Network (60) + LB (1) + App (5) + Queue (3)	~69ms
Worker processing	Happens after 200 returned — not on user's clock	—

Key decisions:

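App servers never write directly to the DB. They publish to the queue and return a success response immediately (200, or 202 Accepted to signal that processing is still pending).
Queue decouples ingestion speed from processing speed. If workers are slow, the queue grows, but nothing is dropped as long as the backlog stays within the queue's retention and capacity limits.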
Workers can be scaled independently. Failed jobs stay in queue for retry.
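
The publish-then-process flow with retries can be sketched with Python's standard queue module standing in for Kafka or SQS. The function names accept_write and drain, the (payload, attempts) tuple, and the max_attempts limit with a "dead" list are assumptions for this sketch; real brokers handle redelivery with their own mechanisms.

```python
import queue


def accept_write(jobs, payload):
    """App server side: enqueue and acknowledge without touching the DB."""
    jobs.put((payload, 0))   # second element is the number of failed attempts
    return 200


def drain(jobs, handler, max_attempts=3):
    """Worker side: process every job, requeueing failures until max_attempts."""
    done, dead = [], []
    while True:
        try:
            payload, attempts = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            handler(payload)
            done.append(payload)
        except Exception:
            if attempts + 1 < max_attempts:
                jobs.put((payload, attempts + 1))   # stays in the queue for retry
            else:
                dead.append(payload)                # give up after max_attempts
    return done, dead


failed_once = set()


def flaky_handler(payload):
    """Hypothetical handler that fails the first time it sees payload 'b'."""
    if payload == "b" and payload not in failed_once:
        failed_once.add(payload)
        raise RuntimeError("downstream unavailable")


jobs = queue.Queue()
for item in ["a", "b", "c"]:
    accept_write(jobs, item)
print(drain(jobs, flaky_handler))
```

The producer returns before any processing happens, and the failed job is retried later without blocking the jobs behind it.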

Scenario 3: Strong Consistency Required

Design a payment system, hotel booking, airline seat reservation.

NFRs that triggered this: strong consistency, durability, security

[Client]
   ↓
[Load Balancer]
   ↓
[API Gateway]  ← auth, rate limiting, fraud checks
   ↓
[App Servers]
   ↓
[DB Primary]  ← all reads AND writes go here (no cache on write path)
   ↓ (synchronous)
[DB Standby]  ← hot standby, promoted on failure
   ↓
[Audit Log]  ← append-only record of every transaction

Latency breakdown:

Path	Hops	Total
Write (sync replication)	Network (60) + LB (1) + API GW (10) + App (5) + DB write + sync replication (15)	~91ms

App servers do more work here (auth checks, business rule validation) but the dominant cost is sync replication — that's the price of RPO = 0.

Key decisions:

No cache on the write path — stale reads could cause double-booking or double-charging.
DB standby is synchronous (primary waits for standby to confirm before acking write).
API Gateway handles rate limiting to prevent abuse at the payment endpoint.
Audit log is append-only and separate — required for compliance and debugging. Written asynchronously after the DB confirms, so it is not on the user's critical path and does not add to the ~91ms above.
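
To illustrate why the availability check and the write both go to the primary, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the DB primary. The table name seats, its columns, and the function book_seat are hypothetical; the point is that the check and the update happen in one atomic statement against the authoritative copy, so a stale cached "seat is free" answer never decides the outcome.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one unbooked seat (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seats (seat_id TEXT PRIMARY KEY, booked_by TEXT)")
    conn.execute("INSERT INTO seats VALUES ('12A', NULL)")
    conn.commit()
    return conn


def book_seat(conn, seat_id, user_id):
    """Book the seat only if it is still free; return True on success."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE seats SET booked_by = ? "
            "WHERE seat_id = ? AND booked_by IS NULL",
            (user_id, seat_id),
        )
        # rowcount is 1 if this call claimed the seat, 0 if someone already had it
        return cur.rowcount == 1


conn = make_db()
print(book_seat(conn, "12A", "alice"))
print(book_seat(conn, "12A", "bob"))
```

The second booking attempt fails because the condition is evaluated against current data at write time, not against an earlier read.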

Scenario 4: Low Latency

Design autocomplete/typeahead, live leaderboard, stock price feed.

NFRs that triggered this: latency < 100ms, read-heavy, high scale

[Client]
   ↓
[CDN / Edge Cache] ──────→ (return if result cached at edge)
   ↓ (miss)
[Load Balancer]
   ↓
[App Servers]
   ↓
[Cache (Redis)] ──────────→ (precomputed results — return immediately)
   ↓ (cold start / miss only)
[Database]

Latency breakdown:

Path	Hops	Total
CDN edge hit	Network (20) + CDN (1)	~21ms
Redis hit	Network (60) + LB (1) + App (5) + Redis (1)	~67ms
DB (cold start only)	Network (60) + LB (1) + App (5) + Redis miss (1) + DB (5)	~72ms

At a < 100ms target, even a cache miss that falls through to the DB is within budget (~72ms), but only if the query is indexed. The goal is a 99%+ Redis hit rate so the DB is effectively never on the hot path.

Key decisions:

DB is off the hot path entirely. Cache must have near-100% hit rate for common queries.
Results are precomputed and pushed into Redis (e.g., top 10 autocomplete results per prefix).
CDN caches at the edge for global users — reduces round-trip time before even hitting your servers.
Any update to results is pushed into Redis asynchronously, not on the request path.
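
The "top 10 autocomplete results per prefix" idea can be sketched as an offline precompute step. A plain dictionary stands in for Redis here, and the names build_prefix_index, suggest, top_k, and max_prefix_len, as well as the sample term counts, are assumptions for this sketch.

```python
from collections import defaultdict


def build_prefix_index(term_counts, top_k=10, max_prefix_len=20):
    """Map each prefix to its top_k terms by count (ties broken alphabetically)."""
    buckets = defaultdict(list)
    for term, count in term_counts.items():
        for end in range(1, min(len(term), max_prefix_len) + 1):
            buckets[term[:end]].append((count, term))
    index = {}
    for prefix, entries in buckets.items():
        entries.sort(key=lambda entry: (-entry[0], entry[1]))
        index[prefix] = [term for _, term in entries[:top_k]]
    return index


def suggest(index, prefix):
    """Request path: a single key lookup, no ranking work and no DB access."""
    return index.get(prefix, [])


# Hypothetical search counts, used only to exercise the sketch.
counts = {"react": 80, "redis": 50, "replica": 30, "kafka": 40}
index = build_prefix_index(counts, top_k=2)
print(suggest(index, "re"))
```

All ranking happens when the index is built, so serving a suggestion is one lookup, which is what keeps the request path within the latency budget.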

Scenario 5: Balanced / General Purpose

Design Uber, Airbnb, a general marketplace.

NFRs that triggered this: moderate scale, availability, latency, some consistency

[Client (Web + Mobile)]
   ↓
[CDN]  ← static assets only
   ↓
[Load Balancer]
   ↓
[API Gateway]  ← auth, rate limiting
   ↓
[App Servers]
   ├──→ [Cache (Redis)]   ← hot reads (search results, listings)
   ├──→ [Message Queue]   ← async tasks (email, notifications, billing)
   │         ↓
   │     [Workers]
   ↓
[DB Primary] + [DB Read Replicas]
   ↓
[Object Storage (S3)]  ← user photos, listing images

Latency breakdown:

Path	Hops	Total
Cached read (listing page)	Network (60) + LB (1) + API GW (10) + App (5) + Redis (1)	~77ms
DB read (cache miss)	Network (60) + LB (1) + API GW (10) + App (5) + Redis miss (1) + DB (5)	~82ms
Write (booking)	Network (60) + LB (1) + API GW (10) + App (5) + DB (5)	~81ms
Async (notification)	Network (60) + LB (1) + API GW (10) + App (5) + Queue publish (3); worker runs after the response	~79ms

Key decisions:

Most features read from cache, write to DB. A few critical paths (booking, payment) skip cache and write directly to primary.
Message queue handles notifications, confirmation emails, and analytics events — nothing that should block the user's request.
Object storage for all media. App servers only store the URL reference in the DB.

Scenario 6: Real-Time / Fan-Out

Design a chat system, live notification feed, live scoreboard, collaborative editor.

NFRs that triggered this: real-time push, low latency, high availability

[Client A]  [Client B]  [Client C]
    ↕            ↕           ↕       ← persistent WebSocket connections
[Load Balancer]  ← sticky routing (session affinity by connection ID)
    ↓
[WebSocket Servers]  ← stateful — each holds thousands of open connections
    ↑   (all servers subscribe to relevant channels)
[Message Broker (Redis Pub/Sub or Kafka)]
    ↑
[App Servers]  ← receive events, validate, publish to broker
    ↓
[DB Primary]   ← persist messages / events

The fan-out problem: An event (new message, score update) must reach all connected clients, who may be connected to different WebSocket servers.

Client A sends message
   → App Server processes + writes to DB
   → publishes event to broker channel
   → all WebSocket Servers subscribed to that channel receive it
   → each pushes to its connected clients (B, C, etc.)
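
The flow above can be sketched in-process, with a small Broker class standing in for Redis Pub/Sub or Kafka and lists standing in for client sockets. The class names, the event shape ({"sender": ..., "text": ...}), and the choice to skip echoing the message back to the sender are assumptions for this sketch.

```python
from collections import defaultdict


class Broker:
    """Stand-in for the message broker: channel -> subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, event):
        for callback in self.subscribers[channel]:
            callback(channel, event)


class WebSocketServer:
    """Holds client connections and subscribes to the channels they need."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.clients = defaultdict(dict)   # channel -> {client_id: inbox}

    def connect(self, client_id, channel):
        if channel not in self.clients:    # first local client on this channel
            self.broker.subscribe(channel, self.on_event)
        inbox = []                         # list standing in for the socket
        self.clients[channel][client_id] = inbox
        return inbox

    def on_event(self, channel, event):
        for client_id, inbox in self.clients[channel].items():
            if client_id != event["sender"]:
                inbox.append(event["text"])


broker = Broker()
ws1, ws2 = WebSocketServer("ws1", broker), WebSocketServer("ws2", broker)
a = ws1.connect("A", "room1")
b = ws1.connect("B", "room1")
c = ws2.connect("C", "room1")   # same channel, different server
broker.publish("room1", {"sender": "A", "text": "hello"})
print(a, b, c)
```

Client C receives the message even though it is connected to a different server than the sender, because both servers subscribed to the same channel.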

Latency breakdown:

Path	Hops	Total
Sender ack (message accepted)	Network (60) + LB (1) + App (5) + DB (5) + Broker publish (3)	~74ms
Recipient receives push	Broker deliver (~3ms) + WS server push (~1ms); runs after sender ack	~4ms after ack

Key decisions:

WebSocket servers are stateful — a client's connection lives on one server. The LB must use sticky routing (session affinity) so reconnects land on the same server, or all servers must subscribe to all channels.
Redis Pub/Sub for fan-out when durability isn't needed (notifications, live scores). Kafka when you need replay or durability (chat history, audit trail).
Each WebSocket server can hold ~10K–100K concurrent connections. At 1M concurrent users you need 10–100 servers — plan capacity around connection count, not QPS.
Heartbeat / ping-pong on each connection to detect stale clients and free resources.
DB stores message history. WebSocket layer is for delivery only — clients fetch history via a normal REST API on connect.
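
The capacity estimate above (1M concurrent users at 10K to 100K connections per server) is a ceiling division, sketched below. The function name ws_servers_needed and the headroom_pct parameter (spare capacity reserved per server) are assumptions for this sketch, not figures from the notes.

```python
def ws_servers_needed(concurrent_users, conns_per_server, headroom_pct=0):
    """Servers required to hold every connection, with optional spare capacity."""
    # Usable connections per server after reserving headroom_pct percent.
    usable = conns_per_server * (100 - headroom_pct) // 100
    return -(-concurrent_users // usable)   # ceiling division on integers


print(ws_servers_needed(1_000_000, 100_000))      # best case per-server capacity
print(ws_servers_needed(1_000_000, 10_000))       # worst case per-server capacity
print(ws_servers_needed(1_000_000, 100_000, 20))  # with 20% headroom (assumed)
```

The result depends only on connection count and per-server capacity, which is why QPS does not appear anywhere in the calculation.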