Introduction
Building scalable, reliable, maintainable backend systems comes down to a few repeatable ideas: how services communicate, how state changes are represented, how failures are handled, how data scales, and how you observe what’s happening in production.
This guide is structured as interview-style questions with practical explanations. Use it for quick revision, as a checklist when designing systems, or as a set of topics to deepen over time.
Architecture Patterns
Event-Driven Architecture (EDA)
What is Event-Driven Architecture?
Event-Driven Architecture is a style where services communicate by publishing events (facts about something that happened) and subscribing to the events they care about. Instead of direct service-to-service calls, you push events to a broker or bus and let consumers react.
Key characteristics
- Loose coupling: producers don’t know who consumes events
- Async by default: improves responsiveness and throughput
- Independent scaling: scale producers/consumers separately
- Failure isolation: a consumer can be down without stopping producers
Common use cases
- microservices workflows (orders, payments, inventory)
- activity streams / analytics
- notification pipelines
- IoT and telemetry
Example flow
User Action → Event Producer → Event Bus/Broker → Event Consumers
Things to watch
- delivery guarantees (at-most-once, at-least-once, exactly-once)
- ordering (global vs per-key ordering)
- duplicate handling (often solved with idempotency)
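The flow above can be sketched with a minimal in-process event bus. This is an illustrative toy, not a real broker (production systems would use something like Kafka or RabbitMQ), but it shows the key property: the producer publishes without knowing who consumes.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process event bus: producers publish, consumers react."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer has no idea who consumes the event -- loose coupling.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
received = []
# Two independent consumers react to the same event:
bus.subscribe("order.created", lambda e: received.append(("inventory", e["order_id"])))
bus.subscribe("order.created", lambda e: received.append(("email", e["order_id"])))
bus.publish("order.created", {"order_id": 42})
```

A real broker adds the hard parts this sketch skips: durability, delivery guarantees, and consumer offsets.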
Saga Pattern
What is the Saga Pattern?
The Saga Pattern manages multi-step business transactions across services without relying on a single ACID transaction. A saga is a sequence of local transactions. If something fails, you run compensating actions to undo earlier steps.
Two styles
- Choreography: services react to events and decide the next step
- Orchestration: a coordinator explicitly tells each service what to do
Example: order processing
1. Create Order (Order Service)
2. Reserve Inventory (Inventory Service) → Compensate: Release Inventory
3. Charge Payment (Payment Service) → Compensate: Refund Payment
4. Arrange Shipment (Shipping Service) → Compensate: Cancel Shipment
⚠️ If step 3 fails → release inventory + cancel order.
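The orchestration style can be sketched as a loop over local transactions that runs compensations in reverse order on failure. Step names and log messages are illustrative; a real orchestrator would also persist saga state so it can resume after a crash.

```python
class SagaError(Exception):
    """Raised by a step to signal that the saga must roll back."""

log = []

def charge_payment():
    # Step 3 fails, as in the example above.
    raise SagaError("card declined")

def run_saga(steps):
    """Run each local transaction; on failure, compensate completed steps in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except SagaError:
            for undo in reversed(completed):
                undo()
            return "rolled back"
    return "completed"

steps = [
    ("create_order", lambda: log.append("order created"), lambda: log.append("order cancelled")),
    ("reserve_inventory", lambda: log.append("inventory reserved"), lambda: log.append("inventory released")),
    ("charge_payment", charge_payment, lambda: log.append("payment refunded")),
]
result = run_saga(steps)
```

After the payment failure, the log shows the compensations ran newest-first: inventory released, then order cancelled.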
When to use
- cross-service workflows
- long-running processes
- you can accept eventual consistency
Trade-offs
- avoids distributed locks and 2PC blocking
- scales better than global transactions
- more complex failure handling and state management
- eventual consistency is the default
CQRS (Command Query Responsibility Segregation)
What is CQRS?
CQRS splits your system into:
- Commands: write operations (create/update/delete)
- Queries: read operations
The read and write sides can have different data models and even different storage, so each can be optimized independently.
Traditional vs CQRS
| Traditional | CQRS |
|---|---|
| One model handles reads + writes | Command model (write-optimized) → events/replication → Query model (read-optimized) |
| Same storage for reads and writes | Different storage optimized per side |
| Single source of truth | Separate models, eventual consistency |
Why it’s useful
- scale reads and writes independently
- simplify write rules while keeping reads fast
- create multiple read views (e.g., dashboards vs detail pages)
Example: e-commerce product catalog
Write side: Normalized relational DB for product updates (price, inventory, description). Write operations are simple and consistent.
Read side: Denormalized document store optimized for queries. Product detail pages might store:
{
"id": 123,
"name": "Product Name",
"price": 99.99,
"images": [...],
"reviews": [...],
"relatedProducts": [...]
}
All in one document for fast reads. Search indexes, category listings, and recommendation engines each have their own optimized read models.
When a product price changes, the write model updates the normalized table, then an event or CDC mechanism propagates the change to all read models.
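That write-then-propagate flow can be sketched as a command handler plus a projection. The dicts stand in for the normalized tables and denormalized documents; in a real system the event would travel over a broker or CDC pipeline rather than a direct function call.

```python
# Write model: normalized storage (simplified here as a dict).
products = {123: {"name": "Widget", "price": 99.99}}

# Read model: denormalized product-page documents, optimized for reads.
product_pages = {123: {"id": 123, "name": "Widget", "price": 99.99, "reviews": []}}

def handle_change_price(product_id, new_price, publish):
    """Command handler: update the write model, then emit an event."""
    products[product_id]["price"] = new_price
    publish({"type": "PriceChanged", "id": product_id, "price": new_price})

def project_price_changed(event):
    """Projection: apply the event to the read model (eventually consistent)."""
    product_pages[event["id"]]["price"] = event["price"]

# In production, `publish` would go through a broker/CDC, introducing a lag
# during which the read model is slightly stale.
handle_change_price(123, 79.99, publish=project_price_changed)
```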
Costs
- more moving parts
- read model may be slightly stale (eventual consistency)
- you need a reliable sync mechanism (events, CDC, replication)
Event Sourcing
What is Event Sourcing?
Event Sourcing stores a log of events as the source of truth, rather than storing only the latest state. Current state is derived by replaying events (often with snapshots for speed).
Traditional vs Event Sourcing
| Traditional | Event Sourcing |
|---|---|
| current state → update → new state | Event 1 → Event 2 → Event 3 → ... → current state (derived) |
| Stores current state only | Stores all events as source of truth |
| State is the source of truth | Events are the source of truth |
| Limited audit trail | Complete audit trail by default |
Example: bank account
Events:
- AccountCreated
- MoneyDeposited(50)
- MoneyWithdrawn(20)
- MoneyDeposited(70)
Current Balance (derived): 50 - 20 + 70 = 100
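Deriving state is just a fold over the event log. A minimal sketch of the bank-account example (snapshots would avoid replaying from the beginning every time):

```python
events = [
    {"type": "AccountCreated"},
    {"type": "MoneyDeposited", "amount": 50},
    {"type": "MoneyWithdrawn", "amount": 20},
    {"type": "MoneyDeposited", "amount": 70},
]

def replay(events):
    """Derive current state by replaying the event log from the start."""
    balance = 0
    for e in events:
        if e["type"] == "MoneyDeposited":
            balance += e["amount"]
        elif e["type"] == "MoneyWithdrawn":
            balance -= e["amount"]
    return balance

balance = replay(events)  # 50 - 20 + 70 = 100
```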
Benefits
- audit trail by default
- easier debugging (“what happened?”)
- you can rebuild new read models from history
Challenges
- event versioning (schemas change)
- event store growth (snapshots + retention strategy)
- building projections/read models adds complexity
Circuit Breaker Pattern
What is the Circuit Breaker Pattern?
Circuit breakers prevent cascading failures by stopping calls to a dependency that is failing. After too many failures, the breaker “opens” and requests fail fast (or return fallbacks) until recovery.
States
- Closed: normal traffic
- Open: fail fast (dependency is unhealthy)
- Half-open: allow limited test traffic to see if it recovered
State transitions
Closed → (failure threshold) → Open
Open → (cooldown timeout) → Half-open
Half-open → (enough success) → Closed
Half-open → (failure) → Open
| State | Behavior |
|---|---|
| Closed | Normal operation, requests pass through |
| Open | Fail fast, no requests to unhealthy dependency |
| Half-open | Test mode, limited requests to check recovery |
Example: payment service dependency
Your order service calls a payment service. After 5 failures in 10 seconds, the circuit opens. All new requests immediately return a fallback response (e.g., “Payment temporarily unavailable, please try again”) without waiting for the payment service timeout. After 30 seconds, the circuit goes half-open and allows one test request. If it succeeds, the circuit closes and normal traffic resumes. If it fails, the circuit opens again.
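The state machine above can be sketched in a few dozen lines. This is a simplified single-threaded version (real implementations like resilience4j add thread safety, sliding failure windows, and metrics); the injectable `clock` is only there to make the behavior easy to demonstrate.

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive failures; after `reset_after`
    seconds, go half-open and let one test request through."""
    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_after:
                self.state = "half-open"  # cooldown elapsed: allow a test request
            else:
                return fallback()         # fail fast, don't touch the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.state = "open"       # trip (or re-trip) the breaker
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        self.state = "closed"             # success closes the circuit
        return result

# Simulated run with a fake clock (threshold 2 failures, 30 s cooldown):
now = [0.0]
cb = CircuitBreaker(max_failures=2, reset_after=30.0, clock=lambda: now[0])

def payment_down():
    raise RuntimeError("payment service unavailable")

cb.call(payment_down, lambda: "try again later")   # failure 1
cb.call(payment_down, lambda: "try again later")   # failure 2 -> opens
tripped_state = cb.state                           # "open": requests now fail fast
now[0] = 31.0                                      # cooldown elapsed
recovered = cb.call(lambda: "charged", lambda: "try again later")  # half-open test succeeds
```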
Where it helps
- external APIs
- internal service-to-service calls
- database or cache dependencies
Pairs well with
- retries with backoff
- timeouts
- bulkheads (limit concurrency)
Distributed Systems
Distributed Tracing
What is Distributed Tracing?
Distributed tracing tracks a single request as it flows through multiple services using a shared trace ID. Each step is a span with timing, metadata, and relationships.
Key terms
- Trace: the end-to-end request
- Span: one operation within the trace
- Trace ID: unique request ID
- Span ID: unique operation ID
Why it matters
- find bottlenecks (where time is spent)
- debug failures across service boundaries
- understand dependencies and fan-out
Example
Trace abc123
├─ GET /api/orders (200ms)
│ ├─ DB query (50ms)
│ ├─ Payment service call (100ms)
│ └─ Inventory service call (40ms)
This shows the full request path, making it easy to identify that the payment service call takes the longest (100ms).
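The trace/span relationship can be sketched with a context manager that records timing and parent links. Real tracing libraries (OpenTelemetry, Jaeger clients) also propagate the trace ID across process boundaries via headers; this toy keeps everything in one process.

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # in a real system these are exported to a tracing backend

@contextmanager
def span(name, trace_id, parent_id=None):
    """Record one operation: shared trace ID, its own span ID, and timing."""
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "trace_id": trace_id, "span_id": span_id, "parent_id": parent_id,
            "name": name, "duration_ms": (time.perf_counter() - start) * 1000,
        })

trace_id = "abc123"
with span("GET /api/orders", trace_id) as root:
    with span("DB query", trace_id, parent_id=root):
        pass  # real code would query the database here
    with span("Payment service call", trace_id, parent_id=root):
        pass  # ...and call the payment service
```

Every span shares `trace_id`, and the parent links let a UI reassemble the tree shown above.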
CAP Theorem
What is the CAP Theorem?
CAP says a distributed system can’t simultaneously guarantee all three:
- Consistency (C): all nodes return the same latest value
- Availability (A): every request gets a response
- Partition tolerance (P): the system continues despite network splits
In real distributed systems, partitions happen, so you typically choose between:
- CP: consistent but may reject/timeout requests during a partition
- AP: stays available but may return stale data temporarily
Practical examples
- payments/ledger → usually lean CP
- feeds/analytics → often accept AP
Important nuance
CAP is about behavior during a partition, not forever. Many systems offer knobs (timeouts, quorum reads/writes) that move you along the spectrum.
Idempotency
What is Idempotency?
An operation is idempotent if repeating it produces the same effect as doing it once. This matters because retries happen (timeouts, network glitches, client re-sends).
HTTP intuition
- GET: idempotent
- PUT: idempotent (replace)
- DELETE: idempotent (delete again does nothing)
- POST: usually not idempotent (creates new resource each time)
Common strategy: idempotency keys
Client sends a unique key; server stores the result for that key and returns the same response for duplicates.
Example: payment processing
POST /api/payments
Headers:
Idempotency-Key: abc-123-xyz
Body:
{
"amount": 100,
"card": "..."
}
First request: Processes payment, stores result with key abc-123-xyz
Retry (same key): Returns stored result, doesn’t charge again
The server checks: “Have I seen key `abc-123-xyz`?” If yes, return the previous response. If no, process and store the result.
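That check-then-store logic is small enough to sketch directly. The in-memory dict stands in for a durable store with TTLs; in production the lookup and the side effect also need to be atomic so two concurrent retries can't both charge.

```python
processed = {}  # idempotency key -> stored response (use a durable TTL store in production)
charges = []    # stand-in for the real side effect

def handle_payment(idempotency_key, amount):
    """Return the stored response for a seen key; otherwise process and store."""
    if idempotency_key in processed:
        return processed[idempotency_key]   # duplicate: replay stored response
    charges.append(amount)                  # the side effect happens exactly once
    response = {"status": "charged", "amount": amount}
    processed[idempotency_key] = response
    return response

first = handle_payment("abc-123-xyz", 100)
retry = handle_payment("abc-123-xyz", 100)  # same key: no second charge
```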
Best practices
- make write endpoints idempotent when retries are expected
- store idempotency records with TTL
- ensure “same key” returns the same response body/status
Data Management
Data Sharding
What is Data Sharding?
Sharding splits a dataset across multiple databases/servers (horizontal partitioning). Each shard holds a subset of the data.
Sharding vs replication
| Sharding | Replication |
|---|---|
| Different data on different nodes | Same data on multiple nodes |
| Scales capacity and write throughput | Scales reads and availability |
| Each node holds a subset | Each node holds a copy |
Common strategies
- Range-based: IDs 1–1M on shard A
- Hash-based: `hash(user_id) % N`
- Directory-based: a lookup maps keys to shards
- Geo-based: shard by region
Example: user data sharding
You have 100M users. With hash-based sharding across 4 shards:
hash(user_id) % 4 = 0 → Shard 1
hash(user_id) % 4 = 1 → Shard 2
hash(user_id) % 4 = 2 → Shard 3
hash(user_id) % 4 = 3 → Shard 4
User ID 12345 → hash(12345) % 4 = 2 → stored on Shard 3. Each shard holds ~25M users. Queries for a specific user are fast (only one shard). Queries like “all users in California” require querying all shards and merging results.
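A hash-based router is a one-liner plus one caveat: the hash must be stable across processes. A minimal sketch (SHA-256 is one reasonable choice; Python's built-in `hash()` is randomized per process and would route the same user differently after a restart):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    """Route a user to a shard index 0..NUM_SHARDS-1 using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every lookup for the same user hits exactly one shard:
shard = shard_for(12345)
```

Note the `% N` weakness mentioned under trade-offs: changing `NUM_SHARDS` remaps almost every key, which is why rebalancing is operationally complex (and why some systems use consistent hashing instead).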
Trade-offs
- scales storage and write throughput
- cross-shard queries and transactions are harder
- rebalancing is operationally complex
Database Indexing
What is Database Indexing?
Indexes are extra data structures that help databases find rows faster—like a book index. Without an index, the DB may scan many rows. With the right index, it can jump close to the answer.
What indexes improve
- `WHERE` filters
- joins
- `ORDER BY`
- some `GROUP BY` workloads
Example: user lookup
Without index:
SELECT * FROM users WHERE email = 'user@example.com';
-- Full table scan: checks all 10M rows, takes 2 seconds
With index on email:
CREATE INDEX idx_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
-- Index lookup: finds row directly, takes 5ms
The index is like a sorted list: email → row location. The database can binary search instead of scanning every row.
Trade-offs
- indexes consume storage
- writes get slower (indexes must be updated)
Practical guidance
- index columns you filter/join on frequently
- avoid over-indexing
- validate with query plans (e.g., `EXPLAIN`)
Database Replication
What is Database Replication?
Replication copies data from a primary database to one or more replicas.
Common setup
- primary: handles writes
- replicas: handle reads
Sync vs async
| Synchronous | Asynchronous |
|---|---|
| Stronger consistency | Eventual consistency |
| Slower writes (wait for replica confirmation) | Faster writes (don’t wait) |
| Replicas stay in sync | Replicas may lag |
| Higher latency | Lower latency |
Why it’s used
- read scaling
- high availability / failover
- disaster recovery
- running analytics on replicas
Common pitfall: replication lag
A user writes to primary, then immediately reads from a replica and sees stale data. Solutions include reading from primary for critical paths, or using stronger replication settings where needed.
Infrastructure & Operations
API Gateway
What is an API Gateway?
An API gateway is a single entry point for clients. It routes requests to internal services and handles cross-cutting concerns.
Typical responsibilities
- auth / JWT validation
- routing
- rate limiting
- request/response transformation
- logging/metrics
- sometimes caching
Example: request flow
Client sends: GET /api/v1/orders/123
1. API Gateway receives request
2. Validates JWT token (auth)
3. Checks rate limits (user has 50 requests left this hour)
4. Routes to Order Service (based on path /api/v1/orders/*)
5. Order Service returns order data
6. Gateway transforms response (adds metadata, formats dates)
7. Gateway logs request and metrics
8. Returns response to client
The client never knows about Order Service directly. If you later split Order Service into multiple services, you only change routing in the gateway.
Trade-offs
- adds an extra hop (latency)
- can become a bottleneck if not scaled
- needs strong observability and redundancy
Caching
What is Caching?
Caching stores frequently used data in fast storage (often memory) to reduce latency and lower database/service load.
Common pattern: cache-aside
1. Check cache
2. On miss, read DB
3. Store result in cache
Example: product details
User requests product ID 123:
- Check Redis: `GET product:123` → miss
- Query database: `SELECT * FROM products WHERE id = 123`
- Store in Redis: `SET product:123 <data> EX 3600` (expires in 1 hour)
- Return data
Next request for product 123: Redis hit, return immediately (no DB query).
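The read path can be sketched in a few lines. A plain dict stands in for Redis here; the structure is the same with a real client (`GET`, then `SET ... EX` on a miss).

```python
import time

db = {123: {"id": 123, "name": "Product Name", "price": 99.99}}
cache = {}       # key -> (value, expires_at); stands in for Redis
db_queries = []  # track how often we actually hit the database

def get_product(product_id, ttl=3600):
    """Cache-aside: check the cache, fall back to the DB on a miss, then populate."""
    key = f"product:{product_id}"
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                      # cache hit: no DB query
    db_queries.append(product_id)            # cache miss: read the database
    value = db[product_id]
    cache[key] = (value, time.time() + ttl)  # like SET product:123 <data> EX 3600
    return value

get_product(123)  # miss -> DB query, cache populated
get_product(123)  # hit -> served from cache
```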
When product price changes:
- Option 1: Invalidate the cache: `DEL product:123` (next request fetches fresh data)
- Option 2: Update the cache: `SET product:123 <new-data>`
- Option 3: Let the TTL expire (simpler, but stale data for up to 1 hour)
Why cache invalidation is hard
Because the moment the source data changes, you must decide:
- Which cached keys are now stale? (`product:123`, `product-list`, `search-results`, `category-page`?)
- When to expire them? (immediately or wait for TTL?)
- How to keep multiple caches consistent? (Redis, CDN, browser cache all need updates)
Practical approaches
- TTLs for most data
- event-based invalidation for critical freshness
- versioned keys for complex dependency graphs
Rate Limiting
What is Rate Limiting?
Rate limiting caps how many requests a client can make in a window to protect your system and ensure fair usage.
Common algorithms
- fixed window (simple)
- sliding window (more accurate)
- token bucket (allows bursts)
- leaky bucket (smooths bursts)
Example: API rate limiting
Free tier: 100 requests per hour per API key.
Fixed window approach:
- Window 1 (2:00-2:59 PM): 100 requests allowed
- Window 2 (3:00-3:59 PM): 100 requests allowed
- At 2:59 PM, user makes 100 requests
- At 3:00 PM, user can immediately make 100 more (⚠️ burst at boundary)
Token bucket approach:
- Bucket capacity: 100 tokens
- Refill rate: 100 tokens per hour (1 token every 36 seconds)
- User makes 100 requests at 2:59 PM (bucket empty)
- At 3:00 PM, only 1 token available (✅ no burst)
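The token bucket above fits in one small class. A sketch under the example's numbers (100-token capacity, 100 tokens/hour refill); time is passed in explicitly to keep the behavior easy to demonstrate:

```python
class TokenBucket:
    """Token bucket: capacity bounds bursts, refill rate bounds sustained load."""
    def __init__(self, capacity=100, refill_per_sec=100 / 3600):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 with Retry-After

bucket = TokenBucket()
burst = [bucket.allow(0.0) for _ in range(100)]  # full burst allowed at 2:59
denied = bucket.allow(0.0)                       # 101st request rejected
one_minute_later = bucket.allow(60.0)            # ~1.7 tokens refilled: one allowed
```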
When limit exceeded:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1609459200
Retry-After: 3600
Typical response
- HTTP `429 Too Many Requests`
- include `Retry-After` and remaining/quota headers
Load Balancing
What is Load Balancing?
Load balancing distributes incoming traffic across multiple instances so no single server is overloaded and failures don’t take the system down.
Layer 4 vs layer 7
| Layer 4 (Transport) | Layer 7 (Application) |
|---|---|
| Routes by IP/port | Routes by HTTP path/headers |
| Faster (less processing) | Smarter (more context-aware) |
| Lower latency | Can do path-based routing, SSL termination |
| TCP/UDP level | HTTP/HTTPS level |
Common policies
- round robin
- least connections
- weighted routing
- consistent hashing (useful for cache locality)
Example: request distribution
You have 3 servers behind a load balancer.
Round robin:
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
With least connections (better for long-lived connections):
- Server A: 10 active connections
- Server B: 5 active connections
- Server C: 15 active connections
- Next request → Server B (fewest connections)
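Both policies are a few lines each. A sketch with the three servers from the example (the connection counts are the illustrative numbers above; a real balancer tracks them as connections open and close):

```python
import itertools

servers = ["A", "B", "C"]

# Round robin: hand out servers in a fixed cycle.
rr = itertools.cycle(servers)

def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active connections.
active = {"A": 10, "B": 5, "C": 15}

def least_connections():
    return min(active, key=active.get)

order = [round_robin() for _ in range(4)]  # A, B, C, A (cycle repeats)
chosen = least_connections()               # "B" (fewest connections)
```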
With consistent hashing (useful for caching):
- User ID 12345 → position on a hash ring → always routes to Server B
- Same user always hits the same server, so their cached data is there
- A simple `hash(user_id) % 3` gives the same stickiness, but true consistent hashing places servers on a hash ring so that adding or removing a server remaps only a small fraction of users instead of nearly all of them
Session affinity problem
User logs in on Server A, session stored there. Next request goes to Server B, session missing.
Solutions:
- Sticky sessions (route by IP/cookie)
- Shared session store (Redis)
- Stateless sessions (JWT)
Conclusion
These concepts show up everywhere in real backend work. EDA and Sagas handle workflows. CQRS and Event Sourcing enable scalable state changes and auditability. Circuit breakers and idempotency improve reliability. Tracing helps debug across services. CAP helps understand trade-offs. Sharding, replication, and indexing scale data. Gateways with caching, rate limiting, and load balancing make systems production-ready.
Treat this as a checklist: learn the definition, understand the trade-offs, then build one small demo per topic.