Introduction
Building scalable, reliable, maintainable backend systems comes down to a few repeatable ideas: how services communicate, how state changes are represented, how failures are handled, how data scales, and how you observe what’s happening in production.
This guide is structured as interview-style questions with practical explanations. Use it for quick revision, as a checklist when designing systems, or as a set of topics to deepen over time.
Architecture Patterns
Event-Driven Architecture (EDA)
What is Event-Driven Architecture?
Event-Driven Architecture is a style where services communicate by publishing events (facts about something that happened) and subscribing to the events they care about. Instead of direct service-to-service calls, you push events to a broker or bus and let consumers react.
Key characteristics
- Loose coupling: producers don’t know who consumes events
- Async by default: improves responsiveness and throughput
- Independent scaling: scale producers/consumers separately
- Failure isolation: a consumer can be down without stopping producers
Common use cases
- microservices workflows (orders, payments, inventory)
- activity streams / analytics
- notification pipelines
- IoT and telemetry
Example flow
User Action → Event Producer → Event Bus/Broker → Event Consumers
Things to watch
- delivery guarantees (at-most-once, at-least-once, exactly-once)
- ordering (global vs per-key ordering)
- duplicate handling (often solved with idempotency)
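The flow above can be sketched with a minimal in-process event bus. This is an illustrative toy, not a real broker (production systems would use something like Kafka or RabbitMQ), but it shows the key property: the producer publishes without knowing who consumes.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process event bus: producers publish, consumers react."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer has no idea who consumes the event -- loose coupling.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
received = []
# Two independent consumers react to the same event:
bus.subscribe("order.created", lambda e: received.append(("inventory", e["order_id"])))
bus.subscribe("order.created", lambda e: received.append(("email", e["order_id"])))
bus.publish("order.created", {"order_id": 42})
```

A real broker adds the hard parts this sketch skips: durability, delivery guarantees, and consumer offsets.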
Saga Pattern
What is the Saga Pattern?
The Saga Pattern manages multi-step business transactions across services without relying on a single ACID transaction. A saga is a sequence of local transactions. If something fails, you run compensating actions to undo earlier steps.
Two styles
- Choreography: services react to events and decide the next step
- Orchestration: a coordinator explicitly tells each service what to do
Example: order processing
1. Create Order (Order Service)
2. Reserve Inventory (Inventory Service) → Compensate: Release Inventory
3. Charge Payment (Payment Service) → Compensate: Refund Payment
4. Arrange Shipment (Shipping Service) → Compensate: Cancel Shipment
⚠️ If step 3 fails → release inventory + cancel order.
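The orchestration style can be sketched as a loop over local transactions that runs compensations in reverse order on failure. Step names and log messages are illustrative; a real orchestrator would also persist saga state so it can resume after a crash.

```python
class SagaError(Exception):
    """Raised by a step to signal that the saga must roll back."""

log = []

def charge_payment():
    # Step 3 fails, as in the example above.
    raise SagaError("card declined")

def run_saga(steps):
    """Run each local transaction; on failure, compensate completed steps in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except SagaError:
            for undo in reversed(completed):
                undo()
            return "rolled back"
    return "completed"

steps = [
    ("create_order", lambda: log.append("order created"), lambda: log.append("order cancelled")),
    ("reserve_inventory", lambda: log.append("inventory reserved"), lambda: log.append("inventory released")),
    ("charge_payment", charge_payment, lambda: log.append("payment refunded")),
]
result = run_saga(steps)
```

After the payment failure, the log shows the compensations ran newest-first: inventory released, then order cancelled.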
When to use
- cross-service workflows
- long-running processes
- you can accept eventual consistency
Trade-offs
- avoids distributed locks and 2PC blocking
- scales better than global transactions
- more complex failure handling and state management
- eventual consistency is the default
CQRS (Command Query Responsibility Segregation)
What is CQRS?
CQRS splits your system into:
- Commands: write operations (create/update/delete)
- Queries: read operations
The read and write sides can have different data models and even different storage, so each can be optimized independently.
Traditional vs CQRS
| Traditional | CQRS |
|---|---|
| One model handles reads + writes | Command model (write-optimized) → events/replication → Query model (read-optimized) |
| Same storage for reads and writes | Different storage optimized per side |
| Single source of truth | Separate models, eventual consistency |
Why it’s useful
- scale reads and writes independently
- simplify write rules while keeping reads fast
- create multiple read views (e.g., dashboards vs detail pages)
Example: e-commerce product catalog
Write side: Normalized relational DB for product updates (price, inventory, description). Write operations are simple and consistent.
Read side: Denormalized document store optimized for queries. Product detail pages might store:
{
"id": 123,
"name": "Product Name",
"price": 99.99,
"images": [...],
"reviews": [...],
"relatedProducts": [...]
}
All in one document for fast reads. Search indexes, category listings, and recommendation engines each have their own optimized read models.
When a product price changes, the write model updates the normalized table, then an event or CDC mechanism propagates the change to all read models.
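That write-then-propagate flow can be sketched as a command handler plus a projection. The dicts stand in for the normalized tables and denormalized documents; in a real system the event would travel over a broker or CDC pipeline rather than a direct function call.

```python
# Write model: normalized storage (simplified here as a dict).
products = {123: {"name": "Widget", "price": 99.99}}

# Read model: denormalized product-page documents, optimized for reads.
product_pages = {123: {"id": 123, "name": "Widget", "price": 99.99, "reviews": []}}

def handle_change_price(product_id, new_price, publish):
    """Command handler: update the write model, then emit an event."""
    products[product_id]["price"] = new_price
    publish({"type": "PriceChanged", "id": product_id, "price": new_price})

def project_price_changed(event):
    """Projection: apply the event to the read model (eventually consistent)."""
    product_pages[event["id"]]["price"] = event["price"]

# In production, `publish` would go through a broker/CDC, introducing a lag
# during which the read model is slightly stale.
handle_change_price(123, 79.99, publish=project_price_changed)
```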
Costs
- more moving parts
- read model may be slightly stale (eventual consistency)
- you need a reliable sync mechanism (events, CDC, replication)
Event Sourcing
What is Event Sourcing?
Event Sourcing stores a log of events as the source of truth, rather than storing only the latest state. Current state is derived by replaying events (often with snapshots for speed).
Traditional vs Event Sourcing
| Traditional | Event Sourcing |
|---|---|
| current state → update → new state | Event 1 → Event 2 → Event 3 → ... → current state (derived) |
| Stores current state only | Stores all events as source of truth |
| State is the source of truth | Events are the source of truth |
| Limited audit trail | Complete audit trail by default |
Example: bank account
Events:
- AccountCreated
- MoneyDeposited(50)
- MoneyWithdrawn(20)
- MoneyDeposited(70)
Current Balance (derived): 50 - 20 + 70 = 100
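Deriving state is just a fold over the event log. A minimal sketch of the bank-account example (snapshots would avoid replaying from the beginning every time):

```python
events = [
    {"type": "AccountCreated"},
    {"type": "MoneyDeposited", "amount": 50},
    {"type": "MoneyWithdrawn", "amount": 20},
    {"type": "MoneyDeposited", "amount": 70},
]

def replay(events):
    """Derive current state by replaying the event log from the start."""
    balance = 0
    for e in events:
        if e["type"] == "MoneyDeposited":
            balance += e["amount"]
        elif e["type"] == "MoneyWithdrawn":
            balance -= e["amount"]
    return balance

balance = replay(events)  # 50 - 20 + 70 = 100
```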
Benefits
- audit trail by default
- easier debugging (“what happened?”)
- you can rebuild new read models from history
Challenges
- event versioning (schemas change)
- event store growth (snapshots + retention strategy)
- building projections/read models adds complexity
Circuit Breaker Pattern
What is the Circuit Breaker Pattern?
Circuit breakers prevent cascading failures by stopping calls to a dependency that is failing. After too many failures, the breaker “opens” and requests fail fast (or return fallbacks) until recovery.
States
- Closed: normal traffic
- Open: fail fast (dependency is unhealthy)
- Half-open: allow limited test traffic to see if it recovered
State transitions
Closed → (failure threshold) → Open
Open → (cooldown timeout) → Half-open
Half-open → (enough success) → Closed
Half-open → (failure) → Open
| State | Behavior |
|---|---|
| Closed | Normal operation, requests pass through |
| Open | Fail fast, no requests to unhealthy dependency |
| Half-open | Test mode, limited requests to check recovery |
Example: payment service dependency
Your order service calls a payment service. After 5 failures in 10 seconds, the circuit opens. All new requests immediately return a fallback response (e.g., “Payment temporarily unavailable, please try again”) without waiting for the payment service timeout. After 30 seconds, the circuit goes half-open and allows one test request. If it succeeds, the circuit closes and normal traffic resumes. If it fails, the circuit opens again.
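The state machine above can be sketched in a few dozen lines. This is a simplified single-threaded version (real implementations like resilience4j add thread safety, sliding failure windows, and metrics); the injectable `clock` is only there to make the behavior easy to demonstrate.

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive failures; after `reset_after`
    seconds, go half-open and let one test request through."""
    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_after:
                self.state = "half-open"  # cooldown elapsed: allow a test request
            else:
                return fallback()         # fail fast, don't touch the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.state = "open"       # trip (or re-trip) the breaker
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        self.state = "closed"             # success closes the circuit
        return result

# Simulated run with a fake clock (threshold 2 failures, 30 s cooldown):
now = [0.0]
cb = CircuitBreaker(max_failures=2, reset_after=30.0, clock=lambda: now[0])

def payment_down():
    raise RuntimeError("payment service unavailable")

cb.call(payment_down, lambda: "try again later")   # failure 1
cb.call(payment_down, lambda: "try again later")   # failure 2 -> opens
tripped_state = cb.state                           # "open": requests now fail fast
now[0] = 31.0                                      # cooldown elapsed
recovered = cb.call(lambda: "charged", lambda: "try again later")  # half-open test succeeds
```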
Where it helps
- external APIs
- internal service-to-service calls
- database or cache dependencies
Pairs well with
- retries with backoff
- timeouts
- bulkheads (limit concurrency)
Distributed Systems
Distributed Tracing
What is Distributed Tracing?
Distributed tracing tracks a single request as it flows through multiple services using a shared trace ID. Each step is a span with timing, metadata, and relationships.
Key terms
- Trace: the end-to-end request
- Span: one operation within the trace
- Trace ID: unique request ID
- Span ID: unique operation ID
Why it matters
- find bottlenecks (where time is spent)
- debug failures across service boundaries
- understand dependencies and fan-out
Example
Trace abc123
├─ GET /api/orders (200ms)
│ ├─ DB query (50ms)
│ ├─ Payment service call (100ms)
│ └─ Inventory service call (40ms)
This shows the full request path, making it easy to identify that the payment service call takes the longest (100ms).
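The trace/span relationship can be sketched with a context manager that records timing and parent links. Real tracing libraries (OpenTelemetry, Jaeger clients) also propagate the trace ID across process boundaries via headers; this toy keeps everything in one process.

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # in a real system these are exported to a tracing backend

@contextmanager
def span(name, trace_id, parent_id=None):
    """Record one operation: shared trace ID, its own span ID, and timing."""
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "trace_id": trace_id, "span_id": span_id, "parent_id": parent_id,
            "name": name, "duration_ms": (time.perf_counter() - start) * 1000,
        })

trace_id = "abc123"
with span("GET /api/orders", trace_id) as root:
    with span("DB query", trace_id, parent_id=root):
        pass  # real code would query the database here
    with span("Payment service call", trace_id, parent_id=root):
        pass  # ...and call the payment service
```

Every span shares `trace_id`, and the parent links let a UI reassemble the tree shown above.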
CAP Theorem
What is the CAP Theorem?
CAP says a distributed system can’t simultaneously guarantee all three:
- Consistency (C): all nodes return the same latest value
- Availability (A): every request gets a response
- Partition tolerance (P): the system continues despite network splits
In real distributed systems, partitions happen, so you typically choose between:
- CP: consistent but may reject/timeout requests during a partition
- AP: stays available but may return stale data temporarily
Practical examples
- payments/ledger → usually lean CP
- feeds/analytics → often accept AP
Important nuance
CAP is about behavior during a partition, not forever. Many systems offer knobs (timeouts, quorum reads/writes) that move you along the spectrum.
Idempotency
What is Idempotency?
An operation is idempotent if repeating it produces the same effect as doing it once. This matters because retries happen (timeouts, network glitches, client re-sends).
HTTP intuition
- GET: idempotent
- PUT: idempotent (replace)
- DELETE: idempotent (delete again does nothing)
- POST: usually not idempotent (creates new resource each time)
Common strategy: idempotency keys
Client sends a unique key; server stores the result for that key and returns the same response for duplicates.
Example: payment processing
POST /api/payments
Headers:
Idempotency-Key: abc-123-xyz
Body:
{
"amount": 100,
"card": "..."
}
First request: Processes payment, stores result with key abc-123-xyz
Retry (same key): Returns stored result, doesn’t charge again
The server checks: “Have I seen key `abc-123-xyz`?” If yes, return the previous response. If no, process and store the result.
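That check-then-store logic is small enough to sketch directly. The in-memory dict stands in for a durable store with TTLs; in production the lookup and the side effect also need to be atomic so two concurrent retries can't both charge.

```python
processed = {}  # idempotency key -> stored response (use a durable TTL store in production)
charges = []    # stand-in for the real side effect

def handle_payment(idempotency_key, amount):
    """Return the stored response for a seen key; otherwise process and store."""
    if idempotency_key in processed:
        return processed[idempotency_key]   # duplicate: replay stored response
    charges.append(amount)                  # the side effect happens exactly once
    response = {"status": "charged", "amount": amount}
    processed[idempotency_key] = response
    return response

first = handle_payment("abc-123-xyz", 100)
retry = handle_payment("abc-123-xyz", 100)  # same key: no second charge
```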
Best practices
- make write endpoints idempotent when retries are expected
- store idempotency records with TTL
- ensure “same key” returns the same response body/status
Data Management
Data Sharding
What is Data Sharding?
Sharding splits a dataset across multiple databases/servers (horizontal partitioning). Each shard holds a subset of the data.
Sharding vs replication
| Sharding | Replication |
|---|---|
| Different data on different nodes | Same data on multiple nodes |
| Scales capacity and write throughput | Scales reads and availability |
| Each node holds a subset | Each node holds a copy |
Common strategies
- Range-based: IDs 1–1M on shard A
- Hash-based: `hash(user_id) % N`
- Directory-based: a lookup maps keys to shards
- Geo-based: shard by region
Example: user data sharding
You have 100M users. With hash-based sharding across 4 shards:
hash(user_id) % 4 = 0 → Shard 1
hash(user_id) % 4 = 1 → Shard 2
hash(user_id) % 4 = 2 → Shard 3
hash(user_id) % 4 = 3 → Shard 4
User ID 12345 → hash(12345) % 4 = 2 → stored on Shard 3. Each shard holds ~25M users. Queries for a specific user are fast (only one shard). Queries like “all users in California” require querying all shards and merging results.
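A hash-based router is a one-liner plus one caveat: the hash must be stable across processes. A minimal sketch (SHA-256 is one reasonable choice; Python's built-in `hash()` is randomized per process and would route the same user differently after a restart):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    """Route a user to a shard index 0..NUM_SHARDS-1 using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every lookup for the same user hits exactly one shard:
shard = shard_for(12345)
```

Note the `% N` weakness mentioned under trade-offs: changing `NUM_SHARDS` remaps almost every key, which is why rebalancing is operationally complex (and why some systems use consistent hashing instead).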
Trade-offs
- scales storage and write throughput
- cross-shard queries and transactions are harder
- rebalancing is operationally complex
Database Indexing
What is Database Indexing?
Indexes are extra data structures that help databases find rows faster—like a book index. Without an index, the DB may scan many rows. With the right index, it can jump close to the answer.
What indexes improve
- `WHERE` filters
- joins
- `ORDER BY`
- some `GROUP BY` workloads
Example: user lookup
Without index:
SELECT * FROM users WHERE email = 'user@example.com';
-- Full table scan: checks all 10M rows, takes 2 seconds
With index on email:
CREATE INDEX idx_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
-- Index lookup: finds row directly, takes 5ms
The index is like a sorted list: email → row location. The database can binary search instead of scanning every row.
Trade-offs
- indexes consume storage
- writes get slower (indexes must be updated)
Practical guidance
- index columns you filter/join on frequently
- avoid over-indexing
- validate with query plans (e.g., `EXPLAIN`)
Database Replication
What is Database Replication?
Replication copies data from a primary database to one or more replicas.
Common setup
- primary: handles writes
- replicas: handle reads
Sync vs async
| Synchronous | Asynchronous |
|---|---|
| Stronger consistency | Eventual consistency |
| Slower writes (wait for replica confirmation) | Faster writes (don’t wait) |
| Replicas stay in sync | Replicas may lag |
| Higher latency | Lower latency |
Why it’s used
- read scaling
- high availability / failover
- disaster recovery
- running analytics on replicas
Common pitfall: replication lag
A user writes to primary, then immediately reads from a replica and sees stale data. Solutions include reading from primary for critical paths, or using stronger replication settings where needed.
Infrastructure & Operations
API Gateway
What is an API Gateway?
An API gateway is a single entry point for clients. It routes requests to internal services and handles cross-cutting concerns.
Typical responsibilities
- auth / JWT validation
- routing
- rate limiting
- request/response transformation
- logging/metrics
- sometimes caching
Example: request flow
Client sends: GET /api/v1/orders/123
1. API Gateway receives request
2. Validates JWT token (auth)
3. Checks rate limits (user has 50 requests left this hour)
4. Routes to Order Service (based on path /api/v1/orders/*)
5. Order Service returns order data
6. Gateway transforms response (adds metadata, formats dates)
7. Gateway logs request and metrics
8. Returns response to client
The client never knows about Order Service directly. If you later split Order Service into multiple services, you only change routing in the gateway.
Trade-offs
- adds an extra hop (latency)
- can become a bottleneck if not scaled
- needs strong observability and redundancy
Caching
What is Caching?
Caching stores frequently used data in fast storage (often memory) to reduce latency and lower database/service load.
Common pattern: cache-aside
1. Check cache
2. On miss, read DB
3. Store result in cache
Example: product details
User requests product ID 123:
- Check Redis: `GET product:123` → miss
- Query database: `SELECT * FROM products WHERE id = 123`
- Store in Redis: `SET product:123 <data> EX 3600` (expires in 1 hour)
- Return data
Next request for product 123: Redis hit, return immediately (no DB query).
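The read path can be sketched in a few lines. A plain dict stands in for Redis here; the structure is the same with a real client (`GET`, then `SET ... EX` on a miss).

```python
import time

db = {123: {"id": 123, "name": "Product Name", "price": 99.99}}
cache = {}       # key -> (value, expires_at); stands in for Redis
db_queries = []  # track how often we actually hit the database

def get_product(product_id, ttl=3600):
    """Cache-aside: check the cache, fall back to the DB on a miss, then populate."""
    key = f"product:{product_id}"
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                      # cache hit: no DB query
    db_queries.append(product_id)            # cache miss: read the database
    value = db[product_id]
    cache[key] = (value, time.time() + ttl)  # like SET product:123 <data> EX 3600
    return value

get_product(123)  # miss -> DB query, cache populated
get_product(123)  # hit -> served from cache
```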
When product price changes:
- Option 1: Invalidate the cache: `DEL product:123` (next request fetches fresh data)
- Option 2: Update the cache: `SET product:123 <new-data>`
- Option 3: Let the TTL expire (simpler, but stale data for up to 1 hour)
Why cache invalidation is hard
Because the moment the source data changes, you must decide:
- Which cached keys are now stale? (`product:123`, `product-list`, `search-results`, `category-page`?)
- When to expire them? (immediately or wait for TTL?)
- How to keep multiple caches consistent? (Redis, CDN, browser cache all need updates)
Practical approaches
- TTLs for most data
- event-based invalidation for critical freshness
- versioned keys for complex dependency graphs
Rate Limiting
What is Rate Limiting?
Rate limiting caps how many requests a client can make in a window to protect your system and ensure fair usage.
Common algorithms
- fixed window (simple)
- sliding window (more accurate)
- token bucket (allows bursts)
- leaky bucket (smooths bursts)
Example: API rate limiting
Free tier: 100 requests per hour per API key.
Fixed window approach:
- Window 1 (2:00-2:59 PM): 100 requests allowed
- Window 2 (3:00-3:59 PM): 100 requests allowed
- At 2:59 PM, user makes 100 requests
- At 3:00 PM, user can immediately make 100 more (⚠️ burst at boundary)
Token bucket approach:
- Bucket capacity: 100 tokens
- Refill rate: 100 tokens per hour (1 token every 36 seconds)
- User makes 100 requests at 2:59 PM (bucket empty)
- At 3:00 PM, only 1 token available (✅ no burst)
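The token bucket above fits in one small class. A sketch under the example's numbers (100-token capacity, 100 tokens/hour refill); time is passed in explicitly to keep the behavior easy to demonstrate:

```python
class TokenBucket:
    """Token bucket: capacity bounds bursts, refill rate bounds sustained load."""
    def __init__(self, capacity=100, refill_per_sec=100 / 3600):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 with Retry-After

bucket = TokenBucket()
burst = [bucket.allow(0.0) for _ in range(100)]  # full burst allowed at 2:59
denied = bucket.allow(0.0)                       # 101st request rejected
one_minute_later = bucket.allow(60.0)            # ~1.7 tokens refilled: one allowed
```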
When limit exceeded:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1609459200
Retry-After: 3600
Typical response
- HTTP `429 Too Many Requests`
- include `Retry-After` and remaining/quota headers
Load Balancing
What is Load Balancing?
Load balancing distributes incoming traffic across multiple instances so no single server is overloaded and failures don’t take the system down.
Layer 4 vs layer 7
| Layer 4 (Transport) | Layer 7 (Application) |
|---|---|
| Routes by IP/port | Routes by HTTP path/headers |
| Faster (less processing) | Smarter (more context-aware) |
| Lower latency | Can do path-based routing, SSL termination |
| TCP/UDP level | HTTP/HTTPS level |
Common policies
- round robin
- least connections
- weighted routing
- consistent hashing (useful for cache locality)
Example: request distribution
You have 3 servers behind a load balancer.
Round robin:
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
With least connections (better for long-lived connections):
- Server A: 10 active connections
- Server B: 5 active connections
- Server C: 15 active connections
- Next request → Server B (fewest connections)
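Both policies are a few lines each. A sketch with the three servers from the example (the connection counts are the illustrative numbers above; a real balancer tracks them as connections open and close):

```python
import itertools

servers = ["A", "B", "C"]

# Round robin: hand out servers in a fixed cycle.
rr = itertools.cycle(servers)

def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active connections.
active = {"A": 10, "B": 5, "C": 15}

def least_connections():
    return min(active, key=active.get)

order = [round_robin() for _ in range(4)]  # A, B, C, A (cycle repeats)
chosen = least_connections()               # "B" (fewest connections)
```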
With consistent hashing (useful for caching):
- User ID 12345 → position on a hash ring → always routes to Server B
- Same user always hits the same server, so their cached data is there
- A simple `hash(user_id) % 3` gives the same stickiness, but true consistent hashing places servers on a hash ring so that adding or removing a server remaps only a small fraction of users instead of nearly all of them
Session affinity problem
User logs in on Server A, session stored there. Next request goes to Server B, session missing.
Solutions:
- Sticky sessions (route by IP/cookie)
- Shared session store (Redis)
- Stateless sessions (JWT)
Conclusion
These concepts show up everywhere in real backend work. EDA and Sagas handle workflows. CQRS and Event Sourcing enable scalable state changes and auditability. Circuit breakers and idempotency improve reliability. Tracing helps debug across services. CAP helps understand trade-offs. Sharding, replication, and indexing scale data. Gateways with caching, rate limiting, and load balancing make systems production-ready.
Treat this as a checklist: learn the definition, understand the trade-offs, then build one small demo per topic.