Adaptive Traffic Management in Airbnb's Key‑Value Store
Overview
Mussel, a multi‑tenant key‑value store used by Airbnb, evolved from static rate limiting to an adaptive, layered traffic‑management system in order to handle volatile workloads.
Problems with Static Rate Limiting
Static QPS‑based quotas caused issues:
- Cost variance across requests
- A 1‑row scan and a 1,000‑row scan are always treated the same
- Hot‑key and skew issues
- Poor ability to adapt to traffic bursts
Key Improvements Made
To make traffic management adaptive, the following changes were introduced:
1. Resource‑Aware Rate Control (RARC)
- Introduces Request Units (RU) based on rows, bytes, and latency
- Each dispatcher maintains a token bucket that refills at a fixed RU rate (see the sketch below)
- Expensive operations cost more RU
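A minimal sketch of a per‑dispatcher token bucket that charges in RU rather than request count is shown below. The names (RUTokenBucket, refill_ru_per_sec) and the numbers are illustrative assumptions, not Mussel's actual API.

```python
import time

class RUTokenBucket:
    """Token bucket that refills in Request Units (RU) rather than request count."""

    def __init__(self, refill_ru_per_sec: float, capacity_ru: float):
        self.refill_rate = refill_ru_per_sec   # RU added per second
        self.capacity = capacity_ru            # maximum RU the bucket can hold
        self.tokens = capacity_ru
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def try_consume(self, cost_ru: float) -> bool:
        """Admit the request if enough RU are available; expensive requests cost more."""
        self._refill()
        if self.tokens >= cost_ru:
            self.tokens -= cost_ru
            return True
        return False  # caller should throttle or reject the request


# Example: a large range scan drains the bucket far faster than a point lookup.
bucket = RUTokenBucket(refill_ru_per_sec=500, capacity_ru=1000)
bucket.try_consume(cost_ru=1)      # point lookup
bucket.try_consume(cost_ru=250)    # 1,000-row range scan
```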
2. Load Shedding with Criticality Tiers
- Monitors real‑time latency ratios
- Dynamic request penalization
Latency Ratio
Each dispatcher computes a latency ratio, defined as:
latency ratio = long-term p95 latency / short-term p95 latency
- A healthy system has a ratio ≈ 1.0
- A drop toward 0.3 indicates latency is rising quickly
When the ratio crosses the threshold:
- The dispatcher temporarily increases RU cost for certain client classes
- Their token buckets drain faster → they automatically back off
- If the ratio falls further, penalties expand to more classes until latency normalizes (as sketched below)
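The exact thresholds and penalty schedule are not spelled out in the source; the sketch below assumes illustrative values to show how a dropping latency ratio can inflate RU costs for lower‑criticality client classes.

```python
def latency_ratio(long_term_p95_ms: float, short_term_p95_ms: float) -> float:
    """Healthy systems hover near 1.0; values near 0.3 mean latency is spiking."""
    return long_term_p95_ms / short_term_p95_ms


def ru_penalty_multiplier(ratio: float, criticality: int) -> float:
    """
    Illustrative penalty schedule (thresholds are assumptions, not Mussel's values).
    Criticality 0 is the most critical tier. As the ratio drops, progressively
    lower-criticality tiers pay inflated RU costs, draining their token buckets
    faster so they back off automatically.
    """
    if ratio >= 0.8:
        return 1.0                                  # healthy: no penalty
    if ratio >= 0.5:
        return 2.0 if criticality >= 2 else 1.0     # penalize only low-criticality tiers
    return 4.0 if criticality >= 1 else 1.0         # severe: penalize all but the top tier


# Effective RU charged = estimated RU * penalty multiplier for that client class.
charged_ru = 10 * ru_penalty_multiplier(latency_ratio(40.0, 120.0), criticality=2)
```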
The Controlled-Delay (CoDel)-Inspired Queue Policy
- The CoDel-inspired thread pool tackles the second hazard: queue buildup inside the dispatcher itself.
- It monitors how long a request waits in the queue. If the sojourn time exceeds a target, showing the system is already saturated, the request fails early, freeing memory and threads for higher-priority work (see the sketch below).
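Below is a simplified, CoDel-inspired sketch of sojourn-time-based rejection. Real CoDel also adapts its drop interval, which is omitted here; names such as CoDelQueue and target_sojourn_ms are assumptions for illustration.

```python
import time
from collections import deque


def fail_fast(request) -> None:
    # Placeholder: return an overloaded/throttled error to the client.
    pass


class CoDelQueue:
    """
    CoDel-inspired admission: track each request's queue sojourn time and fail
    requests early once the waiting time shows the dispatcher is saturated.
    """

    def __init__(self, target_sojourn_ms: float = 5.0):
        self.target = target_sojourn_ms / 1000.0
        self.queue = deque()  # items are (enqueue_time, request)

    def enqueue(self, request) -> None:
        self.queue.append((time.monotonic(), request))

    def dequeue(self):
        while self.queue:
            enqueued_at, request = self.queue.popleft()
            sojourn = time.monotonic() - enqueued_at
            if sojourn > self.target:
                fail_fast(request)   # reject early, freeing memory and threads
                continue
            return request           # healthy request, hand it to a worker
        return None
```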
3. Hot‑Key Detection & DDoS Mitigation
Space‑Saving algorithm to track hot keys
- Every dispatcher streams incoming keys into an in-memory top-k counter.
- In just a few megabytes, it tracks approximate hit counts, maintains a frequency-ordered heap, and surfaces the hottest keys in real time in each individual dispatcher.
- The internals of the Space-Saving algorithm are not covered in detail here; a rough sketch of the idea follows.
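A bare-bones version of the Space-Saving idea might look like the following. It uses a plain dictionary with min-eviction rather than the frequency-ordered heap described above, and the capacity and method names are illustrative assumptions.

```python
class SpaceSavingTopK:
    """
    Space-Saving sketch: approximate hit counts for the hottest keys using a
    fixed number of counters (a few megabytes in practice).
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}   # key -> approximate count (upper bound on true frequency)

    def record(self, key: str) -> None:
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the current minimum and inherit its count + 1, which keeps
            # every stored count an upper bound on the key's true frequency.
            min_key = min(self.counts, key=self.counts.get)
            min_count = self.counts.pop(min_key)
            self.counts[key] = min_count + 1

    def hottest(self, n: int = 10):
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```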
Local LRU cache and request coalescing
- When a key crosses the hot threshold, the dispatcher serves it from a process-local LRU cache.
- Entries expire after roughly three seconds, so they vanish as soon as demand cools; no global cache is required.
- A cache miss can still arrive multiple times in the same millisecond, so the dispatcher tracks in-flight reads for hot keys.
- New arrivals attach to the pending future; the first backend response then fans out to all waiters.
- In most cases, only one request per hot key per dispatcher pod ever reaches the storage layer; a rough sketch of the cache and coalescing logic follows.
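The sketch below combines a TTL-bounded, process-local LRU cache with coalescing of concurrent misses. The names (HotKeyCache, fetch_from_backend) and the asyncio-style interface are assumptions the source does not specify.

```python
import asyncio
import time
from collections import OrderedDict


class HotKeyCache:
    """Process-local LRU with a short TTL plus coalescing of concurrent misses."""

    def __init__(self, max_entries: int = 10_000, ttl_seconds: float = 3.0):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self.cache = OrderedDict()   # key -> (value, expires_at)
        self.in_flight = {}          # key -> asyncio.Future for the pending backend read

    async def get(self, key, fetch_from_backend):
        entry = self.cache.get(key)
        if entry and entry[1] > time.monotonic():
            self.cache.move_to_end(key)          # refresh LRU position
            return entry[0]

        # Coalesce: if a read for this key is already in flight, wait on it.
        if key in self.in_flight:
            return await self.in_flight[key]

        future = asyncio.get_running_loop().create_future()
        self.in_flight[key] = future
        try:
            value = await fetch_from_backend(key)   # only one backend read per hot key
            self.cache[key] = (value, time.monotonic() + self.ttl)
            self.cache.move_to_end(key)
            if len(self.cache) > self.max_entries:
                self.cache.popitem(last=False)      # evict least-recently-used entry
            future.set_result(value)                # fan out to all waiters
            return value
        except Exception as exc:
            future.set_exception(exc)
            raise
        finally:
            del self.in_flight[key]
```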
Key Takeaways
- Measure real cost, not request count
- Keep control loops local (single node)
  - local LRU cache
  - CoDel queue
- Continuous tuning needed as workloads evolve
Context: How Resource Unit (RU) Calculation and Load Shedding Work
- Our Key-Value Store must decide whether to allow or throttle a request before the database executes it.
- Since actual cost (rows, bytes, latency) is only known afterward, Mussel uses a predictive cost model.
RU is Estimated Using Request Shape
Before execution, the dispatcher knows:
- Number of keys requested (from the requested output schema)
- Query type (point lookup, multi-get, range scan)
- Body size for writes
- Estimated number of rows based on key ranges
- Request payload size
This metadata allows an upper-bound estimate of resource cost.
A simple formula could look like:
RU = α * estimated_rows + β * request_bytes + γ * expected_latency_factor
Where:
- α, β, γ are constants tuned using historical profiling & benchmarks
- expected_latency_factor comes from past p95 latency for similar queries
- estimated_rows comes from the query shape (range start/end)
So even before execution, the system can compute an estimated RU charge.
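A hypothetical version of that pre-execution estimate is shown below; the coefficient values are made up, since the real α, β, γ come from profiling and benchmarks.

```python
# Illustrative coefficients; in practice α, β, γ are tuned from historical profiling.
ALPHA = 0.5      # RU per estimated row
BETA = 0.001     # RU per request byte
GAMMA = 2.0      # RU per unit of expected latency factor


def estimate_ru(estimated_rows: int, request_bytes: int,
                expected_latency_factor: float) -> float:
    """Upper-bound RU estimate computed from request shape alone, before execution."""
    return (ALPHA * estimated_rows
            + BETA * request_bytes
            + GAMMA * expected_latency_factor)


# Point lookup: 1 row, small payload, low expected-latency class.
estimate_ru(estimated_rows=1, request_bytes=200, expected_latency_factor=0.1)
# Range scan: many rows, same payload, higher expected-latency class.
estimate_ru(estimated_rows=1000, request_bytes=200, expected_latency_factor=1.5)
```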
After the DB runs the query, the cost model is sometimes adjusted (asynchronously)
The backend shard knows:
- exact rows scanned
- actual bytes read
- actual CPU/latency
But they do NOT retro-adjust the user’s token bucket, because that would require cross-component coordination.
Instead:
- backend reports metrics
- system updates the cost model coefficients for future requests
- So next time, the estimate becomes more accurate.
This is called an adaptive cost model.
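The source does not give the update rule. One simple possibility is an exponentially weighted correction toward the observed/estimated cost ratio, as sketched below; the learning rate and the update form are assumptions.

```python
def update_coefficient(current: float, estimated_cost: float,
                       actual_cost: float, learning_rate: float = 0.05) -> float:
    """
    Nudge a cost-model coefficient toward the ratio of actual to estimated cost,
    using metrics the backend reports asynchronously. No per-request token-bucket
    refunds are made; only future estimates improve.
    """
    if estimated_cost <= 0:
        return current
    correction = actual_cost / estimated_cost
    return current * (1 - learning_rate) + current * correction * learning_rate


# Example: scans were under-estimated by 20%, so α drifts upward over time.
ALPHA = update_coefficient(current=0.5, estimated_cost=100, actual_cost=120)
```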
Load Shedding
- Load shedding is a resilience strategy in which a system deliberately drops or rejects non-essential incoming requests (or data) during periods of high traffic or overload. This protects core functionality, prevents crashes, and keeps latency low for critical operations, preserving overall system stability and availability.
