
Adaptive Traffic Management in Airbnb's Key‑Value Store

· 4 min read

Overview

Mussel, a multi‑tenant key‑value store used by Airbnb, evolved from static rate limiting to an adaptive, layered traffic‑management system that can handle volatile workloads.

Problems with Static Rate Limiting

Static QPS‑based quotas caused issues:

  • Cost variance across requests
    • A 1‑row scan and a 1,000‑row scan are charged the same
  • Hot‑key and skew issues
  • Poor ability to adapt to traffic bursts

Key Improvements

  • The move to adaptive traffic management combines three mechanisms.

1. Resource‑Aware Rate Control (RARC)

  • Introduces Request Units (RU) based on rows, bytes, and latency
  • A token bucket at each dispatcher, refilled at a fixed RU rate (sketched below)
  • Expensive operations cost more RU
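A minimal sketch of this RU‑metered token bucket, assuming a fixed refill rate and illustrative per‑operation costs (none of the numbers are Mussel's real values):

```python
import time

class RUTokenBucket:
    """Per-dispatcher token bucket that meters Request Units (RU), not QPS."""

    def __init__(self, capacity_ru: float, refill_ru_per_sec: float):
        self.capacity = capacity_ru
        self.refill_rate = refill_ru_per_sec
        self.tokens = capacity_ru
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def try_consume(self, ru_cost: float) -> bool:
        """Admit the request only if enough RU tokens remain."""
        self._refill()
        if self.tokens >= ru_cost:
            self.tokens -= ru_cost
            return True
        return False

# Illustrative usage: a 1,000-row scan costs far more RU than a point lookup,
# so it drains the bucket faster and gets throttled sooner.
bucket = RUTokenBucket(capacity_ru=1_000, refill_ru_per_sec=500)
point_lookup_cost = 1.0    # assumed cost of a single-row read
range_scan_cost = 120.0    # assumed cost of a 1,000-row scan
print(bucket.try_consume(point_lookup_cost))
print(bucket.try_consume(range_scan_cost))
```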

2. Load Shedding with Criticality Tiers

  • Monitors real‑time latency ratios
  • Dynamic request penalization

Latency Ratio

Each dispatcher computes a latency ratio, defined as:

latency ratio = long-term p95 latency / short-term p95 latency
  • A healthy system has a ratio ≈ 1.0

  • A drop toward 0.3 means the short‑term p95 is roughly 3× the long‑term p95, i.e., latency is rising quickly

  • When the ratio falls below a threshold:

    • The dispatcher temporarily increases RU cost for certain client classes
    • Their token buckets drain faster → they automatically back off
    • If the ratio falls further, penalties expand to more client classes until latency normalizes (see the sketch below)
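A rough sketch of how such a control loop could look; the window sizes, the 0.7 threshold, the tier names, and the penalty multipliers are all assumptions for illustration, not Airbnb's actual configuration:

```python
from collections import deque
import statistics

class LatencyRatioShedder:
    """Tracks long-term vs. short-term p95 latency and inflates RU costs
    for lower-criticality tiers when the ratio degrades."""

    def __init__(self, threshold: float = 0.7):
        self.long_window = deque(maxlen=10_000)   # long-term latency samples
        self.short_window = deque(maxlen=500)     # short-term latency samples
        self.threshold = threshold

    def record(self, latency_ms: float) -> None:
        self.long_window.append(latency_ms)
        self.short_window.append(latency_ms)

    @staticmethod
    def _p95(samples) -> float:
        return statistics.quantiles(samples, n=20)[-1]  # 95th percentile

    def latency_ratio(self) -> float:
        if len(self.short_window) < 20:
            return 1.0  # not enough data yet; assume healthy
        return self._p95(self.long_window) / self._p95(self.short_window)

    def ru_multiplier(self, criticality: str) -> float:
        """Lower-criticality tiers pay inflated RU costs first."""
        ratio = self.latency_ratio()
        if ratio >= self.threshold:
            return 1.0                  # healthy: no penalty
        if criticality == "batch":
            return 4.0                  # penalize batch traffic first
        if criticality == "default" and ratio < 0.5:
            return 2.0                  # expand penalties as latency worsens
        return 1.0                      # critical traffic untouched
```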

The Controlled Delay (CoDel) Inspired Queue Policy

  • The CoDel thread pool tackles the second hazard: queue buildup inside the dispatcher itself.
  • It monitors the time a request waits in the queue (its sojourn time). If that sojourn time shows the system is already saturated, the request fails early, freeing memory and threads for higher-priority work (sketched below).
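A sketch of the idea, assuming a simple two-level sojourn-time budget in the spirit of CoDel; the 5 ms and 100 ms targets are invented values:

```python
import queue
import time

TARGET_SOJOURN_MS = 5   # assumed: acceptable queueing delay in steady state
MAX_SOJOURN_MS = 100    # assumed: generous budget used while the system is healthy

class CoDelDispatchQueue:
    """Sheds requests whose queueing (sojourn) time shows the dispatcher is
    saturated, instead of letting the backlog grow without bound."""

    def __init__(self):
        self._q = queue.Queue()
        self._overloaded = False

    def enqueue(self, request) -> None:
        self._q.put((time.monotonic(), request))

    def dequeue(self):
        enqueued_at, request = self._q.get()
        sojourn_ms = (time.monotonic() - enqueued_at) * 1000
        # Use the generous budget while healthy; switch to the tight target
        # once sojourn times indicate overload, so the queue drains quickly.
        budget = TARGET_SOJOURN_MS if self._overloaded else MAX_SOJOURN_MS
        self._overloaded = sojourn_ms > TARGET_SOJOURN_MS
        if sojourn_ms > budget:
            raise TimeoutError(f"request shed after {sojourn_ms:.1f} ms in queue")
        return request
```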

3. Hot‑Key Detection & DDoS Mitigation

Space‑Saving algorithm to track hot keys

  • Every dispatcher streams incoming keys into an in-memory top-k counter.
  • In just a few megabytes, it tracks approximate hit counts, maintains a frequency-ordered heap, and surfaces the hottest keys in real time in each individual dispatcher.
  • I didn't read the algorithm in much detail; a minimal version is sketched below.
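For reference, a minimal Space-Saving counter looks roughly like this (a production version maintains a frequency-ordered heap or stream-summary structure instead of scanning for the minimum on every eviction):

```python
class SpaceSaving:
    """Approximate top-k hot-key counter in O(k) memory (Space-Saving algorithm)."""

    def __init__(self, k: int):
        self.k = k
        self.counts: dict[str, int] = {}

    def offer(self, key: str) -> None:
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.k:
            self.counts[key] = 1
        else:
            # Evict the current minimum and inherit its count (+1),
            # which bounds the overestimation error.
            victim = min(self.counts, key=self.counts.get)
            self.counts[key] = self.counts.pop(victim) + 1

    def hot_keys(self, threshold: int) -> list[str]:
        return [k for k, c in self.counts.items() if c >= threshold]
```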

Local LRU cache and request coalescing

  • When a key crosses the hot threshold, the dispatcher serves it from a process-local LRU cache.
  • Entries expire after roughly three seconds, so they vanish as soon as demand cools; no global cache is required.
  • Multiple requests for the same uncached key can still arrive within the same millisecond, so the dispatcher tracks in-flight reads for hot keys.
  • New arrivals attach to the pending future; the first backend response then fans out to all waiters.
  • In most cases only one request per hot key per dispatcher pod ever reaches the storage layer.
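A simplified sketch of the cache-plus-coalescing pattern, using a TTL map instead of a true LRU and an asyncio future to represent the in-flight read; `fetch_from_backend` is a hypothetical stand-in for the real storage call:

```python
import asyncio
import time

class HotKeyCache:
    """Process-local cache with a short TTL plus in-flight request coalescing."""

    def __init__(self, ttl_seconds: float = 3.0):
        self.ttl = ttl_seconds
        self.cache: dict[str, tuple[float, bytes]] = {}
        self.inflight: dict[str, asyncio.Future] = {}

    async def get(self, key: str, fetch_from_backend) -> bytes:
        # 1. Serve from the local cache if the entry is still fresh.
        entry = self.cache.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        # 2. If a read for the same key is already in flight, attach to its
        #    pending future instead of issuing another backend call.
        if key in self.inflight:
            return await self.inflight[key]
        # 3. Otherwise this request goes to the backend; later arrivals wait on it.
        fut = asyncio.get_running_loop().create_future()
        self.inflight[key] = fut
        try:
            value = await fetch_from_backend(key)
            self.cache[key] = (time.monotonic(), value)
            fut.set_result(value)   # fan the response out to all waiters
            return value
        except Exception as exc:
            fut.set_exception(exc)
            raise
        finally:
            del self.inflight[key]
```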

Key Takeaways

  • Measure real cost, not request count
  • Keep control loops local (single node)
    • local cache
    • CoDel queue
  • Continuous tuning needed as workloads evolve

Context: How Resource Unit (RU) Calculation and Load Shedding Work

  • Mussel must decide whether to allow or throttle a request before the database executes it.
  • Since actual cost (rows, bytes, latency) is only known afterward, Mussel uses a predictive cost model.

RU is Estimated Using Request Shape

Before execution, the dispatcher knows:

  • Number of keys requested (derived from the requested output schema)
  • Query type (point lookup, multi-get, range scan)
  • Body size for writes
  • Estimated number of rows based on key ranges
  • Request payload size

This metadata allows an upper-bound estimate of resource cost.

A simple formula could look like:

RU = α * estimated_rows + β * request_bytes + γ * expected_latency_factor

Where:

  • α, β, γ are constants tuned using historical profiling & benchmarks

  • expected_latency_factor comes from past p95 for similar queries

  • estimated_rows comes from the query shape (range start/end)

So even before execution, the system can compute an estimated RU charge.
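A hedged sketch of such an estimator; the coefficient values and the per-query-type latency table are invented for illustration:

```python
# Assumed, illustrative coefficients; in practice α, β, γ come from
# historical profiling and benchmarks.
ALPHA = 0.5    # RU per estimated row
BETA = 0.001   # RU per request byte
GAMMA = 2.0    # RU per unit of expected latency factor

def estimate_ru(query_type: str, estimated_rows: int, request_bytes: int,
                p95_latency_by_type: dict[str, float]) -> float:
    """Pre-execution RU estimate based only on the request's shape."""
    expected_latency_factor = p95_latency_by_type.get(query_type, 1.0)
    return (ALPHA * estimated_rows
            + BETA * request_bytes
            + GAMMA * expected_latency_factor)

# Example: a range scan expected to touch ~1,000 rows is charged far more
# than a point lookup, even though both are "one request".
history = {"point_lookup": 1.0, "range_scan": 5.0}
print(estimate_ru("point_lookup", 1, 256, history))     # small charge
print(estimate_ru("range_scan", 1_000, 256, history))   # large charge
```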

After the DB runs the query, the cost model is sometimes adjusted (asynchronously)

The backend shard knows:

  • exact rows scanned
  • actual bytes read
  • actual CPU/latency

But the backend does NOT retro-adjust the user's token bucket, because that would require cross-component coordination.

Instead:

  • backend reports metrics
  • system updates the cost model coefficients for future requests
  • So next time, the estimate becomes more accurate.

This is called an adaptive cost model.
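The article doesn't give the exact update rule; one of the simplest possible versions would just rescale the coefficients toward the observed cost, roughly like this:

```python
LEARNING_RATE = 0.05  # assumed smoothing factor

def update_coefficients(coeffs: dict[str, float],
                        estimated_ru: float,
                        actual_ru: float) -> dict[str, float]:
    """Asynchronously nudge the cost-model coefficients toward observed cost.

    The backend reports true rows/bytes/latency after execution; instead of
    retro-charging the client's token bucket, the model itself is corrected
    so the *next* estimate is closer to reality.
    """
    if estimated_ru <= 0:
        return coeffs
    error_ratio = actual_ru / estimated_ru
    return {name: value * (1 + LEARNING_RATE * (error_ratio - 1))
            for name, value in coeffs.items()}

coeffs = {"alpha": 0.5, "beta": 0.001, "gamma": 2.0}
# Cost was under-estimated (10 RU charged, 14 RU observed), so scale up slightly.
coeffs = update_coefficients(coeffs, estimated_ru=10.0, actual_ru=14.0)
```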

Load Shedding

  • Load shedding is a resilience strategy in which a system deliberately drops or rejects non-essential requests during periods of high traffic or overload. This protects core functionality, prevents crashes, and keeps latency low for critical operations, preserving overall stability and availability.

👉 Airbnb Article Link

Uber Eats Image Deduping & Storage Recap

· 2 min read

Precontext: Content Addressable Caching

  • Content-addressable caching (or content-addressable storage + caching) is a technique where the content itself determines the key used to store and retrieve it.
  • The core idea: instead of identifying data by URL, filename, or ID, we use a hash of the content (e.g., SHA-256 or MD5).
  • So the address = hash(content).
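A tiny example, assuming SHA-256 as the hash function:

```python
import hashlib

def content_address(image_bytes: bytes) -> str:
    """The storage key is derived from the content itself, so identical images
    always map to the same address regardless of URL or filename."""
    return hashlib.sha256(image_bytes).hexdigest()

a = content_address(b"\x89PNG...burger image bytes")
b = content_address(b"\x89PNG...burger image bytes")  # same bytes, different URL
assert a == b  # duplicates collapse to a single stored object
```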

Why?

  • Uber Eats handles 100M+ images.
  • Many merchants upload identical product images, which means a lot of duplicate storage.
  • Frequent updates can cause repeated downloads, processing, and CDN usage.
  • Goal: reduce storage cost, processing load, and latency.

Idea: Content‑Addressable Storage

Uber built a deduplication layer based on image hashes.

Three Metadata Maps

  • The maps are caches, but backed by DBs so they survive restarts
  • The main image source is Uber's own S3-like blob storage
| Map Name | Key | Value | Why |
| --- | --- | --- | --- |
| URL Map | Image URL | Hash of image | Avoids re-downloading the same external URL; detects repeated URLs and checks whether the underlying image changed. |
| Original Image Map | Image hash | Raw / original image | Deduplicates identical images uploaded via different URLs; many merchants use the same product image, so only one copy is stored. |
| Processed Image Map | Image hash + processing spec | Processed / resized image | Avoids re-processing the same image into different sizes/formats; stores and reuses thumbnails, WebP versions, etc. |

Processing Flow

(Diagram: image processing flow)

  1. Given (URL + processing spec), check URL map.
  2. If URL seen → get hash; else download → hash → store.
  3. Check Processed Image Map.
  4. If processed variant exists → return cached version.
  5. Else process → store → return.
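A sketch of this flow, with plain dictionaries standing in for the three metadata maps; `download` and `process` are hypothetical helpers:

```python
import hashlib

def get_processed_image(url: str, spec: str,
                        url_map: dict, original_map: dict, processed_map: dict,
                        download, process) -> bytes:
    """Dedup-aware lookup following the numbered flow above."""
    # 1-2. Resolve the URL to a content hash, downloading only if the URL is new.
    image_hash = url_map.get(url)
    if image_hash is None:
        raw = download(url)
        image_hash = hashlib.sha256(raw).hexdigest()
        url_map[url] = image_hash
        original_map.setdefault(image_hash, raw)   # store the original only once
    # 3-4. Reuse an already-processed variant when one exists.
    variant_key = (image_hash, spec)
    if variant_key in processed_map:
        return processed_map[variant_key]
    # 5. Otherwise process once, store the result, and return it.
    processed = process(original_map[image_hash], spec)
    processed_map[variant_key] = processed
    return processed
```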

Update Handling

(Diagram: update handling flow)

  • Uses HTTP Last-Modified header to detect changed images.
  • If unchanged, skip re-download and reuse existing blobs.
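A minimal sketch of that check using a HEAD request and a cached Last-Modified value (the article only says the header is used; the exact mechanism here is an assumption):

```python
import requests

def image_changed(url: str, cached_last_modified: str | None) -> bool:
    """Cheap freshness check: compare the server's Last-Modified header
    against the value stored when the image was first downloaded."""
    resp = requests.head(url, timeout=5)
    last_modified = resp.headers.get("Last-Modified")
    if cached_last_modified and last_modified == cached_last_modified:
        return False  # unchanged: reuse existing blobs, skip re-download
    return True
```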

Error Caching

  • Processing errors (e.g., corrupt image, too small) are also cached.
  • Prevents repeated unnecessary attempts.

Value

  • Latency improved: P50 ~100ms, P90 ~500ms.
  • Less than 1% of weekly calls needed actual image processing.
  • Huge savings in storage + CDN + CPU usage.

Key learnings

  • Hash-based content-addressable storage eliminates duplication.
  • Separate raw images from processed variants for flexibility.
  • Use metadata maps for fast lookups.
  • Cache both successes and failures.
  • Use HTTP metadata to detect updates cheaply.

👉 Uber Article Link