# Rate Limiting
Rate limiting is a technique used to control the number of requests a user or service can make in a given time window. It helps prevent abuse, protects APIs, and ensures fair usage among users.
## Why?
- Prevents DDoS attacks and excessive API calls
- Ensures fair resource allocation
- Maintains system stability and performance
Without rate limiting, a system might get overwhelmed, leading to slow response times or complete service outages.
## Fixed Window Counter

### Idea
- Divide time into fixed windows (like 1s each).
- Each request increments a counter.
- When the window resets, the counter resets to zero.
### Pros
- Simple and easy to implement
- O(1) space and time
### Cons
- Allows burst traffic at window boundaries (the βboundary problemβ)
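The counter-reset logic above can be sketched in a few lines. This is a minimal in-memory illustration, not a production implementation; the class and parameter names are illustrative:

```python
import time

class FixedWindowCounter:
    """Allow at most `limit` requests per `window_size`-second window."""

    def __init__(self, limit, window_size=1.0):
        self.limit = limit
        self.window_size = window_size
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        # Reset the counter when a new window begins.
        if now - self.window_start >= self.window_size:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With `limit=3`, five back-to-back calls to `allow()` in one window return `True, True, True, False, False`; the boundary problem shows up because a burst just before a reset and another just after it are both admitted in full.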
## Sliding Window Log

### Idea
This approach keeps a log of timestamps for recent requests and uses it to decide whether a new request should be allowed.
### Working
- Keep a log of request timestamps.
- When a new request arrives, remove all entries older than the window size.
- Count the remaining entries.
- If the count is less than the limit, allow the request and add its timestamp to the log.
- Otherwise, deny the request.
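The steps above translate directly into code. A minimal sketch using a deque for the timestamp log (the optional `now` parameter exists only to make the example easy to exercise deterministically):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allow at most `limit` requests in any rolling `window_size`-second window."""

    def __init__(self, limit, window_size=1.0):
        self.limit = limit
        self.window_size = window_size
        self.log = deque()  # timestamps of previously allowed requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.log and now - self.log[0] >= self.window_size:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

Because every allowed request is stored, memory grows with the limit and the number of active clients, which is exactly the O(N) cost noted below.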
### Pros
- Accurate per time window (no approximation)
- Easy to reason about
- Works well for low-volume APIs.
### Cons
- Uses more memory (O(N) timestamps)
- Slower under high request volume due to log cleanup
- Requires storing and searching through timestamps.
## Sliding Window Counter

### Idea
- This algorithm improves upon the Fixed Window Counter by reducing the burst problem at window boundaries.
- It does this by taking into account requests from both the current and previous windows and weighting them based on the time elapsed in the current window.
- This makes the rate limiting smoother and more consistent over time.
### Working
- Track the number of requests in the current and previous time windows.
- When a new request comes in:
  - Determine how far into the current window the request arrived (the elapsed time fraction).
  - Calculate a weighted sum:

    `weighted_requests = current_count + (1 - elapsed_fraction) * previous_count`

  - If `weighted_requests` is less than the allowed limit, allow the request and increment the current window count.
  - Otherwise, deny the request.
- When time moves into the next window, shift the counts:
  - The current window's count becomes the previous window's count.
  - Start a new window with a fresh counter.
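The weighting and window-shift steps above can be sketched as follows. This is an illustrative in-memory version; names are made up for the example, and the `now` parameter is only there to make the behavior easy to demonstrate:

```python
import time

class SlidingWindowCounter:
    """Approximate a rolling window using current and previous window counts."""

    def __init__(self, limit, window_size=1.0):
        self.limit = limit
        self.window_size = window_size
        self.current_window = 0  # index of the current window
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        window = int(now // self.window_size)
        if window != self.current_window:
            # Shift counts; if one or more windows passed with no requests,
            # the "previous" window is empty.
            self.previous_count = (
                self.current_count if window == self.current_window + 1 else 0
            )
            self.current_count = 0
            self.current_window = window
        elapsed_fraction = (now % self.window_size) / self.window_size
        weighted = self.current_count + (1 - elapsed_fraction) * self.previous_count
        if weighted < self.limit:
            self.current_count += 1
            return True
        return False
```

For example, with a limit of 10 per second, a client that used its full budget in the first window is only allowed a few requests early in the next window, because the previous window's count still carries most of its weight.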
### Pros
- Smooths the request rate and reduces bursts at window boundaries
- More accurate than the Fixed Window Counter
- More memory-efficient than the Sliding Window Log
### Cons
- Slightly more complex to implement
## Token Bucket Algorithm

### Idea

Tokens represent permission to make requests. Tokens are added to the bucket at a fixed rate, and each request consumes one token.
### Working
- There is a bucket that holds tokens, with a maximum capacity.
- Tokens are added to the bucket at a fixed rate (e.g., x tokens per second).
- When a request arrives, it must obtain a token from the bucket to proceed.
- If there are enough tokens, the request is allowed and tokens are removed.
- If there aren't enough tokens, the request is dropped.
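A common way to implement this is lazy refill: instead of a background timer adding tokens, compute how many tokens accrued since the last request. A minimal sketch (the `now` parameter is only for deterministic demonstration):

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs one token."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Lazily add the tokens that accrued since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Starting with a full bucket is what permits bursts: a client that has been idle can spend up to `capacity` tokens at once, then is throttled to the refill rate.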
### Pros
- Allows short bursts (requests up to the bucket's capacity, accommodating short-term spikes)
- Simple and efficient
### Cons

- Memory usage scales with the number of clients if implemented per user
## Leaky Bucket

### Idea

Imagine a bucket that leaks water at a fixed rate. Requests add water; if the bucket would overflow, the request is rejected.
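A minimal sketch of the "leaky bucket as meter" variant, which rejects overflowing requests outright (the classic queue-based variant instead delays requests and releases them at the leak rate; names and the `now` parameter are illustrative):

```python
import time

class LeakyBucket:
    """Bucket drains `leak_rate` units/second; each request adds one unit unless it would overflow."""

    def __init__(self, capacity, leak_rate, now=None):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.water = 0.0
        self.last_leak = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drain water proportionally to the elapsed time.
        self.water = max(0.0, self.water - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.water + 1 <= self.capacity:
            self.water += 1
            return True
        return False
```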
### Pros
- Smooths traffic perfectly
- Prevents sudden spikes
### Cons
- Fixed outflow rate might delay urgent requests
## Choosing the Right Algorithm
Each rate-limiting algorithm has its own strengths and trade-offs. The best choice depends on your use case, scalability needs, and fairness requirements.
| Algorithm | Accuracy | Memory Usage | Allows Bursts | Smooth Traffic | Implementation Complexity | Best For |
|---|---|---|---|---|---|---|
| Fixed Window Counter | Low | Very Low | Yes | No | Very Simple | Quick prototypes, low-traffic systems |
| Sliding Window Log | High | High | Limited | Yes | Moderate | Precise per-user API rate limiting |
| Sliding Window Counter | Approximate | Low | Slight | Yes | Moderate | Smooth rate limiting for production APIs |
| Token Bucket | High | Low | Yes | Partial | Simple | General-purpose API rate limiting |
| Leaky Bucket | High | Low | No | Perfect | Moderate | Network traffic shaping, message queues |
- "Smooth Traffic" means requests are spread out evenly over time.
## Practical Considerations
When implementing rate limiting in production systems, consider the following:
### Distributed Environment
If your API is served by multiple instances, in-memory counters will not synchronize. Use a shared store such as Redis, Memcached, or etcd for consistent limits across nodes.
### Key-based Rate Limiting
Limit requests based on a unique key, such as:
- User ID
- IP Address
- API Key
- Endpoint Path
This ensures that one abusive client doesn't affect all others.
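One simple pattern is to keep an independent limiter per key. The sketch below uses a toy stand-in limiter for self-containment; in practice the factory would construct any of the algorithms described above (all names here are illustrative):

```python
from collections import defaultdict

class SimpleLimit:
    """Toy stand-in limiter: allows at most `limit` requests in total.
    Any of the algorithms above could be substituted here."""

    def __init__(self, limit=3):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class KeyedRateLimiter:
    """Maintain one independent limiter per key (user ID, IP address, API key, ...)."""

    def __init__(self, limiter_factory):
        self.limiters = defaultdict(limiter_factory)

    def allow(self, key):
        return self.limiters[key].allow()
```

Note that this per-key map is exactly where the "memory scales per user" cost comes from; production systems usually evict idle keys after some time.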
### Graceful Denial
Instead of simply rejecting requests, return:
- HTTP 429 Too Many Requests
- Include a Retry-After header to tell clients when to retry
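A framework-agnostic sketch of this response shape; `limiter` is assumed to be any object with an `allow() -> bool` method, and the fixed `Retry-After` value is a simplification (real systems often compute it from the window state):

```python
def handle_request(limiter, retry_after_seconds=60):
    """Map a rate-limit decision to an HTTP status and headers."""
    if limiter.allow():
        return 200, {}
    # Denied: 429 plus a Retry-After hint so well-behaved clients can back off.
    return 429, {"Retry-After": str(retry_after_seconds)}
```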
### Monitoring and Logging
Track metrics such as:
- Number of requests denied
- Requests per client
- Rate-limit rule violations