Content Delivery Network
Requirements
Functional
- Serve static content (JS, CSS, images, videos) efficiently
- Reduce latency for users across different geographic locations
- Provide caching, load balancing, and failover mechanisms
- Support real-time content invalidation
- Support write operations for dynamic content
- Propagate new content efficiently, with cache invalidation (if a file is updated, cached copies should be refreshed immediately)
Non-Functional
- Scalable
- Low latency
- Security
Capacity Estimation
- Total users: 100M
- Average content size: 500 KB
- Read/write ratio: 100:1
Traffic
| Desc | value |
|---|---|
| QPS (read) | 100M / (24 × 60 × 60) ≈ 1,200 (assuming about one read per user per day) |
| QPS (write) | 1,200 / 100 ≈ 12 |
Storage
| Desc | value |
|---|---|
| Storage per day | 500 KB × 1M writes = 500 GB (excluding replication) |
| Storage for 1 year | 500 GB × 365 ≈ 182.5 TB |
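The traffic and storage figures can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check; it assumes about one read per user per day and decimal units (1 GB = 10^6 KB), neither of which is stated explicitly in these notes. The function name `estimate` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_reads: int = 100_000_000,
             read_write_ratio: int = 100,
             avg_size_kb: int = 500) -> dict:
    """Back-of-the-envelope capacity numbers (assumes ~1 read per user per day)."""
    daily_writes = daily_reads // read_write_ratio
    storage_per_day_gb = daily_writes * avg_size_kb / 1_000_000  # KB -> GB (decimal)
    return {
        "read_qps": daily_reads / SECONDS_PER_DAY,
        "write_qps": daily_writes / SECONDS_PER_DAY,
        "storage_per_day_gb": storage_per_day_gb,
        "storage_per_year_tb": storage_per_day_gb * 365 / 1_000,  # GB -> TB
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Rounded, this gives roughly 1,200 read QPS, 12 write QPS, and 500 GB of new content per day before replication.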
Memory(Cache)
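If we follow the 80:20 rule for caching (about 20% of the content serves 80% of the requests), we cache the hottest 20% of the daily read volume:
| Desc | value |
|---|---|
| Cache size per day | 500 KB × 100M = 50 TB; 50 TB × 0.2 = 10 TB |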
High Level Design
Edge Servers (Point of Presence)
- A server located close to the user
- Caches content and reduces load on the origin server by fulfilling requests itself
- If the content is not in its cache, it requests it from the origin server
Origin Servers
- The main server and the original storage for the content
- Serves content when a cache miss happens at the edge server
- Handles content uploads
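The edge/origin interaction described above is essentially a pull-through cache. Here is a minimal sketch, assuming an in-memory store and a per-entry TTL; the class name `EdgeCache`, the `fetch_from_origin` callback, and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Pull-through cache: serve locally, fall back to the origin on a miss."""

    def __init__(self,
                 fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expiry)

    def get(self, path: str) -> Tuple[bytes, bool]:
        """Return (body, cache_hit)."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], True
        # Cache miss or expired entry: ask the origin and remember the answer.
        body = self._fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body, False
```

The first `get` for a path goes to the origin; later calls within the TTL are served from the edge, which is where the load reduction on the origin comes from.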
Global Traffic Manager (Geo-DNS Load balancer)
- Routes each user request to the nearest, least-loaded edge server
- Handles failures: if one edge server is down, requests are redirected to another edge server
- Balances load across all locations
- Routes users to the nearest edge server based on geographic proximity; a few approaches can be used:
  - GeoDNS - based on the requesting IP address
  - Anycast routing - the network itself selects the nearest node using BGP (Border Gateway Protocol)
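One possible routing policy, sketched under simplifying assumptions: skip unhealthy or overloaded edges, then pick the geographically nearest one, using load as a tie-breaker. Real GeoDNS and anycast setups work at the DNS and BGP level; the `Edge` type, the `max_load` cutoff, and the great-circle distance metric below are illustrative assumptions only.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Edge:
    name: str
    lat: float
    lon: float
    load: float           # 0.0 (idle) .. 1.0 (saturated)
    healthy: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_edge(user_lat: float, user_lon: float,
              edges: Iterable[Edge], max_load: float = 0.9) -> Optional[Edge]:
    """Nearest healthy edge below max_load; None if no edge qualifies."""
    candidates = [e for e in edges if e.healthy and e.load < max_load]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda e: (haversine_km(user_lat, user_lon, e.lat, e.lon), e.load),
    )
```

Failover falls out of the filter step: when the nearest edge is marked unhealthy or exceeds `max_load`, the next nearest candidate is chosen.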
Multi-tier Caching
- Reduces latency by storing content at different caching layers:
  - Client-side browser cache
  - Edge caches (PoPs)
  - Regional cache - used when the PoP cache misses, before the request reaches the origin
  - Origin cache
- Cache invalidation
  - TTL
  - Event-based (remove the old copy when updated content is pulled)
  - Versioning - use different URLs for updated content
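A rough sketch of the tiered lookup and two of the invalidation styles listed above. It assumes plain in-memory dictionaries for each tier and a content hash in the query string as the version marker; `TieredCache` and `versioned_url` are hypothetical names, and TTL handling is left out to keep the example short.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def versioned_url(path: str, body: bytes) -> str:
    """Versioning: embed a content hash so updated content gets a new cache key."""
    digest = hashlib.sha256(body).hexdigest()[:8]
    return f"{path}?v={digest}"


class TieredCache:
    """Look up tiers in order (e.g. edge, then regional) before hitting the origin."""

    def __init__(self, tier_names: List[str],
                 fetch_from_origin: Callable[[str], bytes]) -> None:
        self.tiers: List[Tuple[str, Dict[str, bytes]]] = [(n, {}) for n in tier_names]
        self._fetch = fetch_from_origin

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return (body, name of the tier that served it)."""
        for index, (name, store) in enumerate(self.tiers):
            if key in store:
                body = store[key]
                # Back-fill the closer tiers that missed.
                for _, closer in self.tiers[:index]:
                    closer[key] = body
                return body, name
        body = self._fetch(key)
        for _, store in self.tiers:
            store[key] = body
        return body, "origin"

    def invalidate(self, key: str) -> None:
        """Event-based invalidation: purge the key from every tier."""
        for _, store in self.tiers:
            store.pop(key, None)
```

With versioned URLs an explicit purge is often unnecessary, because the updated file is requested under a different key and the old entry simply ages out.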
Content Replication
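- Ensures every update at the origin is propagated to all CDN edge servers
- Methods
  - Push - the origin proactively sends content to the edge servers
  - Pull - edge servers fetch content from the origin on the first request
  - Hybrid - push popular content and pull less popular content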
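The hybrid method could be sketched as a simple popularity split. This assumes request counts per object are available and that a fixed threshold decides what counts as popular; `split_push_pull`, `push_to_edges`, and the threshold value are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def split_push_pull(request_counts: Dict[str, int],
                    push_threshold: int = 1000) -> Tuple[List[str], List[str]]:
    """Popular objects are pushed ahead of time; the rest are pulled on demand."""
    push = sorted(k for k, n in request_counts.items() if n >= push_threshold)
    pull = sorted(k for k, n in request_counts.items() if n < push_threshold)
    return push, pull


def push_to_edges(push_keys: List[str],
                  origin: Dict[str, bytes],
                  edge_stores: Dict[str, Dict[str, bytes]]) -> None:
    """Copy each popular object from the origin into every edge store."""
    for store in edge_stores.values():
        for key in push_keys:
            store[key] = origin[key]
```

Objects left in the pull list are not replicated up front; an edge would fetch them from the origin the first time a user asks for them.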
Security
- Mitigate DDoS attacks with rate limiting
- Token-based auth for private content
- SSL/TLS encryption for secure delivery
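Two of these measures can be illustrated briefly. The sketch below assumes HMAC-signed, expiring URLs as the token scheme and a token bucket as the rate limiter; both are common choices rather than something these notes prescribe, and `sign_url`, `verify`, and `TokenBucket` are hypothetical names.

```python
import hashlib
import hmac


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    message = f"{path}:{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes) -> str:
    """Issue a URL that is only valid until expires_at (Unix seconds)."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, secret)}"


def verify(path: str, expires_at: int, sig: str, secret: bytes, now: int) -> bool:
    """Reject expired links and signatures that do not match path and expiry."""
    if now > expires_at:
        return False
    return hmac.compare_digest(_signature(path, expires_at, secret), sig)


class TokenBucket:
    """Per-client rate limiter: each request spends one token."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Time is passed in explicitly (`now`) so the behaviour is deterministic; a real edge server would use the system clock and keep one bucket per client IP or API key.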
Read flow
Write flow
DB Design
Content metadata (Not clear still)
Stores metadata about CDN-stored content.
-- PostgreSQL syntax
CREATE TABLE content_metadata (
    id SERIAL PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,        -- CDN URL for the content
    origin_url TEXT NOT NULL,        -- Original source URL
    ttl INT DEFAULT 3600,            -- Time-to-live in seconds
    size BIGINT,                     -- Size of content in bytes
    version INT DEFAULT 1,           -- Versioning for updates
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- ON UPDATE CURRENT_TIMESTAMP is MySQL-only; in PostgreSQL, set updated_at in the UPDATE statement or via a trigger
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
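To see how the `version` and `updated_at` columns might be maintained, here is a small sketch that uses Python's built-in `sqlite3` with an SQLite adaptation of the table above. SQLite is swapped in only so the example is self-contained; the helper names `open_db`, `register`, and `record_update` are assumptions.

```python
import sqlite3

# SQLite adaptation of content_metadata (types simplified for the demo).
SCHEMA = """
CREATE TABLE content_metadata (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    origin_url TEXT NOT NULL,
    ttl INTEGER DEFAULT 3600,
    size INTEGER,
    version INTEGER DEFAULT 1,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def register(conn: sqlite3.Connection, url: str, origin_url: str, size: int) -> int:
    """Insert a new content row and return its id."""
    cur = conn.execute(
        "INSERT INTO content_metadata (url, origin_url, size) VALUES (?, ?, ?)",
        (url, origin_url, size),
    )
    conn.commit()
    return cur.lastrowid


def record_update(conn: sqlite3.Connection, content_id: int, new_size: int) -> int:
    """Bump version and updated_at explicitly; return the new version."""
    conn.execute(
        "UPDATE content_metadata "
        "SET size = ?, version = version + 1, updated_at = CURRENT_TIMESTAMP "
        "WHERE id = ?",
        (new_size, content_id),
    )
    conn.commit()
    row = conn.execute(
        "SELECT version FROM content_metadata WHERE id = ?", (content_id,)
    ).fetchone()
    return row[0]
```

The incremented `version` is what an invalidation step could compare against to decide whether an edge's cached copy is stale.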
API Design
Get content from CDN
GET /cdn/content/{content_id}
Response: {"cdn_url": "<cdn_url>", "cache_hit": true}
Upload to CDN
POST /cdn/upload
body : {"file": <binary>}
auth token
Response: {"status": "OK", "cdn_url": "<cdn_url>", "content_id": "<content_id>"}
Update content (Invalidation)
PUT /cdn/content/{content_id}
body : {"file": <binary>, "version": "v2"}
auth token
Response: {"status": "OK", "version": "v2", "content_id": "<content_id>"}
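The three endpoints can be modelled with a tiny in-memory service to check that the request and response shapes fit together. This is only a sketch: it skips HTTP and the auth token entirely, assumes the server assigns the version number on update, and uses a made-up base URL and class name (`CdnService`).

```python
import uuid
from typing import Dict, Set


class CdnService:
    """In-memory stand-in for the CDN API (no HTTP, no auth)."""

    def __init__(self, base_url: str = "https://cdn.example.com") -> None:
        self._base = base_url
        self._content: Dict[str, dict] = {}
        self._edge_cached: Set[str] = set()

    def _url(self, content_id: str) -> str:
        return f"{self._base}/cdn/content/{content_id}"

    def upload(self, file: bytes) -> dict:
        """POST /cdn/upload"""
        content_id = uuid.uuid4().hex
        self._content[content_id] = {"file": file, "version": 1}
        return {"status": "OK", "cdn_url": self._url(content_id), "content_id": content_id}

    def get(self, content_id: str) -> dict:
        """GET /cdn/content/{content_id}"""
        if content_id not in self._content:
            raise KeyError(content_id)
        cache_hit = content_id in self._edge_cached
        self._edge_cached.add(content_id)  # the first read warms the edge
        return {"cdn_url": self._url(content_id), "cache_hit": cache_hit}

    def update(self, content_id: str, file: bytes) -> dict:
        """PUT /cdn/content/{content_id}: replace the file and invalidate the edge copy."""
        record = self._content[content_id]
        record["file"] = file
        record["version"] += 1
        self._edge_cached.discard(content_id)
        return {"status": "OK", "version": f"v{record['version']}", "content_id": content_id}
```

After an update, the next `get` reports `cache_hit: false`, which mirrors the invalidation behaviour the PUT endpoint is meant to trigger.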
PENDING
- DB design