High-level design

Estimation

Requirement Gathering
- Functional
  - Generate a short URL from a long URL
  - Redirect a short URL to its long URL
  - Support link expiration
  - Click monitoring (optional)
  - Allow custom aliases (optional)
- Non-functional
  - High availability (99.9%)
  - Scalability
Deep dive into components
URL generation service
- Generates unique short URLs.
- Handles collisions.
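- Approach 1 for URL generation: Hashing and Encoding
  - Convert long_url into a hash using MD5 or SHA-256.
  - Encode that hash as a URL-friendly Base62 string.
  - Take only the first few characters of the encoded hash so the short URL stays short. Truncation makes collisions more likely, so they must be handled.
  - Collision handling
    - Query the DB beforehand to check whether the short URL already exists.
    - Alternatively, mark short_url as unique, rely on the DB constraint to reject the insert, and on failure append an incrementing suffix (like -1, -2) and retry.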
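A minimal Python sketch of Approach 1. It assumes SHA-256, a 7-character code, and a Base62 alphabet ordered digits, lowercase, uppercase; all three are illustrative choices. `FakeUrlTable` and `UniqueViolation` are hypothetical in-memory stand-ins for a table with a UNIQUE constraint on `short_url`. The code follows the second collision strategy: insert, catch the violation, and retry with a suffix.

```python
import hashlib

# Base62 alphabet (ordering is an assumption; any fixed ordering works).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SHORT_LEN = 7  # hypothetical short-code length


def base62_encode(num: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def hash_to_code(long_url: str) -> str:
    """Hash the long URL, Base62-encode the digest, keep the first SHORT_LEN chars."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    return base62_encode(int.from_bytes(digest, "big"))[:SHORT_LEN]


class UniqueViolation(Exception):
    """Hypothetical stand-in for the DB's unique-constraint error."""


class FakeUrlTable:
    """Hypothetical in-memory table with a UNIQUE constraint on short_url."""

    def __init__(self):
        self.rows = {}

    def insert(self, short_url: str, long_url: str) -> None:
        if short_url in self.rows:
            raise UniqueViolation(short_url)
        self.rows[short_url] = long_url


def shorten(table: FakeUrlTable, long_url: str) -> str:
    """Insert the hashed code; on a unique violation retry with -1, -2, ... suffixes."""
    base = hash_to_code(long_url)
    candidate, attempt = base, 0
    while True:
        try:
            table.insert(candidate, long_url)
            return candidate
        except UniqueViolation:
            attempt += 1
            candidate = f"{base}-{attempt}"
```

The hash is deterministic. Shortening the same long URL twice therefore also hits the constraint and yields a suffixed code. A real service might look up an existing mapping first to avoid this.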
- Approach 2 for URL generation: Incremental IDs
  - Instead of hashing, rely on database auto-increment IDs.
  - Convert the ID into a URL-friendly string using Base62.
  - Considerations
    - The DB's ID generator can become a bottleneck when scaling.
    - Harder to implement in a distributed system, since IDs must stay unique across nodes.
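A small sketch of Approach 2, assuming the same Base62 alphabet. `itertools.count` stands in for the DB's auto-increment column, and `IdShortener` is a hypothetical name.

```python
import itertools

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(num: int) -> str:
    """Base62-encode a non-negative integer ID."""
    if num < 0:
        raise ValueError("IDs must be non-negative")
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Invert encode(); raises KeyError on characters outside the alphabet."""
    num = 0
    for ch in code:
        num = num * 62 + INDEX[ch]
    return num


class IdShortener:
    """Hypothetical service: each new URL gets the next ID, exposed in Base62."""

    def __init__(self, start: int = 1):
        self._ids = itertools.count(start)  # stand-in for DB AUTO_INCREMENT
        self.rows = {}

    def shorten(self, long_url: str) -> str:
        new_id = next(self._ids)
        self.rows[new_id] = long_url
        return encode(new_id)

    def resolve(self, code: str):
        return self.rows.get(decode(code))
```

The encoding is reversible, so the short code decodes straight back to the row's primary key. Collisions cannot occur as long as IDs are unique, which is why the considerations above focus on the ID generator.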
- Custom alias
  - Allow an alias only if the same alias is not already present in the DB.
  - Validate that the alias contains only URL-friendly characters.
  - If the alias is already taken, return a clear error.
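A sketch of the alias checks. The allowed character set (letters, digits, `_`, `-`) and the 3 to 30 length limit are assumptions. `register_alias` and `AliasError` are hypothetical names, and a plain dict stands in for the DB.

```python
import re

# Hypothetical rule: 3-30 URL-safe characters.
ALIAS_RE = re.compile(r"[A-Za-z0-9_-]{3,30}")


class AliasError(ValueError):
    """Raised when an alias is invalid or already taken."""


def register_alias(table: dict, alias: str, long_url: str) -> str:
    """Validate the alias, reject duplicates, then store the mapping."""
    if not ALIAS_RE.fullmatch(alias):
        raise AliasError("alias must be 3-30 characters from A-Z, a-z, 0-9, '_' or '-'")
    if alias in table:
        raise AliasError(f"alias '{alias}' is already taken")
    table[alias] = long_url
    return alias
```

Checking and then inserting can race when two users request the same alias concurrently. Against a real DB, the unique constraint on the alias column should be the final arbiter.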
- Link expiration
  - Run a cron job that periodically removes expired links from the DB.
  - During redirection, the service checks whether the link has expired. If it has, the service returns a default page that shows an error.
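A sketch of both expiration mechanisms, assuming each link stores an optional UTC `expires_at` timestamp. `Link`, `resolve`, `purge_expired`, and `EXPIRED_PAGE` are hypothetical names, and a dict stands in for the DB.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

EXPIRED_PAGE = "https://short.example/expired"  # hypothetical default error page


@dataclass
class Link:
    long_url: str
    expires_at: Optional[datetime] = None  # None means the link never expires

    def is_expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at


def resolve(links: dict, code: str, now: datetime) -> Optional[str]:
    """Redirect-time check: expired links resolve to the default error page."""
    link = links.get(code)
    if link is None:
        return None
    if link.is_expired(now):
        return EXPIRED_PAGE
    return link.long_url


def purge_expired(links: dict, now: datetime) -> int:
    """Body of the periodic cron job: delete expired rows, return the count removed."""
    expired = [code for code, link in links.items() if link.is_expired(now)]
    for code in expired:
        del links[code]
    return len(expired)
```

The cron job only runs periodically, so an expired link can remain in the DB between runs. The redirect-time check covers that gap, and the cleanup keeps the table from growing indefinitely.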
Redirection service
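- Look up the long URL in the DB, then redirect.
- Send an HTTP redirect response (301). Browsers cache a 301, so repeat clicks may never reach the service. A 302 avoids this if clicks are monitored.
- For performance, cache the short-to-long URL mapping in memory.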
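A sketch of the redirect path with a cache-aside lookup. It returns the HTTP status and `Location` value instead of using a real web framework. `RedirectService` is a hypothetical name, and the dicts stand in for the DB and the in-memory cache.

```python
from typing import Dict, Optional, Tuple


class RedirectService:
    """Cache-aside lookup: check the in-memory cache first, fall back to the DB."""

    def __init__(self, db: Dict[str, str], status: int = 301):
        self.db = db                      # stand-in for the URL table
        self.cache: Dict[str, str] = {}   # unbounded here; a real cache needs eviction
        self.status = status              # 301 permanent, or 302 to see every click
        self.db_reads = 0                 # counts cache misses that hit the DB

    def handle(self, code: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, Location header value) for a short code."""
        long_url = self.cache.get(code)
        if long_url is None:
            self.db_reads += 1
            long_url = self.db.get(code)
            if long_url is None:
                return 404, None
            self.cache[code] = long_url
        return self.status, long_url
```

Only the first request for a code reaches the DB. Later requests are served from memory.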
Analytics Service
- Publish each click event to a message queue like Kafka.
- Aggregate the click data with batch processing.
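A sketch of the producer/consumer split. Python's `queue.Queue` stands in for a Kafka topic; this is not the Kafka client API. The `ClickEvent` schema (short code plus day) and the batch size are assumptions.

```python
import queue
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    short_code: str
    day: str  # e.g. "2024-01-01"; hypothetical event schema


events = queue.Queue()  # stand-in for a Kafka topic


def record_click(short_code: str, day: str) -> None:
    """Producer side: the redirect path only enqueues and never waits on aggregation."""
    events.put(ClickEvent(short_code, day))


def aggregate_batch(q: queue.Queue, max_events: int = 1000) -> Counter:
    """Consumer side: drain up to max_events and count clicks per (code, day)."""
    counts = Counter()
    for _ in range(max_events):
        try:
            event = q.get_nowait()
        except queue.Empty:
            break
        counts[(event.short_code, event.day)] += 1
    return counts
```

Decoupling the two sides keeps redirect latency independent of analytics. The batch counts would then be added to a stats table.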
Key Issues and Bottlenecks
Scalability
- Deploy the API layer across multiple instances behind a load balancer to distribute incoming requests evenly.
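Round-robin is one common way a load balancer spreads requests evenly, and the notes do not specify a strategy. The sketch below assumes round-robin. `RoundRobinBalancer` and the instance names are hypothetical, and the class is not thread-safe.

```python
import itertools
from typing import List


class RoundRobinBalancer:
    """Hands out API instances in a fixed rotation (assumed strategy)."""

    def __init__(self, instances: List[str]):
        if not instances:
            raise ValueError("need at least one instance")
        self._cycle = itertools.cycle(instances)

    def pick(self) -> str:
        """Return the instance that should receive the next request."""
        return next(self._cycle)
```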
- Sharding
  - If we use auto-increment IDs, we can shard by ID range (0-10k, ...).
  - Otherwise, use hash-based sharding. With plain modulo hashing, adding shards remaps most keys. Consistent hashing moves only a small fraction of them.
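A sketch of both sharding schemes. The 10,000-ID range size follows the "0-10k" example. The ring uses MD5 purely for key placement and 100 virtual nodes per shard; both are assumptions. `range_shard` and `ConsistentHashRing` are hypothetical names.

```python
import bisect
import hashlib
from typing import List


def range_shard(record_id: int, shard_size: int = 10_000) -> int:
    """Range sharding: IDs 0-9,999 -> shard 0, 10,000-19,999 -> shard 1, ..."""
    return record_id // shard_size


def _hash(key: str) -> int:
    """Map a key to a 64-bit ring position (MD5 used for spread, not security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Each shard owns several virtual nodes; a key goes to the next node clockwise."""

    def __init__(self, shards: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._hashes: List[int] = []  # sorted ring positions
        self._owners: List[str] = []  # shard owning each position
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{shard}#{i}")
            idx = bisect.bisect_left(self._hashes, h)
            self._hashes.insert(idx, h)
            self._owners.insert(idx, shard)

    def shard_for(self, key: str) -> str:
        if not self._hashes:
            raise LookupError("ring is empty")
        # First ring position strictly after the key's hash, wrapping around to 0.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[idx]
```

When a shard is added, a key can only move to the new shard. Every other key keeps its old owner, so only the data for the new shard's share of the ring has to be migrated.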
- Caching
  - Store frequently accessed URLs in a cache so popular redirects skip the DB.
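A bounded cache needs an eviction policy, and the notes do not name one. The sketch below assumes LRU (least recently used), built on `collections.OrderedDict`. LFU would track "frequently accessed" more literally, at the cost of more bookkeeping. `LRUCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Keeps at most `capacity` short->long mappings, evicting the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
```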
Availability
- DB replication (single master)
- Geo-distributed deployment
Security
- Rate limiting
- Input validation
- HTTPS
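The notes do not say how rate limiting works. One common technique is a token bucket per client, sketched below under that assumption. The capacity, refill rate, and the idea of keying buckets by IP or API key are illustrative. `TokenBucket` and `RateLimiter` are hypothetical names, and the injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per client key (e.g. IP address or API key)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

A rejected request would typically get HTTP 429 (Too Many Requests). With several API instances, the buckets would have to live in a shared store instead of process memory.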