Canonical Logging vs Traditional Logging
1. Core Concepts
Traditional Logging
- Free-form logs (text or inconsistent JSON)
- No fixed schema
- Hard to query and correlate
Example:
User login attempt: ayush@gmail.com
Password matched for user 123
Login successful
Structured Logging
- Logs in JSON / key-value format
- Machine-readable
- Still inconsistent across services
Example (Structured but NOT Canonical):
{ "msg": "login success", "userId": "123" }
{ "event": "LOGIN", "uid": "123" }
Problem: Different field names → hard to query
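To make the problem concrete, here is a minimal Python sketch built on the two records above. The helper name `logs_for_user` is hypothetical and exists only for illustration; a real log store would use its own query language.

```python
# The two structured records from the example above.
logs = [
    {"msg": "login success", "userId": "123"},
    {"event": "LOGIN", "uid": "123"},
]

# A query written against one service's field name misses the other record.
naive_matches = [r for r in logs if r.get("userId") == "123"]

def logs_for_user(records, user_id):
    """Find a user's records by checking every known spelling of the field."""
    user_keys = ("userId", "uid")  # this tuple grows with every new service
    return [r for r in records if any(r.get(k) == user_id for k in user_keys)]

all_matches = logs_for_user(logs, "123")
print(len(naive_matches), len(all_matches))
```

The query only stays correct as long as someone keeps the list of field spellings up to date, which is the maintenance burden a shared schema removes.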
Canonical Logging
- Structured logging + standardized schema
- Often includes one final summary log per request
- Acts as source of truth
2. Key Difference
- Structured Logging = JSON logs
- Canonical Logging = JSON logs + consistent schema across all services
3. Stripe Insight (Important)
- Multiple debug logs during execution
- One canonical log line per request per service
This log is:
- Dense
- Aggregated
- Query-friendly
4. Example: Full Flow Comparison
Traditional Logs (scattered)
Request started /login
Auth success user=123
DB query users table
Response 200
To debug → need to correlate manually
Canonical Log (single summary)
{
"timestamp": "2026-03-29T10:15:30Z",
"service": "auth-service",
"event": "USER_LOGIN",
"user_id": "123",
"status": "SUCCESS",
"http_status": 200,
"db_calls": 1,
"cache_hit": false,
"duration_ms": 120,
"trace_id": "abc-xyz"
}
Everything in one place
5. What Goes Into a Canonical Log
- Request metadata → method, path
- User info → user_id
- System metrics → duration, DB calls
- Identifiers → trace_id
- Status → success/failure
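One way to pin these fields down is a single typed record that every service fills in. The sketch below is an assumption-laden illustration, not a prescribed schema: the class name `CanonicalLog` is hypothetical, the field names follow the JSON example in section 4, and the `method` and `path` values ("POST", "/login") are made up to cover the request metadata listed above.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class CanonicalLog:
    # Identifiers and request metadata
    timestamp: str
    service: str
    event: str
    method: str
    path: str
    trace_id: str
    # User info (None for unauthenticated requests)
    user_id: Optional[str]
    # Status
    status: str
    http_status: int
    # System metrics
    db_calls: int
    cache_hit: bool
    duration_ms: int

    def to_json(self) -> str:
        """Serialize to one flat JSON line."""
        return json.dumps(asdict(self))

line = CanonicalLog(
    timestamp="2026-03-29T10:15:30Z",
    service="auth-service",
    event="USER_LOGIN",
    method="POST",      # assumed value
    path="/login",      # assumed value
    trace_id="abc-xyz",
    user_id="123",
    status="SUCCESS",
    http_status=200,
    db_calls=1,
    cache_hit=False,
    duration_ms=120,
).to_json()
print(line)
```

Because the dataclass requires every field, a service that forgets one fails at construction time instead of silently emitting a record with a different shape.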
6. Why Canonical Logs Exist
Problem:
- Data spread across multiple logs
- Queries require correlation
Solution:
- One wide event (all info in one log)
7. Benefits (with Examples)
Query Simplicity
Without canonical:
- Find logs
- Join by trace_id
With canonical:
status = "FAILURE" AND event = "USER_LOGIN"
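The difference can be sketched in Python with in-memory records standing in for a log store. All records, trace IDs, and message strings below are hypothetical.

```python
from collections import defaultdict

# Scattered logs: each line carries only part of the story.
scattered = [
    {"trace_id": "t1", "msg": "Request started", "path": "/login"},
    {"trace_id": "t1", "msg": "Auth failed"},
    {"trace_id": "t2", "msg": "Request started", "path": "/login"},
    {"trace_id": "t2", "msg": "Auth success"},
]

# Without canonical logs: group lines by trace_id, then inspect each group.
by_trace = defaultdict(list)
for rec in scattered:
    by_trace[rec["trace_id"]].append(rec)

failed_from_scattered = sorted(
    trace_id
    for trace_id, recs in by_trace.items()
    if any(r.get("path") == "/login" for r in recs)
    and any(r["msg"] == "Auth failed" for r in recs)
)

# With canonical logs: one record per request, so one filter is enough.
canonical = [
    {"trace_id": "t1", "event": "USER_LOGIN", "status": "FAILURE"},
    {"trace_id": "t2", "event": "USER_LOGIN", "status": "SUCCESS"},
]
failed_from_canonical = [
    r["trace_id"]
    for r in canonical
    if r["status"] == "FAILURE" and r["event"] == "USER_LOGIN"
]
```

Both approaches find the same request, but the first depends on matching free-form message text and on every line carrying a trace_id.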
Faster Debugging
Example:
{
"event": "PAYMENT",
"status": "FAILURE",
"error": "INSUFFICIENT_FUNDS"
}
Root cause visible instantly
Ad-hoc Analysis
Example:
avg(duration_ms) GROUP BY event
Works because schema is consistent
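The same aggregation can be sketched in plain Python, assuming every record carries `event` and `duration_ms` under exactly those names. The sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

logs = [
    {"event": "USER_LOGIN", "duration_ms": 120},
    {"event": "USER_LOGIN", "duration_ms": 80},
    {"event": "PAYMENT", "duration_ms": 300},
]

# Equivalent of: avg(duration_ms) GROUP BY event
durations = defaultdict(list)
for record in logs:
    durations[record["event"]].append(record["duration_ms"])

avg_duration = {event: mean(values) for event, values in durations.items()}
```

If one service wrote `latency` instead of `duration_ms`, the loop would raise a KeyError, which is the kind of breakage a consistent schema prevents.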
Bridge Between Logs & Metrics
- Metrics: predefined (fast)
- Logs: flexible (slow)
- Canonical logs: flexible + queryable
8. Canonical Logging Pipeline
Service → Canonical Log → Kafka → Data Warehouse → Dashboard
9. Canonical Logs vs Metrics vs Traces
| Type | Example | Use |
|---|---|---|
| Metrics | login_count=100 | Fast monitoring |
| Logs | "user logged in" | Debugging |
| Canonical Logs | full JSON summary | Query + analytics |
| Tracing | spans per service | Deep debugging |
10. Can Canonical Logs Replace Traditional Logs?
Not fully
Example Problem:
{
"event": "LOGIN",
"duration_ms": 1200
}
You know it's slow, but not why
11. Step Logs Inside Canonical
Idea (embed per-step timings in the canonical log):
{
"event": "LOGIN",
"steps": [
{ "name": "auth", "time": 50 },
{ "name": "db", "time": 900 }
]
}
Pros:
- Good for simple breakdown
Cons:
- Large logs
- Hard to query nested data
- Doesn't scale
12. Best Practice (Industry)
Use 3 Layers:
1. Canonical Log (Mandatory)
{
"event": "LOGIN",
"duration_ms": 120,
"status": "SUCCESS"
}
2. Debug Logs (Selective)
DB query took 900ms
Cache miss occurred
3. Distributed Tracing
- Shows full flow with timings
13. Cost Trade-off
Extra cost:
- More logs
- Storage & ingestion
Mitigation:
- Sampling (e.g., log only 1% of successful requests)
- Keep all failures
- Short retention for debug logs
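The first two mitigations can be combined in one small decision function. This is a minimal sketch, assuming records carry the `status` field used elsewhere in these notes; the function name `should_keep` and the injectable `rng` parameter are hypothetical and exist so the behavior can be tested deterministically.

```python
import random

def should_keep(record: dict, success_rate: float = 0.01, rng=random.random) -> bool:
    """Keep every non-success record; keep successes with probability success_rate."""
    if record.get("status") != "SUCCESS":
        return True  # failures are always kept
    return rng() < success_rate  # rng() returns a float in [0.0, 1.0)

# Example: filter a batch before shipping it to storage.
batch = [
    {"event": "USER_LOGIN", "status": "SUCCESS"},
    {"event": "PAYMENT", "status": "FAILURE"},
]
kept = [r for r in batch if should_keep(r)]
```

Any aggregate computed over the sampled successes needs to be scaled back up (by 100 at a 1% rate) before it is compared with the unsampled failures.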
14. Key Design Principles
- Emit canonical log at end of request
- Always emit (even on failure)
- Keep schema:
- flat
- consistent
- stable
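These principles can be sketched as a Python context manager that collects fields during the request and writes one flat line when it ends, whether or not the handler raised. The names `canonical_log` and `emit`, and the service name "payment-service", are assumptions made for this illustration; a real service would hook this into its web framework's middleware.

```python
import json
import time
from contextlib import contextmanager

@contextmanager
def canonical_log(event: str, service: str, emit=print):
    """Collect fields during a request and emit one flat JSON line at the end."""
    fields = {"event": event, "service": service, "status": "SUCCESS"}
    start = time.monotonic()
    try:
        yield fields  # the handler adds flat key/value pairs here
    except Exception as exc:
        fields["status"] = "FAILURE"
        fields["error"] = type(exc).__name__
        raise  # the caller still sees the original error
    finally:
        # Runs on success and on failure, so the line is always emitted.
        fields["duration_ms"] = int((time.monotonic() - start) * 1000)
        emit(json.dumps(fields))

emitted = []

# Successful request
with canonical_log("USER_LOGIN", "auth-service", emit=emitted.append) as log:
    log["user_id"] = "123"
    log["db_calls"] = 1

# Failing request: the canonical line is still written
try:
    with canonical_log("PAYMENT", "payment-service", emit=emitted.append) as log:
        raise ValueError("insufficient funds")
except ValueError:
    pass
```

Keeping `fields` a flat dictionary of scalars mirrors the "flat" principle above, and the `finally` block is what guarantees exactly one line per request.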
15. Final Mental Model
- Canonical log = summary
- Debug logs = details
- Tracing = deep visibility