Skip to main content

๐Ÿ“ธ Database Snapshots

๐Ÿง  What is a Snapshot?โ€‹

A snapshot is a read-only, point-in-time view of the database.

  • It is NOT a full copy (in modern systems)
  • It is just a logical view based on data versions

๐Ÿค” Why Snapshots Existโ€‹

1. Backup & Recoveryโ€‹

  • Restore database after failure

2. Analyticsโ€‹

  • Run heavy queries without affecting production

3. Replicationโ€‹

  • Create replicas quickly

4. Testing / Debuggingโ€‹

  • Reproduce past state

5. Point-in-Time Recoveryโ€‹

  • Restore DB to exact timestamp

โš™๏ธ How Snapshots Work Internally (MVCC)โ€‹

Instead of copying data, databases store multiple versions of rows.

Each row has metadata:

  • xmin โ†’ when version was created
  • xmax โ†’ when version became invalid

Exampleโ€‹

Valuexminxmax
A=10T1T2
A=100T2โˆž

Snapshot Ruleโ€‹

A snapshot reads rows where:

xmin โ‰ค snapshot_time < xmax

๐Ÿ‘‰ Snapshot = just a timestamp + visibility rules


๐Ÿ“ฆ How Data is Storedโ€‹

Database stores versions of data, NOT snapshots

Storage looks like:

[Row1: A=10, xmin=T1, xmax=T2]
[Row2: A=100, xmin=T2, xmax=โˆž]

๐Ÿ” Multiple Snapshotsโ€‹

  • Many snapshots can exist at the same time
  • Each query/transaction gets its own snapshot

Example:

SnapshotValue Seen
S1A = 10
S2A = 100
CurrentA = 200

๐Ÿงน Garbage Collection (GC)โ€‹

Old versions are deleted when:

  • No active snapshot needs them

Flowโ€‹

  1. Data updated โ†’ new versions created
  2. Old snapshots finish
  3. GC removes unused versions

๐Ÿ“ˆ Storage Growthโ€‹

Case 1: Few updatesโ€‹

  • when some rows or DB is updated
100GB DB + 10GB changes = 110GB

Case 2: Full DB updatedโ€‹

  • when almost all rows present in data is updated
100GB old + 100GB new = 200GB

๐Ÿ‘‰ Temporary 2x storage possible


โš ๏ธ Storage Does NOT Grow Foreverโ€‹

Because:

  • Old versions are cleaned by GC

Final storage:

โ‰ˆ Current DB size

๐Ÿ”ด Real-World Issue: Bloatโ€‹

If:

  • Long-running queries exist
  • Snapshots live too long

๐Ÿ‘‰ Old data cannot be deleted

Result:

  • Storage increases
  • Performance degrades

๐Ÿ’พ Backup Using Snapshotsโ€‹

Step 1: Take Snapshot (Base Backup)โ€‹

A = 10, B = 20

Step 2: Record Logs (WAL)โ€‹

A โ†’ 100
B โ†’ 200

Step 3: Store Bothโ€‹

  • Snapshot
  • Logs

๐Ÿ” Restore Processโ€‹

  1. Load snapshot
  2. Replay logs

Final state:

A = 100
B = 200

๐Ÿง  Key Insights (Must Remember)โ€‹

  • Snapshot is not stored as full data
  • It is a logical view over versions
  • Database stores multiple versions of rows
  • Old versions are cleaned via garbage collection
  • Storage may temporarily grow but stabilizes