Temporal Architecture
Core Problem Solved
Building reliable distributed systems out of microservices is complex. By a rough estimate, only 20–30% of engineering effort goes to business logic; the remaining 70–80% is spent on defensive plumbing and boilerplate, including:
- Managing database transaction states.
- Handling flaky third-party network timeouts and scheduling backoff retry loops.
- Maintaining complex state machines and distributed lock managers.
- Writing manual database polling cron jobs to catch stuck processes.
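To make the "defensive plumbing" concrete, here is a minimal sketch of the kind of retry loop with exponential backoff that teams end up writing by hand. The names retryWithBackoff and RetryOptions are hypothetical, chosen for illustration only. The key weakness is that the attempt counter and the pending wait exist only in process memory.

```typescript
// Hypothetical helper showing hand-rolled retry plumbing (not part of any Temporal API).
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the second attempt
  factor: number; // multiplier applied to the wait after each failure
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let delayMs = opts.baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (attempt >= opts.maxAttempts) throw err;
      // This wait and the attempt counter live only in memory;
      // if the process crashes here, the retry is silently lost.
      await sleep(delayMs);
      delayMs *= opts.factor;
    }
  }
}
```

Temporal's retry policies move this loop out of application code. Because attempts and timers are tracked by the Temporal Server, a crashed process does not lose the retry.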
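Temporal solves the Distributed Systems Reliability problem by providing Durable Execution: the Temporal Server persists each workflow's progress, so the code survives process crashes and restarts. This abstracts away the reliability infrastructure and lets developers write long-running distributed operations in ordinary programming languages as if they ran inside a single, crash-proof function.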
Core Concepts & Technical Philosophy
Temporal follows a Code-over-Configuration paradigm. Instead of JSON/YAML configs, it uses idiomatic code to define state machines.
Workflows
A Workflow is an orchestration blueprint written in standard code (TypeScript, Go, Java, Python).
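- The Golden Rule: Workflows must be completely deterministic. Because Temporal recovers a workflow by replaying its code against the recorded history, every replay with the same inputs and history must follow the same execution path.
- Banned Operations: Direct API calls, database queries, reading the system clock (Date.now()), and generating random values (Math.random()) must not happen directly in workflow logic. Move side effects into Activities and use SDK-provided deterministic equivalents for time and randomness (the TypeScript SDK, for example, replaces Date and Math.random() inside its workflow sandbox with deterministic versions).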
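The reason for these bans can be shown with a toy model in plain TypeScript (no Temporal SDK). Here a "workflow" is assumed to be a function returning the ordered activity names it schedules, and replay compares that list with a recorded run. The names replayCheck, WorkflowFn, and the sendGift activity are hypothetical. Temporal's real check compares SDK commands against the Event History, but the failure mode is the same.

```typescript
// Toy model (not the Temporal SDK): a workflow maps its input to the ordered
// list of activity names it schedules.
type WorkflowFn = (orderId: string) => string[];

// Re-runs a workflow and compares it step by step with a recorded run.
// Extra steps past the end of the recording are allowed: that is new progress.
function replayCheck(workflow: WorkflowFn, orderId: string, recorded: string[]): void {
  const replayed = workflow(orderId);
  recorded.forEach((expected, step) => {
    if (replayed[step] !== expected) {
      throw new Error(
        `Nondeterminism at step ${step}: recorded ${expected}, replayed ${replayed[step] ?? 'nothing'}`,
      );
    }
  });
}

// Deterministic: the path depends only on the input.
const deterministicWorkflow: WorkflowFn = (orderId) =>
  orderId.startsWith('vip') ? ['chargeCreditCard', 'sendGift'] : ['chargeCreditCard'];

// Non-deterministic: the path depends on an outside value that can change between
// the original run and the replay (a stand-in for Math.random() or Date.now()).
let externalFlag = true;
const nonDeterministicWorkflow: WorkflowFn = () =>
  externalFlag ? ['chargeCreditCard', 'sendGift'] : ['updateDatabase'];
```

Replaying deterministicWorkflow always passes. Replaying nonDeterministicWorkflow after externalFlag flips throws, because the code now asks for a different first step than the history recorded.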
Activities
An Activity is an ordinary function where side effects occur, and unlike workflow code it is allowed to be non-deterministic. Put external API calls, DB queries, or disk I/O here. Activities can fail, time out, and be retried independently of the workflow.
High-Level System Architecture & Flow
Temporal uses an Isolated Worker Model. The Temporal Server is only the orchestrator, state manager, and queue coordinator — it never runs your code. Your code runs on your infrastructure (Kubernetes, EC2, etc.) inside a Worker process.
[Figure: Core Architecture Component Flow]
Operational Mechanics: The Code Implementation
Defining Activities (activities.ts)
```typescript
import axios from 'axios';

export async function chargeCreditCard(
  amount: number,
  orderId: string,
): Promise<string> {
  const response = await axios.post<{ transactionId: string }>(
    'https://api.payments.com/v1/charges',
    { amount, orderId },
    {
      headers: {
        // Idempotency key prevents double-charging when Temporal retries this activity
        'Idempotency-Key': orderId,
      },
    },
  );
  return response.data.transactionId;
}

export async function updateDatabase(
  orderId: string,
  status: string,
): Promise<void> {
  // Placeholder: a real implementation would write the new status to the orders table
  console.log(`Updating DB for order ${orderId} to status: ${status}`);
}
```
Defining the Deterministic Workflow (workflows.ts)
```typescript
import { proxyActivities } from '@temporalio/workflow';
import type * as activities from './activities';

// The proxy turns each call into a command asking the Temporal Server to schedule the activity
const { chargeCreditCard, updateDatabase } = proxyActivities<typeof activities>({
  startToCloseTimeout: '1 minute', // maximum duration of a single attempt
  retryPolicy: {
    initialInterval: '2 seconds',
    backoffCoefficient: 2,
    maximumAttempts: 5,
  },
});

export async function orderProcessingWorkflow(
  amount: number,
  orderId: string,
): Promise<string> {
  // `await` suspends the workflow until the activity completes; if every retry fails, it throws
  const txId = await chargeCreditCard(amount, orderId);

  // This line runs only after chargeCreditCard has succeeded
  await updateDatabase(orderId, 'PAID');

  return `Order ${orderId} fully processed. Tx: ${txId}`;
}
```
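To see what the retryPolicy above implies, the sketch below computes the nominal wait before each retry. The delay starts at initialInterval, is multiplied by backoffCoefficient after every failure, and is capped at maximumInterval. The policy above leaves maximumInterval unset; Temporal documents its default as 100 × initialInterval (200 seconds here), so the cap is never reached. retryDelaysMs and RetryPolicySketch are hypothetical helpers, not SDK functions, and they ignore timeouts and non-retryable errors.

```typescript
interface RetryPolicySketch {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumAttempts: number; // total attempts, including the first
  maximumIntervalMs?: number; // assumed default: 100 x initialInterval
}

// Returns the wait before each retry (attempts 2, 3, ...).
function retryDelaysMs(policy: RetryPolicySketch): number[] {
  const cap = policy.maximumIntervalMs ?? 100 * policy.initialIntervalMs;
  const delays: number[] = [];
  let delay = policy.initialIntervalMs;
  for (let retry = 1; retry < policy.maximumAttempts; retry++) {
    delays.push(Math.min(delay, cap));
    delay *= policy.backoffCoefficient;
  }
  return delays;
}

const delays = retryDelaysMs({ initialIntervalMs: 2000, backoffCoefficient: 2, maximumAttempts: 5 });
console.log(delays); // [ 2000, 4000, 8000, 16000 ]
```

So chargeCreditCard gets one initial attempt plus four retries, spaced 2, 4, 8, and 16 seconds apart, and each attempt may additionally run for up to the one-minute startToCloseTimeout.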
Architectural Q&A Deep Dive
Q1: If the Temporal Server goes down, how does it know where a workflow left off when it restarts?
Answer: Temporal persists an immutable, append-only Event History for every workflow instead of snapshotting in-memory system state.
When a workflow has to be recovered, for example after the server or the Worker running it crashes, its state is rebuilt on a Worker through a process called Replay:
- The server places a workflow task for the incomplete workflow on its task queue, and an available Worker picks it up.
- The Worker receives the workflow's Event History from the server, which reads it from its persistence database.
- The Worker re-executes the workflow code from the start.
- When the code hits an await on an activity, the SDK checks the Event History. If an entry shows that the activity already completed, the SDK does not run it again; it returns the recorded result instead.
- This continues until the code reaches a step with no recorded event; from there, execution resumes normally.
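The replay steps above can be sketched with a toy engine in plain TypeScript. The workflow calls activities through an exec function. While recorded events remain, exec returns the stored result instead of invoking the activity; once the history runs out, it calls activities for real and appends their results. Everything here (runWithReplay, HistoryEvent, Exec) is hypothetical and greatly simplified; the real SDK matches commands against history events rather than reading a flat list of results.

```typescript
// Toy replay engine (hypothetical, not the Temporal SDK).
interface HistoryEvent {
  activity: string;
  result: string;
}

type Activity = (...args: string[]) => Promise<string>;
type Exec = (name: string, ...args: string[]) => Promise<string>;

async function runWithReplay(
  workflow: (exec: Exec) => Promise<string>,
  activities: Record<string, Activity>,
  history: HistoryEvent[], // appended to as new activities complete
): Promise<{ output: string; liveCalls: string[] }> {
  let cursor = 0;
  const liveCalls: string[] = [];

  const exec: Exec = async (name, ...args) => {
    if (cursor < history.length) {
      // Already recorded: inject the stored result instead of re-running the activity.
      // (A real engine would also check that the recorded event matches this call.)
      return history[cursor++].result;
    }
    // No recorded event: this is new progress, so run the activity for real and record it.
    const result = await activities[name](...args);
    history.push({ activity: name, result });
    cursor++;
    liveCalls.push(name);
    return result;
  };

  const output = await workflow(exec);
  return { output, liveCalls };
}
```

For example, if the history already records chargeCreditCard, replaying a workflow that charges and then updates the database reuses the recorded transaction ID and calls only updateDatabase for real.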
Q2: If the system crashes mid-API call, how do we guarantee actions are not executed twice?
Answer: The protection depends on where the failure occurs:
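- Scenario A (Server Crash, Worker Alive): The Worker receives HTTP 200 from the payment provider (e.g., Stripe), but the server is down. The Worker holds the result and keeps retrying the completion report. If the server recovers before the activity's startToCloseTimeout expires, no duplicate call is made; if the timeout expires first, the server treats the attempt as failed and retries it.
- Scenario B (Worker Crash Mid-Flight): The Worker dies before getting a response. Once startToCloseTimeout elapses, the server marks the attempt as timed out and re-queues the activity according to its retry policy. The first request may already have reached the provider, so the retry could repeat the charge.
- Counter-Measure: Temporal cannot recall requests that have already left the machine. For effectively exactly-once behavior, send an idempotency key (e.g., orderId) with the request. On a retry, a provider that honors idempotency keys, such as Stripe, recognizes the duplicate key and returns the original receipt instead of charging again.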
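The counter-measure can be illustrated with a toy payment provider that stores the first response for each idempotency key. FakePaymentProvider and its charge method are hypothetical stand-ins for a real provider's API. The sketch shows why a retry carrying the same key (here the orderId) cannot create a second charge, even when the first attempt's response never made it back to the Worker.

```typescript
interface Receipt {
  transactionId: string;
  amount: number;
}

// Hypothetical provider that honors idempotency keys by replaying the first stored response.
class FakePaymentProvider {
  private readonly responses = new Map<string, Receipt>();
  private nextId = 1;
  chargesCreated = 0;

  charge(idempotencyKey: string, amount: number): Receipt {
    const previous = this.responses.get(idempotencyKey);
    if (previous) return previous; // duplicate key: return the original receipt, charge nothing
    const receipt: Receipt = { transactionId: `tx_${this.nextId++}`, amount };
    this.responses.set(idempotencyKey, receipt);
    this.chargesCreated++;
    return receipt;
  }
}

const provider = new FakePaymentProvider();
// Attempt 1: the charge succeeds, but the Worker crashes before it sees the response.
provider.charge('order-42', 100);
// Attempt 2: the activity is retried after startToCloseTimeout, reusing the same key.
const receipt = provider.charge('order-42', 100);
console.log(receipt.transactionId, provider.chargesCreated); // tx_1 1
```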
Q3: If I scale horizontally across multiple instances, does the workflow stay locked to the server that initiated it?
Answer: No. Temporal uses an elastic Pull Model (queue-based), not hard routing to a host.
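If Server A starts a workflow, it only asks the Temporal Server to record a start event, and the Matching Service enqueues the first workflow task. Any Worker polling that task queue, on any host, can pull and process the task. State lives in the Event History, not on any particular host; Temporal may prefer the Worker that already has the workflow cached (sticky execution), but any other Worker can take over.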
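The pull model can be sketched with an in-memory queue shared by several workers. The side that starts a workflow only enqueues a task; whichever worker polls next takes it, so nothing is pinned to the host that started the workflow. TaskQueue, WorkflowTask, and the worker names are hypothetical; a real Temporal Worker long-polls the server over the network rather than reading a local array.

```typescript
interface WorkflowTask {
  workflowId: string;
}

// Hypothetical in-memory task queue: whoever enqueues a task never picks its worker.
class TaskQueue {
  private readonly tasks: WorkflowTask[] = [];

  enqueue(task: WorkflowTask): void {
    this.tasks.push(task);
  }

  // Hands the oldest task to whichever worker polls, or undefined if empty.
  poll(): WorkflowTask | undefined {
    return this.tasks.shift();
  }
}

const queue = new TaskQueue();
// "Server A" starts three workflows; starting only enqueues work.
for (const id of ['wf-1', 'wf-2', 'wf-3']) queue.enqueue({ workflowId: id });

// Workers on other hosts poll the same queue. They simply alternate here;
// in practice, whichever poll reaches the server first gets the task.
const workers = ['worker-on-host-B', 'worker-on-host-C'];
const processedBy = new Map<string, string>();
for (let turn = 0; ; turn++) {
  const task = queue.poll();
  if (!task) break; // queue drained
  processedBy.set(task.workflowId, workers[turn % workers.length]);
}
```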
Q4: Where does the Task Queue live, and how does Temporal ensure two workers don't grab the same task?
Answer: Task queues are virtual structures kept in the Matching Service and persisted in the database.
To avoid two workers grabbing the same task, Temporal relies on Optimistic Concurrency Control (OCC) and atomic conditional updates.
As a simplified illustration (not Temporal's actual persistence schema), before sending a task the server atomically moves its row from QUEUED to RUNNING:
```sql
UPDATE tasks
SET status = 'RUNNING', worker_id = 'Server-B'
WHERE task_id = X
  AND status = 'QUEUED';
```
- If Server B's transaction succeeds, it gets the task.
- Server C's concurrent update then fails the WHERE status = 'QUEUED' check, matches zero rows, and receives no payload, so it keeps polling.
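The conditional UPDATE in this answer is a compare-and-set: the claim succeeds only if the row is still in the state the claimant expected. An in-memory sketch of the same idea follows. TaskRow and tryClaim are hypothetical and mirror the simplified SQL rather than Temporal's real persistence layer.

```typescript
type Status = 'QUEUED' | 'RUNNING';

interface TaskRow {
  taskId: number;
  status: Status;
  workerId?: string;
}

// Mirrors: UPDATE tasks SET status = 'RUNNING', worker_id = ? WHERE task_id = ? AND status = 'QUEUED'
// Returns the number of rows updated (1 = claim won, 0 = someone else got there first).
function tryClaim(rows: Map<number, TaskRow>, taskId: number, workerId: string): number {
  const row = rows.get(taskId);
  if (!row || row.status !== 'QUEUED') return 0; // WHERE clause fails: no rows match
  row.status = 'RUNNING';
  row.workerId = workerId;
  return 1;
}

const rows = new Map<number, TaskRow>([[7, { taskId: 7, status: 'QUEUED' }]]);
// JavaScript runs these two calls sequentially; in a real database,
// row-level atomicity is what serializes the competing updates.
const wonByB = tryClaim(rows, 7, 'Server-B');
const wonByC = tryClaim(rows, 7, 'Server-C');
console.log(wonByB, wonByC, rows.get(7)?.workerId); // 1 0 Server-B
```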
Temporal Use Cases
- Subscription Billing Engines: Managing users' 30-day billing cycles, handling grace periods, failed payment retries, and updating subscription tiers over months or years.
- Complex Sagas / Distributed Transactions: Booking a vacation that involves reserving a flight, a hotel, and a car rental. If the car rental fails, Temporal reliably executes the compensating actions you define (refunds) for the flight and hotel.
- CI/CD & Resource Provisioning: Orchestrating the complex deployment pipelines of cloud platforms (HashiCorp uses Temporal heavily for Terraform Cloud).
- Massive Distributed Cron Jobs: Managing thousands of background jobs that require tracking state, complex retries, and dynamic scheduling.
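The saga pattern from the vacation example can be sketched as a list of steps, each paired with a compensation. On failure, the compensations for the completed steps run in reverse order and the original error is rethrown. In a Temporal workflow the actions and compensations would be activities; here they are plain async functions, and runSaga and SagaStep are hypothetical names.

```typescript
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order; if one fails, undoes the completed ones in reverse and rethrows.
async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      await step.action();
      completed.push(step);
    }
  } catch (err) {
    for (const step of [...completed].reverse()) {
      await step.compensate();
    }
    throw err;
  }
}
```

With flight, hotel, and car steps where the car booking fails, this runs "book flight", "book hotel", then "refund hotel", "refund flight". Inside a Temporal workflow, the compensation loop is itself durable, so it resumes even if a Worker crashes halfway through it.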