Webhook-Driven Automation: Architecture Patterns That Actually Work
Webhooks are the foundation of event-driven automation. Here's how to receive them reliably, process them safely, and recover when things go wrong.
At Creative Codes, webhooks are the entry point for most of the automation pipelines we build. CRM events, payment notifications, form submissions, marketplace alerts: they all arrive as HTTP POST requests to an endpoint we control. Getting this right means your automation runs when it should and never drops an event.
What can go wrong with webhooks
The naive webhook handler is a function that receives the request, processes the payload synchronously, and returns 200. This works for demos. In production, it creates problems:
- Timeouts: if processing takes more than a few seconds, the sender will retry. You'll process the same event twice.
- Downstream failures: if your database is slow, your automation logic throws an exception, or a third-party API is down, the sender sees a 5xx error and retries. Again: duplicate processing.
- No visibility: when something goes wrong, you have no record of what arrived or what was done with it.
- No recovery: a crash mid-processing means the event is lost, or processed twice after retry.
Good webhook architecture separates receiving the event from processing it.
Pattern 1: Receive fast, process async
The webhook receiver has one job: validate the request, store the raw payload, and return 200 immediately. Processing happens in a background worker.
    import hashlib
    import hmac
    import json
    import os

    from fastapi import BackgroundTasks, FastAPI, HTTPException, Request

    app = FastAPI()

    # Shared secret issued by the webhook sender (assumed here to come from the environment)
    WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]

    # verify_signature is shown under Pattern 3 and process_event under Pattern 2.
    # store_raw_event persists the payload (use Redis or a database in production)
    # and returns the event ID.

    @app.post("/webhooks/crm")
    async def receive_crm_event(request: Request, background_tasks: BackgroundTasks):
        body = await request.body()

        # Validate the signature against the raw bytes before parsing or storing
        signature = request.headers.get("X-Webhook-Signature", "")
        if not verify_signature(body, signature, secret=WEBHOOK_SECRET):
            raise HTTPException(status_code=401, detail="Invalid signature")

        payload = json.loads(body)
        event_id = store_raw_event(payload)

        # Queue processing in the background so the response is not blocked
        background_tasks.add_task(process_event, event_id, payload)
        return {"received": True, "event_id": event_id}

The sender gets a fast 200. Processing happens asynchronously. If processing fails, the raw event is already stored and can be replayed.
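The receiver relies on `store_raw_event`, which the snippet does not define. As a minimal sketch of one way to write it, the following assumes SQLite as the store and assumes the sender puts its own event ID in a top-level `id` field. The table name, the extra `conn` parameter, and `load_raw_event` are illustrative, not part of any provider's API.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional


def init_event_store(conn: sqlite3.Connection) -> None:
    """Create the raw-event table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS raw_events (
            event_id    TEXT PRIMARY KEY,
            received_at TEXT NOT NULL,
            status      TEXT NOT NULL,
            payload     TEXT NOT NULL
        )
        """
    )
    conn.commit()


def store_raw_event(conn: sqlite3.Connection, payload: dict) -> str:
    """Persist the payload and return its event ID."""
    # Assumption: the sender's ID lives in payload["id"]. Fall back to a
    # generated ID only when the sender does not provide one.
    event_id = str(payload.get("id") or uuid.uuid4())
    # INSERT OR IGNORE keeps the first copy when the sender redelivers.
    conn.execute(
        "INSERT OR IGNORE INTO raw_events (event_id, received_at, status, payload) "
        "VALUES (?, ?, 'queued', ?)",
        (event_id, datetime.now(timezone.utc).isoformat(), json.dumps(payload)),
    )
    conn.commit()
    return event_id


def load_raw_event(conn: sqlite3.Connection, event_id: str) -> Optional[dict]:
    """Fetch a stored payload for replay, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT payload FROM raw_events WHERE event_id = ?", (event_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keying the row on the sender's ID means a redelivery maps to the same record, which is also what makes replay and de-duplication straightforward later.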
Pattern 2: Idempotency keys
Most webhook senders will retry on timeout or 5xx. You will receive the same event multiple times. Your processing logic must be idempotent: processing the same event twice should produce the same result as processing it once.
The standard approach is tracking which events have already been processed:
    def process_event(event_id: str, payload: dict):
        # Skip events that were already processed; this delivery is a retry
        if is_already_processed(event_id):
            return

        try:
            # Do the actual work
            result = execute_automation_logic(payload)
            # Mark as processed only AFTER success, so a failed run can be retried
            mark_as_processed(event_id, result)
        except Exception as e:
            log_processing_failure(event_id, str(e))
            raise

De-duplication only works if the event ID is stable across redeliveries, so key it on the sender's event or delivery ID rather than on an ID generated at receipt. For n8n workflows, use the event ID as part of the de-duplication check in one of the first nodes. Store processed event IDs in your database or in Redis with a TTL that matches your sender's retry window (often 24-72 hours; check the provider's documentation).
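One caveat with checking first and marking afterwards: two deliveries of the same event that arrive close together can both pass the check before either is marked. A sketch of one way to narrow that gap is to claim the event ID atomically through a unique key. This assumes SQLite; `try_claim`, `process_once`, and the `processed_events` table are hypothetical names, and the variant marks the event as `processing` up front rather than only after success.

```python
import sqlite3
from typing import Callable


def init_dedup(conn: sqlite3.Connection) -> None:
    """Create the de-duplication table; the primary key enforces uniqueness."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        "event_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()


def try_claim(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True only for the first caller that claims this event ID."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (event_id, status) "
        "VALUES (?, 'processing')",
        (event_id,),
    )
    conn.commit()
    # rowcount is 0 when the row already existed and the insert was ignored
    return cur.rowcount == 1


def process_once(
    conn: sqlite3.Connection,
    event_id: str,
    payload: dict,
    handler: Callable[[dict], object],
) -> bool:
    """Run handler at most once per event ID; return False for duplicates."""
    if not try_claim(conn, event_id):
        return False
    try:
        handler(payload)
    except Exception:
        # Release the claim so that a later retry can process the event
        conn.execute("DELETE FROM processed_events WHERE event_id = ?", (event_id,))
        conn.commit()
        raise
    conn.execute(
        "UPDATE processed_events SET status = 'completed' WHERE event_id = ?",
        (event_id,),
    )
    conn.commit()
    return True
```

A worker that crashes between the claim and the update leaves a stale `processing` row, so a real deployment would likely also need a timeout after which such claims are released.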
Pattern 3: Signature verification
Every production webhook receiver must verify the request signature. Without this, anyone who discovers your webhook URL can inject fake events into your automation.
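Most webhook senders (Stripe, GitHub, HubSpot, Shopify) sign requests with HMAC-SHA256. The core pattern is the same across providers, although the header name, the encoding (hex or Base64), and the exact bytes that are signed differ, so check each provider's documentation: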
    def verify_signature(body: bytes, signature: str, secret: str) -> bool:
        expected = hmac.new(
            secret.encode(),
            body,
            hashlib.sha256,
        ).hexdigest()
        # Use constant-time comparison to prevent timing attacks
        return hmac.compare_digest(expected, signature)

Some providers prefix the signature with a scheme identifier (e.g., sha256=abc123). Strip that prefix before comparing.
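Prefix stripping can be sketched as follows. This assumes a hex-encoded digest and a `sha256=` prefix; `normalize_signature` and `verify_prefixed_signature` are illustrative names, and providers that use Base64 or sign more than the raw body need a different expected value.

```python
import hashlib
import hmac


def normalize_signature(header_value: str, scheme: str = "sha256") -> str:
    """Drop an optional '<scheme>=' prefix and lowercase the hex digest."""
    value = header_value.strip()
    prefix = scheme + "="
    if value.startswith(prefix):
        value = value[len(prefix):]
    return value.lower()


def verify_prefixed_signature(body: bytes, header_value: str, secret: str) -> bool:
    """Verify a hex HMAC-SHA256 signature with or without a scheme prefix."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    received = normalize_signature(header_value)
    # Compare as bytes so that a malformed, non-ASCII header is rejected
    # instead of raising TypeError inside compare_digest
    return hmac.compare_digest(expected.encode(), received.encode())
```

Always compute the digest over the raw request bytes, not over re-serialized JSON, since re-serialization can change whitespace or key order and invalidate the signature.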
Pattern 4: Dead-letter queue
When processing fails after retries, the event goes to a dead-letter queue (DLQ) rather than being silently dropped. The DLQ holds events that couldn't be processed so they can be inspected, fixed, and replayed.
For n8n-based automation, we implement this as a separate n8n workflow that triggers on failure and writes to a dedicated "failed events" table in the database. For Python-based services, we use Redis pub/sub or a simple database table.
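For the database-table variant, a minimal sketch might look like the following. It assumes SQLite and a hypothetical `failed_events` schema; the extra `conn` parameter on `write_to_dead_letter_queue`, plus the `pending_failures` and `replay` helpers, are illustrative additions.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List, Tuple


def init_dlq(conn: sqlite3.Connection) -> None:
    """Create the failed-events table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS failed_events (
            event_id    TEXT PRIMARY KEY,
            payload     TEXT NOT NULL,
            error       TEXT NOT NULL,
            failed_at   TEXT NOT NULL,
            replayed_at TEXT
        )
        """
    )
    conn.commit()


def write_to_dead_letter_queue(
    conn: sqlite3.Connection, event_id: str, payload: dict, error: str
) -> None:
    """Record a failed event, replacing any earlier failure for the same ID."""
    conn.execute(
        "INSERT OR REPLACE INTO failed_events "
        "(event_id, payload, error, failed_at, replayed_at) VALUES (?, ?, ?, ?, NULL)",
        (event_id, json.dumps(payload), error, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def pending_failures(conn: sqlite3.Connection) -> List[Tuple[str, dict, str]]:
    """List events that are still waiting to be inspected and replayed."""
    rows = conn.execute(
        "SELECT event_id, payload, error FROM failed_events "
        "WHERE replayed_at IS NULL ORDER BY failed_at"
    ).fetchall()
    return [(event_id, json.loads(payload), error) for event_id, payload, error in rows]


def replay(
    conn: sqlite3.Connection, event_id: str, handler: Callable[[dict], object]
) -> bool:
    """Re-run handler for one failed event; return False if the ID is unknown."""
    row = conn.execute(
        "SELECT payload FROM failed_events WHERE event_id = ?", (event_id,)
    ).fetchone()
    if row is None:
        return False
    handler(json.loads(row[0]))  # if this raises, the row stays pending
    conn.execute(
        "UPDATE failed_events SET replayed_at = ? WHERE event_id = ?",
        (datetime.now(timezone.utc).isoformat(), event_id),
    )
    conn.commit()
    return True
```

Keeping the original payload and the error text side by side lets you fix the cause and replay the exact event. The failure handler decides when an event lands in this table: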
    def handle_processing_failure(event_id: str, payload: dict, error: str, attempt: int):
        if attempt < MAX_RETRIES:
            # Schedule retry with exponential backoff:
            # 2 min, 4 min, 8 min, ... for attempt = 1, 2, 3, ...
            retry_delay = 2 ** attempt * 60
            schedule_retry(event_id, payload, delay_seconds=retry_delay)
        else:
            # Retries exhausted: move to the DLQ and alert
            write_to_dead_letter_queue(event_id, payload, error)
            send_slack_alert(f"Event {event_id} moved to DLQ after {attempt} attempts: {error}")

Pattern 5: Observability
Every webhook you receive should produce a log entry with:
- Event ID
- Event type
- Timestamp received
- Processing status (queued, processing, completed, failed)
- Processing duration
This is table stakes for debugging production issues. "Why didn't my automation trigger?" has one of two answers: the event wasn't received, or the event was received but processing failed. Without logs, you can't tell which.
In n8n, use the execution log aggressively. Add a database write node early in the workflow to record that the webhook was received and is being processed. This creates a paper trail independent of n8n's own execution history.
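For Python services, the log entry described above can be sketched as one JSON line per status change. The field names, the `webhooks` logger name, and the `track_processing` helper are assumptions made for illustration; the `queued` entry's timestamp doubles as the time the event was received.

```python
import json
import logging
import time
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger("webhooks")


def log_event(
    event_id: str, event_type: str, status: str, duration_ms: Optional[float] = None
) -> dict:
    """Emit one structured log line and return the entry that was logged."""
    entry = {
        "event_id": event_id,
        "event_type": event_type,
        "status": status,  # queued, processing, completed, or failed
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "duration_ms": duration_ms,
    }
    logger.info(json.dumps(entry))
    return entry


@contextmanager
def track_processing(event_id: str, event_type: str):
    """Log 'processing' on entry, then 'completed' or 'failed' with the duration."""
    log_event(event_id, event_type, "processing")
    start = time.perf_counter()
    try:
        yield
    except Exception:
        elapsed = (time.perf_counter() - start) * 1000
        log_event(event_id, event_type, "failed", elapsed)
        raise
    else:
        elapsed = (time.perf_counter() - start) * 1000
        log_event(event_id, event_type, "completed", elapsed)
```

In the receiver you would call `log_event(event_id, event_type, "queued")` right after storing the raw payload, and wrap the worker's body in `with track_processing(event_id, event_type):`. Searching the logs for an event ID then shows whether it arrived and where it stopped.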
Putting it together: n8n implementation
For our n8n-based automation builds, the webhook pattern looks like this:
1. Webhook trigger node: receives the event, returns 200
2. Signature verification (Code node, called the Function node in older n8n versions, or an HTTP Request to a validation service)
3. De-duplication check (database lookup for the event ID)
4. Set node: normalize the payload to a consistent internal format
5. Business logic: the actual automation steps
6. Database write: record completion with the outcome
7. Error workflow: a separate workflow triggered on failure that writes to the DLQ and sends a Slack alert
The key is that step 7 is a separate workflow, not error handling inside the main workflow. This ensures that failure in error handling doesn't swallow the original error.
If you're building webhook-driven automation and want it to handle retries, failures, and edge cases correctly, let's talk about the architecture.