MultiHub Forum

I'm building a service that needs to notify several other internal systems when a key event happens, and I'm trying to avoid a tangled mess of direct API calls. I've been reading about webhook design patterns for this kind of event-driven architecture, but I'm worried about reliability—what happens if one of the receiving systems is down? How do you handle retries and failures without building a whole queuing system?

You're right to worry. With webhook-style delivery you don't want a one-shot fire-and-forget. A simple delivery log, an idempotency key, and a sane retry policy go a long way. Treat 2xx as success and 5xx as retryable. Treat 4xx as client errors that might be permanent (429 Too Many Requests is a common exception that is worth retrying). Honor Retry-After when the receiver sends it, otherwise use exponential backoff with jitter.
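As a rough sketch of that status handling in Python: the names `Outcome`, `classify`, and `retry_delay` are made up for illustration, and the set of retryable 4xx codes is my own assumption, so adjust it to what your receivers actually return. Anything that is neither 2xx nor retryable (including 3xx) is treated as permanent here, which is also an assumption.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    RETRY = "retry"
    PERMANENT_FAILURE = "permanent_failure"


# Assumption: these 4xx codes are transient and worth retrying.
RETRYABLE_4XX = {408, 425, 429}


def classify(status: int) -> Outcome:
    """Map an HTTP status code to a delivery outcome."""
    if 200 <= status < 300:
        return Outcome.SUCCESS
    if status >= 500 or status in RETRYABLE_4XX:
        return Outcome.RETRY
    return Outcome.PERMANENT_FAILURE


def retry_delay(retry_after: Optional[str], fallback: float) -> float:
    """Use Retry-After when it is given in seconds, else the fallback delay.

    The HTTP-date form of Retry-After is not parsed in this sketch;
    it simply falls back to the computed backoff.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    return fallback
```

The `fallback` argument is where your own backoff value would go, so the receiver's hint wins whenever it is present and parseable.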

I’d skip a heavy queue at first and implement a lightweight retry store in your database. Save event_id, endpoint, payload, next_retry, and status. A tiny worker process wakes up on a schedule and retries whatever is due. If a delivery keeps failing after N tries, move it to a dead-letter store.
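Here is a minimal sketch of that retry store using SQLite from the Python standard library. The table name, the `attempts` column, `MAX_ATTEMPTS`, and the `send` callback are all assumptions for illustration. To keep it short, dead-lettering is modeled as a `dead` status in the same table rather than a separate store.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS deliveries (
    event_id   TEXT NOT NULL,
    endpoint   TEXT NOT NULL,
    payload    TEXT NOT NULL,
    attempts   INTEGER NOT NULL DEFAULT 0,
    next_retry REAL NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    PRIMARY KEY (event_id, endpoint)
)
"""

MAX_ATTEMPTS = 5  # the "N tries" from the post; the value is arbitrary


def enqueue(db, event_id, endpoint, payload, now=None):
    """Record a delivery; a duplicate (event_id, endpoint) is ignored."""
    now = time.time() if now is None else now
    db.execute(
        "INSERT OR IGNORE INTO deliveries (event_id, endpoint, payload, next_retry) "
        "VALUES (?, ?, ?, ?)",
        (event_id, endpoint, payload, now),
    )
    db.commit()


def run_once(db, send, now=None, base_delay=30.0):
    """One worker pass: attempt every pending delivery that is due.

    `send(endpoint, payload)` is a caller-supplied function returning
    True on success and False on a retryable failure.
    """
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT event_id, endpoint, payload, attempts FROM deliveries "
        "WHERE status = 'pending' AND next_retry <= ?",
        (now,),
    ).fetchall()
    for event_id, endpoint, payload, attempts in rows:
        attempts += 1
        if send(endpoint, payload):
            status, next_retry = "delivered", now
        elif attempts >= MAX_ATTEMPTS:
            status, next_retry = "dead", now  # dead-lettered, no more retries
        else:
            # Plain exponential delay; jitter is left out for brevity.
            status, next_retry = "pending", now + base_delay * 2 ** (attempts - 1)
        db.execute(
            "UPDATE deliveries SET attempts = ?, status = ?, next_retry = ? "
            "WHERE event_id = ? AND endpoint = ?",
            (attempts, status, next_retry, event_id, endpoint),
        )
    db.commit()
```

Calling `run_once` from a cron job or a loop with a sleep is enough to start with. If you later run more than one worker, you would need some form of row claiming so two workers don't send the same delivery.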

Idempotency matters. Retries mean the same event can arrive twice, so make sure nothing gets duplicated: either send an idempotency key that the receiver checks, or make the endpoints themselves idempotent.
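On the receiving side, the check can be as small as the sketch below. The `Receiver` class and its fields are hypothetical, and the in-memory set is only for illustration. A real receiver would more likely keep seen keys in its database behind a unique constraint.

```python
class Receiver:
    """Toy receiver that applies each event at most once."""

    def __init__(self):
        self.seen = set()   # idempotency keys already processed
        self.applied = []   # stand-in for the real side effect

    def handle(self, event_id, payload):
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        self.applied.append(payload)
        return True
```

A duplicate should still get a 2xx response, so the sender stops retrying it.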

Backoff rules matter. Use per-endpoint backoff with some jitter, cap the max retries, and set a total timeout. This prevents a flood of retries when receivers are down and avoids thrashing.
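One way to sketch those rules is capped exponential backoff with "full jitter", where the delay is drawn uniformly between zero and the exponential ceiling. The function names and all default values below are assumptions, not recommendations.

```python
import random


def backoff_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Delay in seconds before retry number `attempt` (0-based).

    The ceiling doubles each attempt up to `cap`, and the actual delay
    is a random value in [0, ceiling] so that retries from many
    deliveries don't line up.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0.0, ceiling)


def should_give_up(attempt, first_attempt_at, now, max_attempts=8, deadline=3600.0):
    """Stop after too many attempts or once the total time budget is spent."""
    return attempt >= max_attempts or (now - first_attempt_at) >= deadline
```

Tracking `attempt` per endpoint rather than globally keeps one dead receiver from slowing down deliveries to the healthy ones.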

Security-wise, sign payloads with a shared secret so recipients can verify they weren't tampered with. It's also worth rotating keys periodically and keeping logs.
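A minimal sketch of signing with HMAC-SHA256 from the Python standard library follows. The function names are hypothetical, and how the signature travels (for example in a request header) is left to you. Accepting a list of secrets on the receiving side is one assumed way to handle rotation: the old and new keys are both valid during the changeover.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secrets, body: bytes, signature: str) -> bool:
    """Accept the payload if any currently valid secret produces the signature.

    compare_digest does a constant-time comparison to avoid leaking
    how much of the signature matched.
    """
    return any(
        hmac.compare_digest(sign(secret, body).encode(), signature.encode())
        for secret in secrets
    )
```

Sign and verify the exact bytes that go over the wire. Re-serializing the JSON on the receiver before checking will usually break the signature.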

Observability helps a lot: track delivery success rate, latency, and retries per endpoint; set alerts for spikes in failures; and run occasional chaos tests or simulated outages to see what breaks.
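If you don't have a metrics system yet, even an in-process counter like this sketch gives you something to alert on. `DeliveryStats` and its methods are made-up names, and the alert threshold is whatever you pick.

```python
from collections import defaultdict


class DeliveryStats:
    """Per-endpoint counters for attempts, failures, and latency."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)
        self.latency_total = defaultdict(float)

    def record(self, endpoint, ok, latency_s):
        """Record one delivery attempt and how long it took."""
        self.attempts[endpoint] += 1
        self.latency_total[endpoint] += latency_s
        if not ok:
            self.failures[endpoint] += 1

    def failure_rate(self, endpoint):
        n = self.attempts.get(endpoint, 0)
        return self.failures.get(endpoint, 0) / n if n else 0.0

    def mean_latency(self, endpoint):
        n = self.attempts.get(endpoint, 0)
        return self.latency_total.get(endpoint, 0.0) / n if n else 0.0

    def endpoints_over(self, threshold):
        """Endpoints whose failure rate exceeds the alert threshold."""
        return sorted(e for e in self.attempts if self.failure_rate(e) > threshold)
```

These counters are cumulative, so a spike can hide behind a long healthy history. Resetting them per time window would make the alert more responsive.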

RobertUS

PaulGJ

PaisleyJT

MichaelCJ

NicholasVG

GregoryCJ

Alexander_P