Webhook Debugging Checklist: Fix Delivery Issues Fast

What is a webhook debugging checklist?

A webhook debugging checklist is a repeatable workflow for finding why a webhook request failed, was delayed, or was processed incorrectly. It helps you separate provider issues from problems in your webhook endpoint, reverse proxy, load balancer, DNS, TLS/SSL certificate, firewall, or application code.

A webhook usually arrives as an HTTP request with headers, a JSON payload, and often a signature, such as an HMAC computed with a shared secret. If any part of that chain breaks, you may see non-2xx responses, timeouts, retries, duplicate deliveries, or events that arrive out of order.

This guide applies to webhooks from Stripe, GitHub, Slack, Shopify, and Twilio, whether you deploy on AWS, Nginx, or Kubernetes and whether your handler is written in Node.js with Express or in Python with FastAPI.

Quick answer: how do you debug a webhook request?

Start with the simplest question: did the provider send the event, and did your endpoint return a successful 2xx response quickly enough?

Check the provider delivery log or request inspector for the exact request, response code, latency, retries, and event ID.
Confirm the request reached your infrastructure by checking DNS, TLS/SSL certificate status, firewall rules, reverse proxy logs, and load balancer logs.
Verify the headers, content-type, and JSON payload match what your handler expects.
Validate the HMAC signature with the shared secret before any business logic runs.
Check application logs, metrics, and distributed tracing for errors, slow dependencies, or queue backlogs.

If the provider shows a 4xx response, the request was usually rejected by your app or proxy. If it shows a 5xx response, your server or a downstream dependency likely failed. If it shows a timeout, the endpoint probably took too long to respond or the network path was interrupted.

What should I check first when a webhook fails?

Check these items first, in order:

Provider delivery status: confirm the event was actually sent and whether retries are happening.
HTTP status code: look for 2xx responses, 4xx responses, or 5xx responses.
Latency: if the endpoint is slow, the provider may retry even when the handler eventually succeeds.
Recent changes: review deploys, secret rotations, DNS updates, TLS/SSL certificate renewals, firewall changes, or reverse proxy changes.
Event ID and correlation ID: use them to match the provider record to your logs and traces.

This first pass often tells you whether the problem is with your webhook endpoint, your infrastructure, or the provider’s delivery attempt.

Why do webhook deliveries fail with non-2xx responses?

Most webhook providers treat anything outside the 2xx range as a failure. A 4xx response usually means the request was malformed, unauthorized, or rejected by validation. A 5xx response usually means your server crashed, threw an exception, or could not reach a dependency.

Common causes include invalid or expired HMAC signatures, missing headers, wrong content-type, malformed JSON payloads, authentication failures, application exceptions, reverse proxy or load balancer misconfiguration, TLS/SSL certificate problems, and firewall blocks.

Some providers retry after non-2xx responses, which can make one underlying issue look like many separate failures. Logging, metrics, and distributed tracing help you see whether the same event ID is failing repeatedly or whether multiple issues are happening at once.

How do I verify a webhook signature?

Signature verification usually works like this:

Read the raw request body before any framework modifies it.
Collect the required headers, including the signature header and timestamp header if the provider uses one.
Recompute the HMAC signature with the shared secret.
Compare the computed value with the signature sent by the provider.
Reject the request if the signature is missing, stale, or invalid.

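Be careful with frameworks that parse JSON before you verify the signature. A parsed and re-serialized body is rarely byte-for-byte identical to what the provider signed. In Express, capture the raw bytes, for example with express.raw() or the verify option of express.json(). In FastAPI, read the bytes with await request.body() instead of relying only on a parsed model. The computed HMAC signature matches the provider’s value only when it is calculated over those exact bytes.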

If the signature fails, check for body mutation, whitespace changes, encoding differences, clock skew, or a rotated shared secret that was not updated everywhere.
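
The steps above can be sketched in Python with only the standard library. This is a minimal illustration, not any specific provider's scheme: the signed string ("timestamp.body"), the hex encoding, the 300-second tolerance, and the function name verify_signature are all assumptions, so check your provider's documentation for the real header names and format.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # assumed replay window; providers differ


def verify_signature(raw_body: bytes, timestamp: str, signature_hex: str,
                     secret: bytes, now: Optional[float] = None) -> bool:
    """Return True only if the timestamp is fresh and the HMAC matches."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False  # missing or malformed timestamp header
    if abs(now - sent_at) > TOLERANCE_SECONDS:
        return False  # stale request, or clock skew beyond the tolerance
    # Hypothetical scheme: HMAC-SHA256 over b"<timestamp>.<raw body>".
    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Note that the function takes the raw bytes, not a parsed object. Passing re-serialized JSON is the most common reason a correct secret still produces a mismatch.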

Why are webhook events duplicated?

Duplicate webhook events usually happen because providers retry after a timeout, a network error, or a non-2xx response. Many providers also promise only at-least-once delivery, so the same event can legitimately arrive more than once even when nothing went wrong on your side.

The best defense is idempotency and deduplication:

store the event ID in a database table with a unique constraint
ignore events you have already processed
make side effects safe to repeat

A queue and background worker are useful here because the webhook endpoint can acknowledge the request quickly while the worker handles the actual business logic. If the worker fails, you can retry from the queue without asking the provider to resend the event.
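
As a rough sketch of the unique-constraint approach, the example below uses SQLite from the Python standard library. The table name processed_events and the helper names open_store and claim_event are hypothetical; in production you would likely use your main database and run the insert in the same transaction as the side effect.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the deduplication table if needed."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY gives us the unique constraint on event_id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True the first time an event ID is seen, False for duplicates."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the unique constraint rejected a repeated event ID
```

Letting the database enforce uniqueness avoids the race in a "select, then insert" check, where two concurrent deliveries of the same event could both pass the lookup.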

Why do webhook events arrive out of order?

Webhook events can arrive out of order because providers may send them in parallel, retry one event while another succeeds, or route deliveries through different infrastructure paths. Network latency, queueing, and downstream processing time can also change the order in which your system observes events.

To handle this safely:

treat event order as unreliable unless the provider explicitly guarantees ordering
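use event timestamps, or sequence numbers if the provider supplies them, to reconstruct the timeline, and use event IDs to tell events apart rather than to order them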
store the latest known state instead of assuming every event is sequential

This is especially important for systems like billing, shipping, and ticketing, where a later event may arrive before an earlier one.
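
One way to store the latest known state is to compare each event's timestamp with the timestamp of the state you already hold. The sketch below is illustrative only: OrderState, apply_event, and the idea that every event carries an integer epoch timestamp are assumptions, not part of any provider's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OrderState:
    status: str
    updated_at: int  # provider event timestamp, epoch seconds (assumed)


def apply_event(states: Dict[str, OrderState], order_id: str,
                status: str, event_ts: int) -> bool:
    """Apply an event only if it is newer than the stored state."""
    current = states.get(order_id)
    if current is not None and event_ts <= current.updated_at:
        # A late or repeated event: keep the newer state we already have.
        # Events with equal timestamps are also skipped in this sketch.
        return False
    states[order_id] = OrderState(status, event_ts)
    return True
```

If two different events can share a timestamp, a provider-supplied sequence or version number is a safer tiebreaker than arrival order.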

How do I reproduce a webhook issue locally?

Use a captured request and replay it in a controlled environment:

cURL for raw HTTP replay
Postman for manual inspection and header editing
ngrok to expose a local server to the internet
Pipedream to inspect, transform, and replay webhook requests

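To reproduce the issue accurately, keep the same headers, JSON payload, and content-type as the captured request. If the provider signs the raw body, do not reformat the JSON before replaying it. If the signature also covers a timestamp, an old capture will be rejected as stale, so either re-sign the captured body with a test secret and a fresh timestamp or relax the timestamp tolerance in your local environment only.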

A good local reproduction often reveals whether the bug is in your code, your framework, or the provider’s delivery path.
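
If you prefer a script to cURL, a replay can be sketched with Python's urllib. The function names build_replay and send_replay, the URL, and the header names in the usage note are placeholders for illustration. The important detail is that the captured body is sent as bytes, exactly as received.

```python
import urllib.error
import urllib.request


def build_replay(url: str, raw_body: bytes, headers: dict) -> urllib.request.Request:
    """Build a POST that carries the captured bytes and headers unchanged."""
    # Pass the body as bytes exactly as captured. Re-serializing the JSON
    # would change whitespace and break a signature over the raw body.
    return urllib.request.Request(url, data=raw_body, headers=headers, method="POST")


def send_replay(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the replay and return the HTTP status code."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx and 5xx; the status code is what we want here.
        return err.code
```

For example, build_replay("http://localhost:8000/webhooks", captured_bytes, {"Content-Type": "application/json"}) followed by send_replay(...) returns the status code your local handler produced for the same bytes the provider sent.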

What logs should I check for webhook debugging?

Check logs at every layer that can affect delivery:

provider delivery logs
reverse proxy logs
application logs
queue and background worker logs
dead-letter queue entries
observability tools such as Datadog, Sentry, and OpenTelemetry traces

Look for the event ID, correlation ID, response code, latency, retry count, and any exception stack traces. If you use distributed tracing, follow the request from the edge through the webhook endpoint and into the background worker.

If logs are missing, add structured logging around request receipt, signature verification, queue enqueueing, worker execution, and final side effects.
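
A minimal sketch of structured logging, assuming you emit one JSON line per stage: the logger name "webhooks", the helper log_stage, and the stage names are hypothetical, and a real setup would usually rely on your logging library's JSON formatter instead.

```python
import json
import logging

logger = logging.getLogger("webhooks")


def log_stage(stage: str, event_id: str, **fields) -> str:
    """Emit one JSON log line for a processing stage and return it."""
    record = {"stage": stage, "event_id": event_id, **fields}
    # sort_keys keeps the output stable, which makes lines easier to diff.
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Calling log_stage("received", event_id), then log_stage("signature_verified", event_id), log_stage("enqueued", event_id), and so on lets you search for one event ID and see the last stage it reached.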

How fast should a webhook endpoint respond?

A webhook endpoint should respond as quickly as possible, ideally with a 2xx response after validating the request and before doing expensive work. In practice, that means keeping the handler lightweight and moving slow tasks to a queue and background worker.

Slow responses can exceed the provider's timeout and trigger retries, even if the eventual business action succeeds. If your endpoint needs to call an external API or internal service, or to run heavy database work, do that after the response or in an asynchronous worker.

The exact timeout threshold depends on the provider, but the safe rule is simple: acknowledge fast, process later.
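
The "acknowledge fast, process later" rule can be illustrated with an in-process queue and a worker thread. This is only a sketch: handle_webhook, the processed and failed lists, and the in-memory queue are stand-ins, and a production system would normally use a durable queue so that events survive a restart.

```python
import queue
import threading

jobs = queue.Queue()
processed = []  # stand-in for completed business logic
failed = []     # stand-in for a dead-letter queue


def handle_webhook(event: dict) -> int:
    """Enqueue the already-verified event and acknowledge immediately."""
    jobs.put(event)
    return 200  # the provider sees a fast 2xx; the work happens later


def worker() -> None:
    """Drain the queue in the background."""
    while True:
        event = jobs.get()
        try:
            processed.append(event["id"])  # replace with the slow work
        except Exception:
            failed.append(event)  # keep the event for inspection or retry
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
```

Because failures land in the failed list instead of producing a non-2xx response, the provider is not asked to resend; you retry from your own queue.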

What is the best way to make webhook handlers idempotent?

The best approach is to make every side effect safe to repeat.

Use these patterns:

store the event ID and skip duplicates, while still returning a 2xx response so the provider stops retrying
use unique database constraints for records created from webhook events
check current state before applying an update

Idempotency and deduplication are not the same thing. Deduplication prevents the same event from being processed twice. Idempotency ensures that even if processing happens again, the final result stays correct.
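
To make the distinction concrete, here is a small sketch of an idempotent update that checks current state before applying a change. The invoice structure and the function mark_paid are invented for illustration. The point is that running it twice leaves the same final result, even if deduplication was bypassed.

```python
def mark_paid(invoices: dict, invoice_id: str, amount_cents: int) -> bool:
    """Apply a payment once; repeated calls leave the invoice unchanged."""
    invoice = invoices[invoice_id]
    if invoice["status"] == "paid":
        return False  # already applied: repeating must not subtract again
    invoice["status"] = "paid"
    invoice["balance_cents"] -= amount_cents
    return True
```

Without the status check, a redelivered event would reduce the balance a second time. With it, the second call is a harmless no-op.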

What tools can I use to inspect or replay webhook requests?

Useful tools include:

Postman for manual request inspection
cURL for exact command-line replay
ngrok for exposing local endpoints
Pipedream for inspection and replay
provider dashboards and delivery logs for Stripe, GitHub, Slack, Shopify, and Twilio webhooks

Use these tools to compare the original request with your local reproduction, especially when debugging signature verification, content-type mismatches, or header stripping by a reverse proxy.

How do I know whether the problem is with my endpoint or the provider?

Use a simple split test:

If the provider says the request was sent but your logs show nothing, check DNS, the TLS/SSL certificate, firewall rules, and the reverse proxy and load balancer configuration.
If your logs show the request but the provider reports a 4xx or 5xx response, the issue is likely in your endpoint or application code.
If the provider reports a timeout and your logs show a slow handler, the issue is probably your latency or downstream dependency.
If the provider shows retries but your app applied the event only once, the duplicates come from delivery rather than from your business logic; find out why the earlier attempts were not acknowledged with a 2xx response.

When in doubt, replay the same request in staging and compare the result with production.
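
The split test can also be written down as a small decision function, which is handy in a runbook or an on-call script. The function name triage, its parameters, and the wording of the returned hints are assumptions made for this sketch.

```python
from typing import Optional


def triage(provider_sent: bool, seen_in_app_logs: bool,
           provider_status: Optional[int], timed_out: bool) -> str:
    """Suggest where to look first, given what each side observed."""
    if not provider_sent:
        return "provider: no delivery attempt recorded"
    if not seen_in_app_logs:
        return "network path: check DNS, TLS, firewall, reverse proxy, load balancer"
    if timed_out:
        return "latency: slow handler or downstream dependency"
    if provider_status is not None and provider_status >= 400:
        return "endpoint: application code or request validation"
    return "delivered: if retries continue, review duplicate handling"
```

The order of the checks mirrors the path of the request: provider first, then the network path, then your handler.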

What should be included in a webhook troubleshooting workflow?

A solid workflow should include:

Triage the failure using provider logs, response codes, latency, and retries.
Verify the webhook endpoint path, DNS, TLS/SSL certificate, firewall, reverse proxy, and load balancer.
Inspect headers, content-type, JSON payload, event ID, and correlation ID.
Verify HMAC signatures with the shared secret.
Check application logs, metrics, distributed tracing, queue depth, and dead-letter queue entries.
Reproduce the issue locally or in a staging environment with cURL, Postman, ngrok, or Pipedream.
Confirm the fix in the production environment after the staging test passes.

This workflow works best when it is documented and shared across engineering, support, and operations.

How can I prevent webhook failures from happening again?

Prevention is mostly about reducing fragility:

return a fast 2xx response
verify signatures before processing
make handlers idempotent
use deduplication for repeated event IDs
move slow work to a queue and background worker
monitor logs, metrics, and distributed traces
alert on latency spikes, retry storms, and dead-letter queue growth
test changes in staging before production

If you operate at scale, add automated checks for reverse proxy config, TLS/SSL certificate expiry, firewall rules, and deployment changes in AWS, Nginx, or Kubernetes.
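
As one example of such an automated check, the sketch below estimates how many days remain before a TLS certificate expires, using Python's ssl module. The function names and the idea of alerting below a threshold are assumptions. The date string is expected in the format found under the notAfter key of the dictionary returned by getpeercert().

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter date."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires_at - now) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field from the certificate a host presents."""
    # The default context verifies the certificate, so an already expired
    # or untrusted certificate raises an error here, which is itself a signal.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call days_until_expiry(fetch_not_after("your-webhook-host.example")) and alert when the result drops below a threshold you choose, such as 14 days.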