Intermediate · 8 min

Error Handling and Reliability

An automation that works in testing and silently fails at 2am is worse than no automation, because you trusted it. Reliability is a feature you build in on purpose: retries for flaky steps, error branches for real failures, and alerts so you find out before your users do.

Retries vs error branches

Retry: for transient problems like a timeout or a rate limit. Allow a few attempts with a short delay between them.
Error branch: for real failures that a retry will not fix. Catch them and do something useful, like logging and alerting, instead of crashing.
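
Outside a visual tool, the same split can be sketched in a few lines of Python. This is a minimal illustration, not how any particular platform implements it: `TransientError`, `run_step`, and the `on_failure` callback are hypothetical names, and the three attempts and half-second delay are assumed defaults.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def run_step(step, payload, on_failure, attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Run step(payload). Retry transient errors; send real failures to on_failure."""
    for attempt in range(1, attempts + 1):
        try:
            return step(payload)
        except TransientError as exc:
            if attempt < attempts:
                # Retry path: wait a little longer each time, then try again.
                sleep(delay_seconds * attempt)
                continue
            error = exc  # Retries exhausted, so treat it as a real failure.
        except Exception as exc:
            error = exc  # Real failure: retrying would not help.
        # Error branch: hand over the error and the input instead of crashing.
        on_failure(error, payload)
        return None
```

The `on_failure` callback is where logging and alerting would go. Passing `sleep` as a parameter is only there so the delay can be swapped out in tests.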

n8n - error workflow

Main workflow ... [ HTTP Request ] (fails)
    on error --> [ Set: error details ] --> [ Slack: alert #ops ]
                                        --> [ Sheet: log failure ]

A dedicated path that runs when any node fails.

Make failures loud

The default failure mode of most tools is to fail quietly and send you an email you will ignore. Replace that with a message to a channel you actually watch. Include what failed, the input that caused it, and a link to the run.
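
As a rough sketch of what that message could contain, the helper below assembles the three pieces into plain text. The function name `build_alert`, the field labels, and the 500-character cap on the input are assumptions for illustration; how you post the text to your channel depends on your tool.

```python
import json


def build_alert(step_name, error, payload, run_url, max_input_chars=500):
    """Build a plain-text alert: what failed, the input that caused it, a link to the run."""
    # Serialize the input so it can be pasted back in when reproducing the failure.
    input_text = json.dumps(payload, default=str, sort_keys=True)
    if len(input_text) > max_input_chars:
        # Keep the alert readable; the full input belongs in the failure log.
        input_text = input_text[:max_input_chars] + "... (truncated)"
    return "\n".join([
        f"FAILED: {step_name}",
        f"Error: {type(error).__name__}: {error}",
        f"Input: {input_text}",
        f"Run: {run_url}",
    ])
```

Leading with the step name means the first line that shows up in a notification preview already says what broke.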

Idempotency stops doubles

If a flow retries after partly succeeding, it can create the same record twice. Where you can, key each record on an external ID or use an upsert, so re-running updates the existing record instead of creating a second one. Duplicate invoices are how automations lose people's trust.
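
Here is one way this might look with a database, using Python's built-in `sqlite3` module and SQLite's `ON CONFLICT ... DO UPDATE` clause (this needs a reasonably recent SQLite). The `invoices` table, its columns, and the `order-1001` style of ID are made up for the example.

```python
import sqlite3


def open_store():
    """Create an in-memory table keyed on the external ID (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "external_id TEXT PRIMARY KEY, "
        "amount_cents INTEGER NOT NULL)"
    )
    return conn


def upsert_invoice(conn, external_id, amount_cents):
    """Insert the invoice, or update the existing row if this ID was already written."""
    conn.execute(
        "INSERT INTO invoices (external_id, amount_cents) VALUES (?, ?) "
        "ON CONFLICT(external_id) DO UPDATE SET amount_cents = excluded.amount_cents",
        (external_id, amount_cents),
    )
    conn.commit()


def count_invoices(conn):
    """Return how many invoice rows exist."""
    return conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

Calling `upsert_invoice(conn, "order-1001", 4999)` twice leaves a single row, which is the property that makes a retry safe to run.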

Result: when something breaks, and it will, you get a clear alert with context, the bad input is logged, and good data keeps flowing.

Hands-on tasks