Intermediate · 8 min
Error Handling and Reliability
An automation that works in testing and silently fails at 2am is worse than no automation, because you trusted it. Reliability is a feature you build in on purpose: retries for flaky steps, error branches for real failures, and alerts so you find out before your users do.
Retries vs error branches
- Retry: for transient problems like a timeout or a rate limit. Set a few attempts with a short delay.
- Error branch: for real failures that retrying will not fix, such as bad input or an expired credential. Catch them and do something useful, like logging and alerting, instead of crashing.
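A minimal Python sketch of the retry side, assuming you can tell transient failures from real ones. `TransientError`, `PermanentError`, and `call_with_retry` are hypothetical names, not part of n8n or any library. It retries only transient errors, doubling the wait each time (exponential backoff), and lets anything else through immediately so an error branch can handle it.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for problems worth retrying (timeouts, rate limits)."""


class PermanentError(Exception):
    """Hypothetical marker for real failures that retrying will not fix."""


def call_with_retry(step, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call step(), retrying only TransientError with a doubling delay.

    Any other exception, including PermanentError, propagates at once so the
    caller's error branch can handle it. After the last failed attempt the
    TransientError is re-raised for the same reason.
    """
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: hand off to the error branch
            sleep(base_delay * 2 ** (attempt - 1))  # waits base_delay, 2x, 4x, ...


# Usage: a step that times out twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timeout")
    return "ok"


print(call_with_retry(flaky_step, sleep=lambda seconds: None))  # ok
```

The `sleep` parameter exists only so examples and tests do not actually wait; in production you would keep the default. Some rate-limited APIs also say how long to wait via an HTTP `Retry-After` header, which is worth honoring when present.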
n8n - error workflow
Main workflow ... [ HTTP Request ] (fails)
                         |
                         | on error
                         v
              [ Set: error details ] --> [ Slack: alert #ops ]
                                     --> [ Sheet: log failure ]
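The same shape outside n8n, as a minimal Python sketch: wrap the step, and if it raises, assemble the error details once and send them to both a log and an alert. `run_with_error_branch`, `fetch_customer`, and the `alert`/`log` callables are hypothetical; in a real setup they would be your Slack and spreadsheet integrations rather than list appends.

```python
from datetime import datetime, timezone


def run_with_error_branch(step, item, alert, log):
    """Run step(item). On failure, build the error details once and fan out.

    `alert` and `log` stand in for the Slack and Sheet nodes in the diagram:
    any callables that accept a dict. Returns the step's result, or None if
    the step raised.
    """
    try:
        return step(item)
    except Exception as exc:
        # "Set: error details": what failed, why, and the input that caused it.
        details = {
            "step": getattr(step, "__name__", repr(step)),
            "error": f"{type(exc).__name__}: {exc}",
            "input": item,
            "failed_at": datetime.now(timezone.utc).isoformat(),
        }
        log(details)    # -> Sheet: log failure (first, so the record survives if alerting fails)
        alert(details)  # -> Slack: alert #ops
        return None


# Usage: one bad item is reported while the good ones still go through.
alerts, failure_log = [], []


def fetch_customer(item):
    if "id" not in item:
        raise ValueError("missing id")
    return {"id": item["id"], "status": "ok"}


results = [
    run_with_error_branch(fetch_customer, item, alerts.append, failure_log.append)
    for item in [{"id": 1}, {"name": "no id"}, {"id": 3}]
]
print(results)
print(len(alerts), len(failure_log))  # 1 1
```

Logging before alerting is a design choice in this sketch: if the alert channel is down, the bad input is still recorded.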
Make failures loud
By default, most tools fail quietly: at most they send an email you will ignore. Replace that with a message to a channel you actually watch, and include what failed, the input that caused it, and a link to the run.
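A minimal Python sketch of an alert message carrying those three things. `format_alert` and its parameters are hypothetical, and `run_url` is assumed to be whatever per-execution link your tool provides; the example URL is a placeholder. Long inputs are truncated so the channel stays readable.

```python
import json


def format_alert(workflow, step, error, item, run_url, max_input_chars=500):
    """Build a channel message with what failed, the input, and a run link.

    All parameters are hypothetical inputs; `run_url` is whatever link your
    tool provides for a single execution.
    """
    payload = json.dumps(item, default=str, sort_keys=True)
    if len(payload) > max_input_chars:
        # Keep the message readable; the full input belongs in the failure log.
        payload = payload[:max_input_chars] + "... (truncated)"
    return "\n".join([
        f"ALERT: {workflow} failed at '{step}'",
        f"Error: {error}",
        f"Input: {payload}",
        f"Run: {run_url}",
    ])


print(format_alert(
    workflow="Sync invoices",
    step="HTTP Request",
    error="HTTP 500 from billing API",
    item={"invoice_id": "INV-1042", "amount": 120},
    run_url="https://example.com/runs/123",  # placeholder link
))
```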
Idempotency stops doubles
If a flow retries after partly succeeding, it can create the same record twice. Where you can, key each record on an external ID from the source system and write it with an upsert (update if it exists, insert if not), so re-running is safe. Duplicate invoices are how automations lose people's trust.
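A minimal sketch of the upsert idea using Python's built-in `sqlite3` module, assuming the source system gives each invoice a stable ID. The table, column names, and `order-1042` value are hypothetical. Because `external_id` is the primary key, a second write with the same ID updates the existing row instead of adding a duplicate. (`ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.)

```python
import sqlite3


def create_invoice_table(conn):
    # external_id is the source system's ID (e.g. an order number); the
    # primary-key constraint is what makes re-running safe.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoices (
               external_id TEXT PRIMARY KEY,
               customer    TEXT NOT NULL,
               amount      REAL NOT NULL
           )"""
    )


def upsert_invoice(conn, external_id, customer, amount):
    """Insert the invoice, or update it if this external_id already exists."""
    conn.execute(
        """INSERT INTO invoices (external_id, customer, amount)
           VALUES (?, ?, ?)
           ON CONFLICT(external_id) DO UPDATE SET
               customer = excluded.customer,
               amount   = excluded.amount""",
        (external_id, customer, amount),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_invoice_table(conn)
# Simulate a retry after partial success: the same invoice is written twice.
upsert_invoice(conn, "order-1042", "Acme", 120.0)
upsert_invoice(conn, "order-1042", "Acme", 120.0)
print(conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0])  # 1
```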
Result: when something breaks, and it will, you get a clear alert with context, the bad input is logged, and good data keeps flowing.