Retry

What retry actually means, when repeating helps, and when repeating only multiplies the damage.

Andrews Ribeiro

Founder & Engineer

What it is

Retry means trying a failed operation again.

That makes sense when the failure looks temporary:

  • short timeout
  • network error
  • service briefly unavailable

The point is not “keep trying until it works.”

The point is to give a second chance to a failure that may disappear on its own.
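A minimal sketch of that idea, assuming the failures above map to Python's built-in `TimeoutError` and `ConnectionError` plus a hypothetical `ServiceUnavailable` class. The names `is_transient` and `call_with_second_chance` are illustrative and do not come from any library.

```python
class ServiceUnavailable(Exception):
    """Hypothetical error for a dependency that is briefly unavailable."""


# Failures that may disappear on their own (an assumption for this sketch).
TRANSIENT_ERRORS = (TimeoutError, ConnectionError, ServiceUnavailable)


def is_transient(error):
    """Return True when the failure looks temporary and is worth a second chance."""
    return isinstance(error, TRANSIENT_ERRORS)


def call_with_second_chance(operation):
    """Run operation once; repeat it a single time only if the failure looks temporary."""
    try:
        return operation()
    except Exception as error:
        if not is_transient(error):
            # Repeating will not fix this kind of failure, so let it surface.
            raise
        # Second and last attempt; any error raised here propagates to the caller.
        return operation()
```

A failure outside the transient list, such as a `ValueError` from bad input, is raised on the first attempt and never repeated.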

When it matters

Retry shows up all the time in:

  • service-to-service calls
  • asynchronous jobs
  • webhooks
  • queues

In production, short-lived failure is not a rare exception.

It is part of normal life.

Common mistake

The classic mistake is treating retry like a magic button.

Applied without judgment, it causes:

  • a request storm against a service that is already struggling
  • duplicated side effects
  • uncontrolled queue growth

Retry without idempotency and without limits usually makes the problem worse.
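To make the idempotency part concrete, here is a small sketch of a receiver that deduplicates by an idempotency key. `InvoiceService` and its in-memory dictionary are hypothetical stand-ins; a real service would keep the keys in durable storage.

```python
class InvoiceService:
    """Hypothetical in-memory service that deduplicates by idempotency key."""

    def __init__(self):
        self._invoices_by_key = {}

    def create_invoice(self, idempotency_key, amount):
        # A repeated key returns the earlier result instead of creating a duplicate.
        if idempotency_key in self._invoices_by_key:
            return self._invoices_by_key[idempotency_key]
        invoice = {"id": len(self._invoices_by_key) + 1, "amount": amount}
        self._invoices_by_key[idempotency_key] = invoice
        return invoice

    def count(self):
        return len(self._invoices_by_key)


service = InvoiceService()
first = service.create_invoice("order-42", 100)
# Same key again, as a caller would send after a timeout with a lost response.
retried = service.create_invoice("order-42", 100)
```

Here the retried call returns the invoice created by the first call, so the side effect happens once no matter how many times the caller repeats the request.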

Short example

A worker calls an external service to generate an invoice.

The first attempt times out.

Instead of marking the job as permanently failed immediately, the worker waits a little and tries again.

If the second attempt works, you absorbed a transient failure without manual intervention.

If it still fails after a few tries, then another path takes over:

  • permanent failure
  • dead-letter queue (DLQ)
  • manual inspection
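The flow above could be sketched roughly like this. It assumes the external call signals a timeout with Python's built-in `TimeoutError` and uses a plain list as a stand-in for the DLQ. `run_invoice_job`, `generate_invoice`, and the default values are hypothetical.

```python
import time


def run_invoice_job(job, generate_invoice, dead_letter_queue,
                    max_attempts=3, wait_seconds=2.0, sleep=time.sleep):
    """Try the job up to max_attempts times; park it in the DLQ if every attempt times out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return generate_invoice(job)
        except TimeoutError:
            # Wait a little before the next try, but not after the last one.
            if attempt < max_attempts:
                sleep(wait_seconds)
    # Stop condition reached: hand the job to another path instead of looping forever.
    dead_letter_queue.append(job)
    return None
```

Passing `sleep` as a parameter is only a convenience so the wait can be replaced in tests. Errors other than a timeout are not caught here and fail the job immediately.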

Why it helps

Retry makes the system less fragile around short-lived failures.

But it only helps when it comes with:

  • retry limits
  • backoff
  • idempotency
  • a clear stop condition
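For the backoff item, one common choice is exponential backoff with random jitter. The article does not prescribe a specific scheme, so treat this as one possible sketch. `backoff_delays` and its parameter values are hypothetical.

```python
import random


def backoff_delays(max_attempts=5, base=0.5, cap=8.0, rng=random.random):
    """Yield the wait before each retry: exponential growth, capped, then randomized."""
    # max_attempts counts the first try, so there are max_attempts - 1 retries.
    for retry in range(max_attempts - 1):
        ceiling = min(cap, base * 2 ** retry)
        # Jitter spreads clients out so they do not all retry at the same moment.
        yield rng() * ceiling
```

The bounded `max_attempts` doubles as the stop condition: once the generator is exhausted, the caller stops retrying and hands the work to a failure path.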

Good retry is not automated stubbornness. It is controlled tolerance for temporary failure.
