Java Microservices - Retry Pattern



Introduction

In distributed systems and microservices, network failures, timeouts, and other transient faults are common. Such failures often clear up on their own, so a request that fails once may succeed if it is simply tried again. The Retry Pattern is a resilience technique in which a failed request is automatically retried, typically after a brief delay, up to a limited number of attempts before the operation is finally reported as failed.

This pattern significantly increases the fault tolerance of microservices by allowing them to recover from temporary issues without immediate failure.

Motivation and Problem Statement

Let's consider a real-world example:

A payment microservice calls a third-party payment gateway API. Occasionally, the request fails due to:

  • Temporary network issues

  • DNS lookup failure

  • Gateway throttling

If the service fails outright, it may disrupt customer experience. Instead, if it retries the request a few times, the operation could succeed on the second or third attempt, improving reliability.

Key Challenges

  • Unpredictable failures in remote services

  • Overreaction to minor or short-lived glitches

  • Impact on user experience and system stability

When and Where to Apply

Use the Retry Pattern when −

  • Failures are transient and recoverable (e.g., timeouts, 5xx errors, temporary unavailability)

  • The operation is idempotent (i.e., calling it multiple times won't corrupt data or cause unwanted side effects)

  • The remote system is well-known and typically stable

Avoid retries when −

  • The failure is permanent (e.g., 404 Not Found, 401 Unauthorized)

  • The call is non-idempotent (e.g., money transfer or email sending)

  • Retry may flood an already overloaded system
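One way to apply these rules in code is a small classifier that treats 5xx responses and 429 (throttling) as transient, and 4xx client errors such as 404 or 401 as permanent. The class below is a hypothetical sketch, not part of any library:

```java
// Hypothetical classifier separating transient failures (worth retrying)
// from permanent ones (where retrying will not help), per the lists above.
class FailureClassifier {

    static boolean isTransient(int httpStatus) {
        // 5xx server errors and 429 (throttling) are usually temporary;
        // 4xx client errors such as 404 or 401 are permanent.
        return httpStatus == 429 || (httpStatus >= 500 && httpStatus <= 599);
    }
}
```

A real service would usually also classify low-level exceptions (e.g., connection timeouts) as transient, but the status-code rule above covers the common HTTP cases.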

Core Concepts and Principles

Retry Policy

A retry policy defines how retry attempts are made. Key parameters −

  • Max retries − How many times to retry (e.g., 3 attempts)

  • Delay − Time between retries (e.g., 200ms)

  • Backoff strategy − Fixed, exponential, or randomized

  • Retry on − Specific exceptions or HTTP statuses
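As a sketch, these parameters can be modeled as a plain value object; the class and method names below are illustrative, not taken from any particular library:

```java
import java.time.Duration;
import java.util.List;

// Hypothetical value object capturing the retry-policy parameters above.
class RetryPolicy {
    final int maxRetries;                            // e.g., 3 attempts
    final Duration delay;                            // e.g., 200ms between attempts
    final List<Class<? extends Exception>> retryOn;  // exception types worth retrying

    RetryPolicy(int maxRetries, Duration delay,
                List<Class<? extends Exception>> retryOn) {
        this.maxRetries = maxRetries;
        this.delay = delay;
        this.retryOn = retryOn;
    }

    // Retry only while attempts remain AND the failure is a retryable type.
    boolean shouldRetry(int attemptsSoFar, Exception e) {
        return attemptsSoFar < maxRetries
                && retryOn.stream().anyMatch(c -> c.isInstance(e));
    }
}
```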

Backoff Strategy

  • Fixed Delay − Wait a constant time between retries

  • Exponential Backoff − Delay increases exponentially

  • Exponential Backoff with Jitter − Adds randomness to avoid retry storms
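The three strategies above boil down to simple delay calculations. The sketch below illustrates them in plain Java (class and method names are hypothetical):

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper computing the delay (in ms) before a given retry attempt.
class BackoffCalculator {

    // Fixed delay: the same wait every time.
    static long fixed(long baseMs) {
        return baseMs;
    }

    // Exponential backoff: base * 2^attempt, capped to avoid huge waits.
    static long exponential(long baseMs, int attempt, long capMs) {
        return Math.min(capMs, baseMs * (1L << attempt));
    }

    // "Full jitter": pick a random delay in [0, exponential) so many clients
    // retrying at once do not all hit the server at the same moment.
    static long exponentialWithJitter(long baseMs, int attempt, long capMs) {
        return ThreadLocalRandom.current().nextLong(exponential(baseMs, attempt, capMs));
    }
}
```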

Design Considerations

When designing a retry mechanism −

  • Ensure idempotency

  • Set timeouts on retries to avoid hanging requests

  • Log each retry attempt

  • Use circuit breaker in conjunction to avoid retrying during complete outages

  • Implement fallbacks for graceful degradation
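In particular, the per-attempt timeout can be sketched with a Future, using only the JDK; the class name below is hypothetical, and resilience libraries provide more polished versions of the same idea:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: bound each attempt with a timeout so a hung call
// cannot block the retry loop (and its thread) indefinitely.
class TimedCall {

    static <T> T callWithTimeout(Supplier<T> call, long timeoutMs, T fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = pool.submit(call::get);
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) {   // timeout, interruption, or call failure
            return fallback;
        } finally {
            pool.shutdownNow();   // cancel the hung task, if any
        }
    }
}
```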

Retry Diagram (described in text)

A retry loop can be illustrated as:

Request → Failure → Retry → Failure → Retry → Give up → Fallback/Error

Implementation Strategies

Strategy 1 − Manual Retry Logic

A developer can wrap method calls in a loop with sleep/delay and exception handling.

int maxAttempts = 3;
int attempt = 0;
while (attempt < maxAttempts) {
   try {
      callExternalService();
      break; // success, stop retrying
   } catch (Exception e) {
      attempt++;
      if (attempt == maxAttempts) {
         throw e; // all attempts exhausted, propagate the failure
      }
      try {
         Thread.sleep(200); // delay before the next attempt
      } catch (InterruptedException ie) {
         Thread.currentThread().interrupt(); // restore interrupt status
         throw ie;
      }
   }
}
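A more reusable variant of this idea wraps the loop in a generic helper that gives up after a fixed number of attempts and returns a fallback value, matching the "Give up → Fallback" step in the diagram above. The names below are illustrative:

```java
import java.util.function.Supplier;

// Hypothetical generic helper: retries a supplier up to maxAttempts with a
// fixed delay between attempts, then falls back to a default value.
class RetryExecutor {

    static <T> T withRetry(Supplier<T> call, int maxAttempts, long delayMs, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();                 // success on this attempt
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    return fallback;               // all attempts exhausted
                }
                try {
                    Thread.sleep(delayMs);         // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return fallback;               // stop retrying if interrupted
                }
            }
        }
        return fallback; // unreachable, satisfies the compiler
    }
}
```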

Strategy 2 − Framework-Based Retry

Use libraries like −

  • Spring Retry

  • Resilience4j Retry

These offer declarative retry behavior with advanced configuration.

Example Implementation: Spring Boot + Resilience4j

Dependency

<dependency>
   <groupId>io.github.resilience4j</groupId>
   <artifactId>resilience4j-spring-boot3</artifactId>
   <version>2.0.2</version>
</dependency>

Configuration (application.yml)

resilience4j.retry:
  instances:
    myServiceRetry:
      max-attempts: 3
      wait-duration: 500ms
      retry-exceptions:
        - java.io.IOException

Annotated Method

@Retry(name = "myServiceRetry", fallbackMethod = "fallbackMethod")
public String callExternalService() {
   // Call to the external API; a thrown IOException (configured above)
   // triggers a retry before the fallback is invoked
   return externalApiClient.call(); // hypothetical client call
}

Fallback Method

public String fallbackMethod(Exception e) {
   return "Service temporarily unavailable";
}

The fallback must be defined in the same class, with the same return type as the original method and the triggering exception as its last parameter.

Challenges and Pitfalls

Common Mistakes

  • Retrying non-idempotent operations

  • Not limiting max attempts

  • Retrying instantly without backoff

  • Not using timeouts − can lead to thread exhaustion

  • Cascading retries across services causing overload

Best Practices

  • Always limit the number of retries

  • Retry only on transient and known recoverable failures

  • Log retry attempts and metrics for observability

  • Prefer framework-level retries over custom code when possible

Tools and Libraries

Sr.No.   Tool                  Purpose
1        Spring Retry          Declarative retry support in Spring Boot
2        Resilience4j Retry    Lightweight, modern retry + resilience
3        Polly (.NET)          Retry handling in .NET applications
4        Retry4j               Fluent, configurable retry logic in Java
5        Backoff (Python)      Retry utilities with exponential backoff