Java Microservices - Retry Pattern
Introduction
In distributed systems and microservices, network failures, timeouts, and other transient faults are common. Because these faults are temporary, a request that fails once may well succeed on a subsequent attempt. The Retry Pattern is a resilience technique in which a failed request is automatically retried, usually after a brief delay, a limited number of times before the system finally gives up.
This pattern significantly increases the fault tolerance of microservices by allowing them to recover from temporary issues without immediate failure.
Motivation and Problem Statement
Let's consider a real-world example −
A payment microservice calls a third-party payment gateway API. Occasionally, the request fails due to −
Temporary network issues
DNS lookup failure
Gateway throttling
If the service fails outright, it may disrupt customer experience. Instead, if it retries the request a few times, the operation could succeed on the second or third attempt, improving reliability.
Key Challenges
Unpredictable failures in remote services
Overreaction to minor or short-lived glitches
Impact on user experience and system stability
When and Where to Apply
Use the Retry Pattern when −
Failures are transient and recoverable (e.g., timeouts, 5xx errors, temporary unavailability)
The operation is idempotent (i.e., calling it multiple times won't corrupt data or cause unwanted side effects)
The remote system is well-known and typically stable
Avoid retries when −
The failure is permanent (e.g., 404 Not Found, 401 Unauthorized)
The call is non-idempotent (e.g., money transfer or email sending)
Retry may flood an already overloaded system
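The distinction above can be encoded directly. The sketch below is a hypothetical helper (the class name and thresholds are illustrative) that classifies an HTTP status code as retryable or not, following the rule of thumb that 5xx responses and 429 (throttling) are usually transient while other 4xx responses are permanent:

```java
// Hypothetical helper: decide whether an HTTP status code is worth retrying.
// 429 and 5xx are usually transient; other 4xx errors are permanent.
public class RetryDecision {

    public static boolean isRetryable(int httpStatus) {
        if (httpStatus == 429) {
            return true;  // throttled − retry after a backoff delay
        }
        if (httpStatus >= 500 && httpStatus < 600) {
            return true;  // server-side fault − may recover on its own
        }
        return false;     // e.g., 404 or 401 − retrying cannot help
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(503)); // prints "true"
        System.out.println(isRetryable(404)); // prints "false"
    }
}
```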
Core Concepts and Principles
Retry Policy
A retry policy defines how retry attempts are made. Key parameters −
Max retries − How many times to retry (e.g., 3 attempts)
Delay − Time between retries (e.g., 200ms)
Backoff strategy − Fixed, exponential, or randomized
Retry on − Specific exceptions or HTTP statuses
Backoff Strategy
Fixed Delay − Wait a constant time between retries
Exponential Backoff − Delay increases exponentially
Exponential Backoff with Jitter − Adds randomness to avoid retry storms
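The three strategies differ only in how the delay for attempt n is computed. The following sketch assumes a 200 ms base delay (the class and method names are illustrative, not from any library):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the three backoff strategies, assuming a 200 ms base delay.
public class Backoff {

    static final long BASE_MS = 200;

    // Fixed − the same delay on every attempt.
    public static long fixed(int attempt) {
        return BASE_MS;
    }

    // Exponential − 200, 400, 800, ... ms (base * 2^attempt).
    public static long exponential(int attempt) {
        return BASE_MS << attempt;
    }

    // Exponential with full jitter − a random delay in [0, exponential],
    // which spreads out clients and avoids synchronized retry storms.
    public static long exponentialWithJitter(int attempt) {
        return ThreadLocalRandom.current().nextLong(exponential(attempt) + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 3; attempt++) {
            System.out.printf("attempt %d: fixed=%dms exp=%dms%n",
                    attempt, fixed(attempt), exponential(attempt));
        }
    }
}
```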
Design Considerations
When designing a retry mechanism −
Ensure idempotency
Set timeouts on retries to avoid hanging requests
Log each retry attempt
Use circuit breaker in conjunction to avoid retrying during complete outages
Implement fallbacks for graceful degradation
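The timeout consideration can be sketched in plain Java. Here each attempt runs on a worker thread and is abandoned if it exceeds a limit, so a hung remote call cannot block the retry loop indefinitely (the class name and the single-thread executor are illustrative choices):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: bound each attempt with a timeout so retries never hang.
public class TimedAttempt {

    public static String attemptWithTimeout(Callable<String> task, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            // Throws TimeoutException if the attempt takes too long,
            // letting the caller count it as a failure and retry.
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow(); // interrupt a hung attempt
        }
    }
}
```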
Retry Flow
A retry loop can be illustrated as −
Request → Failure → Retry → Failure → Retry → Give up → Fallback/Error
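The flow above can be sketched as a small generic helper that retries a call up to a limit and then returns a fallback value (the class name, the `Supplier` shape, and the fallback-as-string design are illustrative simplifications):

```java
import java.util.function.Supplier;

// Sketch of the flow: Request → Failure → Retry → ... → Give up → Fallback.
public class RetryFlow {

    public static String callWithRetry(Supplier<String> call,
                                       int maxAttempts,
                                       String fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();            // Request − return on success
            } catch (RuntimeException e) {    // Failure
                if (attempt == maxAttempts) {
                    return fallback;          // Give up → Fallback
                }
                // otherwise loop again → Retry
            }
        }
        return fallback; // unreachable; satisfies the compiler
    }
}
```

A real implementation would also sleep between attempts (see the backoff strategies above) and rethrow or log the final exception instead of silently substituting the fallback.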
Implementation Strategies
Strategy 1 − Manual Retry Logic
A developer can wrap method calls in a loop with sleep/delay and exception handling.
// Assumes the enclosing method declares "throws Exception"
// (Thread.sleep can throw InterruptedException).
int maxAttempts = 3;
int attempt = 0;
while (attempt < maxAttempts) {
    try {
        callExternalService();
        break; // success − stop retrying
    } catch (Exception e) {
        attempt++;
        if (attempt == maxAttempts) {
            throw e; // attempts exhausted − give up
        }
        Thread.sleep(200); // fixed delay before the next retry
    }
}
Strategy 2 − Framework-Based Retry
Use libraries like −
Spring Retry
Resilience4j Retry
These offer declarative retry behavior with advanced configuration.
Example Implementation: Spring Boot + Resilience4j
Dependency
<dependency>
   <groupId>io.github.resilience4j</groupId>
   <artifactId>resilience4j-spring-boot3</artifactId>
   <version>2.0.2</version>
</dependency>
Configuration (application.yml)
resilience4j.retry:
  instances:
    myServiceRetry:
      max-attempts: 3
      wait-duration: 500ms
      retry-exceptions:
        - java.io.IOException
Annotated Method
@Retry(name = "myServiceRetry", fallbackMethod = "fallbackMethod")
public String callExternalService() {
    // Call to the external API goes here; any java.io.IOException
    // triggers up to max-attempts retries per the configuration above.
    return externalClient.fetchData(); // externalClient is illustrative
}
Fallback Method
public String fallbackMethod(Exception e) {
    // Invoked by Resilience4j once all retry attempts are exhausted
    return "Service temporarily unavailable";
}
Challenges and Pitfalls
Common Mistakes
Retrying non-idempotent operations
Not limiting max attempts
Retrying instantly without backoff
Not using timeouts − can lead to thread exhaustion
Cascading retries across services causing overload
Best Practices
Always limit the number of retries
Retry only on transient and known recoverable failures
Log retry attempts and metrics for observability
Prefer framework-level retries over custom code when possible
Tools and Libraries
| Sr.No. | Tool | Purpose |
|---|---|---|
| 1 | Spring Retry | Declarative retry support in Spring Boot |
| 2 | Resilience4j Retry | Lightweight, modern retry + resilience |
| 3 | Polly (.NET) | Retry handling in .NET applications |
| 4 | Retry4j | Fluent, configurable retry logic in Java |
| 5 | Backoff (Python) | Retry utilities with exponential backoff |