Implementing Circuit Breaker for Microservices Resilience
A circuit breaker protects a microservice from cascading failures. When a downstream service is overloaded or unavailable, the caller stops retrying endlessly and piling up queued requests; instead it fails fast and returns a prepared fallback.
Three States
Closed (normal): requests pass through to the service while the breaker counts failures.
Open (tripped): once the error threshold is exceeded (e.g., 5 failures out of 10 requests within 30 seconds), the circuit breaker opens and every request is rejected immediately, without calling the service.
Half-Open (testing): after a timeout (e.g., 30 seconds), a limited number of trial requests (one, in the simplest case) is allowed through. If the trial succeeds, the breaker returns to Closed; if it fails, the breaker goes back to Open.
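The three states can be sketched as a small state machine. The version below is a minimal illustration, not any particular library's implementation: the class name `SimpleCircuitBreaker`, its parameters, and the use of a consecutive-failure threshold (instead of a windowed count such as 5 out of 10) are assumptions made to keep the example short, and it wraps synchronous calls for brevity.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Hypothetical minimal breaker: trips after N consecutive failures.
class SimpleCircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private consecutiveFailures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number,
    private readonly resetTimeoutMs: number,
    private readonly now: () => number = () => Date.now(), // injectable clock
  ) {}

  getState(): BreakerState {
    // Open -> Half-Open happens lazily once the reset timeout has elapsed.
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  call<T>(action: () => T, fallback: () => T): T {
    if (this.getState() === 'OPEN') {
      return fallback(); // fail fast: the dependency is not called at all
    }
    try {
      const result = action();
      this.onSuccess();
      return result;
    } catch {
      this.onFailure();
      return fallback();
    }
  }

  private onSuccess(): void {
    // A success in Closed resets the counter; in Half-Open it closes the circuit.
    this.consecutiveFailures = 0;
    this.state = 'CLOSED';
  }

  private onFailure(): void {
    this.consecutiveFailures += 1;
    // A failed trial call in Half-Open re-opens the circuit immediately.
    if (this.state === 'HALF_OPEN' || this.consecutiveFailures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}

// Usage: the first failure is absorbed by the fallback and the circuit stays closed.
const demoBreaker = new SimpleCircuitBreaker(5, 30000);
const demoResult = demoBreaker.call(
  (): string => { throw new Error('payment service down'); },
  () => 'payment_deferred',
);
// demoResult is 'payment_deferred'
```

Injecting the clock keeps the timeout logic testable without real waiting; a production breaker would also need to handle asynchronous calls and concurrent trial requests.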
Implementation via Opossum (Node.js)
import CircuitBreaker from 'opossum';

// callPaymentService (the async call to the payment service) and logger are defined elsewhere.
const paymentServiceOptions = {
  timeout: 3000,                // treat a call as failed if it takes longer than 3 s
  errorThresholdPercentage: 50, // error rate (in %) at which the circuit opens
  resetTimeout: 30000,          // after 30 s in Open, move to Half-Open
  volumeThreshold: 10,          // minimum number of requests before the error rate is evaluated
};

const breaker = new CircuitBreaker(callPaymentService, paymentServiceOptions);

// Fallback: returned when the circuit is open, and also when a call fails or times out
breaker.fallback(() => ({
  status: 'payment_deferred',
  message: 'Payment will be processed later',
}));

// Monitoring
breaker.on('open', () => logger.warn('Payment service circuit OPEN'));
breaker.on('halfOpen', () => logger.info('Payment service circuit HALF-OPEN'));
breaker.on('close', () => logger.info('Payment service circuit CLOSED'));

// Usage
async function processPayment(orderId: string, amount: number) {
  return breaker.fire(orderId, amount);
}
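How `errorThresholdPercentage` and `volumeThreshold` interact can be illustrated with a small decision function. This is a sketch of the rule described by the option comments, not Opossum's internal code: the names `WindowStats` and `shouldOpen` are hypothetical, and treating an error rate exactly at the threshold as tripping is an assumption of this example.

```typescript
// Hypothetical snapshot of the requests seen in the current rolling window.
interface WindowStats {
  successes: number;
  failures: number;
}

function shouldOpen(
  stats: WindowStats,
  errorThresholdPercentage: number,
  volumeThreshold: number,
): boolean {
  const total = stats.successes + stats.failures;
  // Too little traffic: a handful of failures should not trip the breaker.
  if (total < volumeThreshold) {
    return false;
  }
  const errorRate = (stats.failures / total) * 100;
  return errorRate >= errorThresholdPercentage; // inclusive boundary is assumed here
}

// 4 failures out of 5 requests is an 80% error rate, but below the volume threshold of 10.
const lowTraffic = shouldOpen({ successes: 1, failures: 4 }, 50, 10); // false
// 9 failures out of 12 requests is a 75% error rate with enough volume.
const highErrorRate = shouldOpen({ successes: 3, failures: 9 }, 50, 10); // true
```

The volume threshold matters most for low-traffic services, where one or two failed calls would otherwise look like a very high error rate.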
Resilience4j (Java/Spring Boot)
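import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;
import java.util.concurrent.CompletableFuture;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    // paymentClient and log are assumed to be declared elsewhere in this class.

    @CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
    @Retry(name = "paymentService")
    @TimeLimiter(name = "paymentService")
    public CompletableFuture<PaymentResult> processPayment(Order order) {
        return CompletableFuture.supplyAsync(() ->
            paymentClient.charge(order.getId(), order.getTotal())
        );
    }

    // Same parameters as processPayment plus the exception; same return type
    private CompletableFuture<PaymentResult> paymentFallback(Order order, Exception ex) {
        log.warn("Payment service unavailable for order {}", order.getId());
        return CompletableFuture.completedFuture(
            PaymentResult.deferred(order.getId())
        );
    }
}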
# application.yml
resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 3
  retry:
    instances:
      paymentService:
        maxAttempts: 3
        waitDuration: 500ms
        retryExceptions:
          - java.net.ConnectException
          - java.util.concurrent.TimeoutException
Polly (.NET)
// Break after 5 consecutive handled failures and stay open for 30 s
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (result, duration) =>
            logger.Warning("Circuit broken for {Duration}", duration),
        onReset: () => logger.Information("Circuit reset")
    );

// Typed as Policy<HttpResponseMessage> so that it can be wrapped with the breaker above
var retryPolicy = Policy<HttpResponseMessage>
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(200 * attempt));

// Retry is the outer policy: every attempt passes through the circuit breaker
var policy = Policy.WrapAsync(retryPolicy, circuitBreakerPolicy);

var result = await policy.ExecuteAsync(() =>
    httpClient.GetAsync($"{paymentServiceUrl}/charge")
);
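The retry schedule and the number of calls that reach the breaker follow directly from the arguments above. As a quick worked example (written in TypeScript to match the earlier Node.js snippets; `retrySchedule` and `RetrySchedule` are hypothetical helpers, not part of Polly), consider three retries with a delay of `200 * attempt` ms, where `attempt` starts at 1:

```typescript
interface RetrySchedule {
  delaysMs: number[];    // wait before each retry
  maxExecutions: number; // initial call plus retries
  totalWaitMs: number;   // sum of all waits if every attempt fails
}

function retrySchedule(retryCount: number, stepMs: number): RetrySchedule {
  const delaysMs: number[] = [];
  let totalWaitMs = 0;
  for (let attempt = 1; attempt <= retryCount; attempt++) {
    const delay = stepMs * attempt; // linear backoff: 200 * attempt
    delaysMs.push(delay);
    totalWaitMs += delay;
  }
  return { delaysMs, maxExecutions: retryCount + 1, totalWaitMs };
}

const schedule = retrySchedule(3, 200);
// schedule.delaysMs      -> [200, 400, 600]
// schedule.maxExecutions -> 4
// schedule.totalWaitMs   -> 1200
```

Because the retry policy is the outer one, each of those up to four executions passes through the circuit breaker and counts toward `handledEventsAllowedBeforeBreaking`. With a threshold of 5, a dependency that throws `HttpRequestException` on every attempt would trip the breaker on the first attempt of the second logical call.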
Circuit Breaker Metrics
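Export the breaker state to Prometheus so that open circuits are visible on dashboards and in alerts:
import { Gauge } from 'prom-client';

// Gauge names should not end in _total; that suffix is conventionally reserved for counters.
const openCircuits = new Gauge({
  name: 'circuit_breaker_open',
  help: '1 if the circuit breaker is open, 0 otherwise',
  labelNames: ['service'],
});

// set() rather than inc()/dec(): 'open' can fire again after a failed Half-Open trial
breaker.on('open', () => openCircuits.set({ service: 'payment' }, 1));
breaker.on('close', () => openCircuits.set({ service: 'payment' }, 0));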
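One detail worth checking in such a gauge is how it behaves when `open` fires more than once before `close`, which can happen when a Half-Open trial call fails and the breaker re-opens. The sketch below replays an assumed event sequence against two update strategies (`replayIncDec` and `replaySet` are illustrative helpers, not prom-client APIs) to compare incrementing and decrementing with setting the value explicitly.

```typescript
type BreakerEvent = 'open' | 'halfOpen' | 'close';

// Strategy A: increment on 'open', decrement on 'close'.
function replayIncDec(events: BreakerEvent[]): number {
  let gauge = 0;
  for (const event of events) {
    if (event === 'open') gauge += 1;
    if (event === 'close') gauge -= 1;
  }
  return gauge;
}

// Strategy B: set to 1 on 'open', set to 0 on 'close'.
function replaySet(events: BreakerEvent[]): number {
  let gauge = 0;
  for (const event of events) {
    if (event === 'open') gauge = 1;
    if (event === 'close') gauge = 0;
  }
  return gauge;
}

// Assumed sequence: trip, failed trial (re-open), successful trial (close).
const eventLog: BreakerEvent[] = ['open', 'halfOpen', 'open', 'halfOpen', 'close'];
const incDecValue = replayIncDec(eventLog); // 1: still reports an open circuit
const setValue = replaySet(eventLog);       // 0: matches the real state
```

With one gauge series per service label, setting the value keeps the metric idempotent with respect to repeated events.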
Implementation Timeline
- Circuit Breaker for one service + fallback + metrics — 2–3 days
- Full coverage of all external calls in service + dashboard — 1 week







