Circuit Breaker Pattern

On This Page

1. The Problem It Solves
2. Pattern Structure
3. When to Use
4. When Not to Use
5. Trade-offs
6. Implementation Approach
7. Anti-Patterns to Avoid
8. Cloud-Specific Implementations
9. References

The Problem It Solves

Without a circuit breaker, a slow or unavailable downstream service causes the calling service to exhaust its thread pool waiting for responses. Each waiting thread holds a connection and memory. New requests queue up behind the waiting threads. The calling service eventually runs out of resources and fails too — a cascading failure that takes down healthy services along with the unhealthy one.

Pattern Structure

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    START([Service Makes Remote Call])

    START --> STATE{Circuit State}

    STATE -->|Closed — normal operation| CALL[Attempt remote call]
    CALL --> OUTCOME{Call Outcome}
    OUTCOME -->|Success| RESET[Reset failure counter\nReturn response]
    OUTCOME -->|Failure or timeout| COUNT[Increment failure counter]
    COUNT --> THRESHOLD{Failure threshold\nexceeded?}
    THRESHOLD -->|No| CALL
    THRESHOLD -->|Yes| OPEN[Open circuit\nStart recovery timer]

    STATE -->|Open — failing fast| FAST_FAIL[Fail immediately\nNo remote call made\nReturn fallback or error]
    FAST_FAIL --> TIMER{Recovery timer\nexpired?}
    TIMER -->|No| FAST_FAIL
    TIMER -->|Yes| HALF[Half-open state\nAllow probe request through]

    HALF --> PROBE[Attempt probe call]
    PROBE --> PROBE_RESULT{Probe\nSucceeded?}
    PROBE_RESULT -->|Yes| CLOSED([Close circuit\nResume normal operation])
    PROBE_RESULT -->|No| OPEN

    style START fill:#4f8ef7,color:#fff
    style CLOSED fill:#10b981,color:#fff
    style OPEN fill:#fef3c7
    style FAST_FAIL fill:#fef3c7
    style HALF fill:#e0f2fe
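
The state machine in the diagram above can be sketched in a few lines of Python. This is a minimal, single-threaded illustration rather than a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `recovery_timeout` are assumptions made for this example, the circuit opens when the failure count reaches the threshold, and no locking is done, so concurrent callers could all pass through as probes in the half-open state.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.recovery_timeout = recovery_timeout    # seconds to stay open before probing
        self.clock = clock                          # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                # Fail fast: no remote call is made while the timer is running.
                raise CircuitOpenError("circuit open; failing fast")
            # Recovery timer expired: let this call through as the probe.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # A success (including a successful probe) closes the circuit.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed probe reopens immediately; otherwise open at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each remote call as `breaker.call(fetch_orders, customer_id)` (both names hypothetical) and handle `CircuitOpenError` by returning a fallback.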

When to Use

Any service that makes synchronous remote calls to downstream dependencies
Systems where a downstream dependency failing should not cause the caller to fail
High-traffic services where thread pool exhaustion from slow downstream calls is a realistic risk
Microservices architectures where cascading failures across services are a known operational concern

When Not to Use

Asynchronous messaging patterns where the caller does not wait for a response
Internal in-process calls that do not cross a network boundary
Simple two-tier applications where there is only one dependency and failure is acceptable

Trade-offs

Benefit	Cost
Prevents cascading failures — failing fast protects the caller	Fallback behaviour must be designed and tested
Gives the downstream service time to recover	Adds latency measurement overhead per call
Enables graceful degradation — serve partial results	State management for the circuit requires storage or in-process counters
Provides operational visibility into dependency health	Half-open probe logic must be tuned per dependency

Implementation Approach

Define thresholds appropriate to the dependency. A circuit wrapping a payment service should open after fewer failures than one wrapping a recommendation service. Common starting points: open after five consecutive failures, or at a 50% failure rate over a ten-second window.
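
The rate-based threshold can be sketched as a sliding window of recent call outcomes. The class name `FailureRateWindow` and the `min_calls` guard are assumptions added for this illustration; the guard is there so that a single failed call out of one does not count as a 100% failure rate.

```python
from collections import deque


class FailureRateWindow:
    def __init__(self, window_seconds=10.0, rate_threshold=0.5, min_calls=10):
        self.window_seconds = window_seconds
        self.rate_threshold = rate_threshold
        self.min_calls = min_calls  # assumed guard against tripping on tiny samples
        self._outcomes = deque()    # (timestamp, succeeded) pairs, oldest first

    def record(self, succeeded, now):
        self._outcomes.append((now, succeeded))
        self._evict(now)

    def _evict(self, now):
        # Drop outcomes that have aged out of the window.
        while self._outcomes and now - self._outcomes[0][0] > self.window_seconds:
            self._outcomes.popleft()

    def failure_rate(self, now):
        self._evict(now)
        if not self._outcomes:
            return 0.0
        failed = sum(1 for _, ok in self._outcomes if not ok)
        return failed / len(self._outcomes)

    def should_open(self, now):
        self._evict(now)
        return (len(self._outcomes) >= self.min_calls
                and self.failure_rate(now) >= self.rate_threshold)
```

A stricter dependency could be given a lower `rate_threshold` or a smaller `min_calls`, while a tolerant one keeps the defaults.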

Implement meaningful fallbacks. When the circuit is open, return a cached result, a default value, or a clear error that the upstream caller can handle. A cached product catalogue from five minutes ago is better than an exception that propagates to the user.
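
One way to structure such a fallback is a wrapper that remembers the last good result and serves it while the call is failing. This is a sketch under assumptions: `CachedFallback`, `max_age_seconds`, and the `"live"`/`"cached"`/`"default"` labels are hypothetical names, and `fetch` stands for whatever call the circuit protects (it is expected to raise when the circuit is open or the dependency fails).

```python
import time


class CachedFallback:
    """Serve the last good result when the protected call fails."""

    def __init__(self, fetch, max_age_seconds=300.0, default=None, clock=time.monotonic):
        self.fetch = fetch
        self.max_age_seconds = max_age_seconds  # five minutes, as in the catalogue example
        self.default = default
        self.clock = clock
        self._cached = None
        self._cached_at = None

    def get(self):
        """Return (value, source), where source says how the value was obtained."""
        try:
            value = self.fetch()
        except Exception:
            return self._fallback()
        self._cached, self._cached_at = value, self.clock()
        return value, "live"

    def _fallback(self):
        fresh_enough = (self._cached_at is not None
                        and self.clock() - self._cached_at <= self.max_age_seconds)
        if fresh_enough:
            return self._cached, "cached"
        # Cache missing or too old: fall back to the deliberate default.
        return self.default, "default"
```

Returning the source alongside the value lets the upstream caller decide whether to tell the user that the data may be stale.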

Expose circuit state as a metric. The circuit state — closed, open, half-open — and the failure rate per dependency are essential operational metrics. Alert when any circuit opens in production. A circuit opening is a signal that a dependency is failing.
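
As an illustration, the state and failure rate per dependency could be tracked in a small in-process registry and rendered as labelled text lines, in a shape similar to the Prometheus text exposition format. The metric names `circuit_breaker_state` and `circuit_breaker_failure_rate`, the numeric state encoding, and the `CircuitMetrics` class are all assumptions for this sketch; a real service would normally use its existing metrics client instead.

```python
STATE_VALUES = {"closed": 0, "half_open": 1, "open": 2}  # assumed numeric encoding


class CircuitMetrics:
    def __init__(self):
        self._state = {}     # dependency -> state name
        self._calls = {}     # dependency -> total calls seen
        self._failures = {}  # dependency -> failed calls seen

    def record_call(self, dependency, succeeded):
        self._state.setdefault(dependency, "closed")
        self._calls[dependency] = self._calls.get(dependency, 0) + 1
        if not succeeded:
            self._failures[dependency] = self._failures.get(dependency, 0) + 1

    def set_state(self, dependency, state):
        """Store the new state; return True when the circuit has just opened."""
        previous = self._state.get(dependency, "closed")
        self._state[dependency] = state
        return state == "open" and previous != "open"

    def failure_rate(self, dependency):
        calls = self._calls.get(dependency, 0)
        return self._failures.get(dependency, 0) / calls if calls else 0.0

    def render(self):
        lines = []
        for dep in sorted(self._state):
            labels = f'{{dependency="{dep}"}}'
            lines.append(f"circuit_breaker_state{labels} {STATE_VALUES[self._state[dep]]}")
            lines.append(f"circuit_breaker_failure_rate{labels} {self.failure_rate(dep):.2f}")
        return "\n".join(lines)
```

The boolean returned by `set_state` marks the closed-to-open transition, which is the moment an alert would be raised.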

Set appropriate timeouts on the calls the circuit wraps. A circuit breaker without a timeout is incomplete: a call that hangs indefinitely never registers as a failure, so the circuit never opens. Set the timeout shorter than the caller's own timeout so that failures are detected before the caller itself times out.
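
A minimal sketch of bounding a call so that a hang becomes a countable failure, using the standard library's `concurrent.futures`. The helper name `call_with_timeout` and the pool size are assumptions. Note the limitation: the worker thread is not interrupted when the deadline passes, so where the client library offers its own timeout parameter, that is usually the better place to set it.

```python
import concurrent.futures
import time

# Shared pool for wrapped calls; the size is an arbitrary choice for this sketch.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func in a worker thread and raise TimeoutError if it is too slow."""
    future = _executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # cancel() only stops a task that has not started yet;
        # a call that is already running continues in the background.
        future.cancel()
        raise TimeoutError(f"call exceeded {timeout_seconds}s") from None
```

If the caller's own timeout were 2 seconds, the wrapped call might be given 0.8 seconds, leaving room to record the failure and return a fallback before the caller gives up.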

Anti-Patterns to Avoid

⚠ 1. Circuit Breaker Without a Fallback

Opening the circuit and returning an unhandled exception that propagates to the user as a 500 error. The cascade is stopped at the service boundary but the user experience is no better than if there were no circuit breaker.

↺ Correct Approach

Design a fallback response for every circuit that can open. The fallback may be degraded — an empty list, a cached result, a user-visible message — but it is a deliberate choice, not an unhandled exception.

⚠ 2. Inconsistent Circuit State Across Instances

Each instance of a horizontally scaled service maintains its own in-process circuit state. Instance A opens its circuit while Instance B sees different traffic and stays closed. The circuit state is inconsistent across the fleet.

↺ Correct Approach

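For stateless, horizontally scaled services, choose one of two approaches deliberately. Either use a distributed circuit breaker backed by a shared cache such as Redis, or accept that each instance manages its own state and use percentage-based thresholds rather than absolute counts, so that instances receiving different traffic volumes trip at the same failure rate.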
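
A small worked comparison shows why percentage-based thresholds suit per-instance state. The traffic figures, function names, and the `min_calls` guard below are made up for illustration: both instances see the same 50% failure rate from the dependency, but only the busy one accumulates enough failures to trip an absolute count.

```python
def trips_absolute(failures, threshold=5):
    """Open once a fixed number of failures has been seen."""
    return failures >= threshold


def trips_percentage(failures, calls, rate_threshold=0.5, min_calls=4):
    """Open once the failure rate reaches the threshold over enough calls."""
    return calls >= min_calls and failures / calls >= rate_threshold


# Same failure rate at the dependency, different traffic share per instance.
instance_a = {"calls": 40, "failures": 20}  # busy instance
instance_b = {"calls": 6, "failures": 3}    # quiet instance

absolute = [trips_absolute(i["failures"]) for i in (instance_a, instance_b)]
percentage = [trips_percentage(i["failures"], i["calls"]) for i in (instance_a, instance_b)]
```

With these numbers the absolute count opens the circuit on the busy instance only, while the percentage rule opens it on both.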

Cloud-Specific Implementations

AWS: Lambda and API Gateway have built-in timeout and retry configuration. For circuit breaker state shared across instances, use ElastiCache for Redis. Resilience4j, a general-purpose Java resilience library, provides a circuit breaker that can be used inside Java-based Lambda functions.

Flowchart

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    START([Remote Call Attempted])

    START --> CB_STATE{Circuit State}

    CB_STATE -->|Closed| ATTEMPT[Make remote call\nStart timeout timer]
    CB_STATE -->|Open| FAST[Fail immediately\nReturn fallback response\nNo network call made]

    ATTEMPT --> CB_RESULT{Result within\ntimeout?}
    CB_RESULT -->|Success| SUCCESS_CB[Return response\nReset failure counter]
    CB_RESULT -->|Failure or timeout| FAIL_CB[Record failure\nCheck threshold]
    FAIL_CB --> THRESH_CB{Threshold\nexceeded?}
    THRESH_CB -->|No| ATTEMPT
    THRESH_CB -->|Yes| OPEN_CB[Open circuit\nLog alert to observability\nStart recovery timer]

    FAST --> RECOVER{Recovery\ntimer expired?}
    RECOVER -->|No| FAST
    RECOVER -->|Yes| HALF_CB[Half-open\nAllow one probe request]

    HALF_CB --> PROBE_CB{Probe\nsucceeded?}
    PROBE_CB -->|Yes| CLOSE_CB([Close circuit\nResume normal operation])
    PROBE_CB -->|No| OPEN_CB

    style START fill:#4f8ef7,color:#fff
    style CLOSE_CB fill:#10b981,color:#fff
    style OPEN_CB fill:#fef3c7
    style FAST fill:#fef3c7
    style HALF_CB fill:#e0f2fe

References

Nygard, Michael T. — Release It! Design and Deploy Production-Ready Software. Pragmatic Bookshelf, 2018.
Fowler, Martin — Circuit Breaker. martinfowler.com/bliki/CircuitBreaker
Resilience4j — Circuit breaker for Java. resilience4j.readme.io
Netflix — Hystrix: Latency and Fault Tolerance. github.com/Netflix/Hystrix

Ascendion Engineering Knowledge Base