Fail-closed by design
The client’s fail-closed guarantee is a property you build around, not against.
This page is the operational checklist.
Do
A denied action during a PDP (policy decision point) outage is correct behavior, not a bug to patch in the
client. Buy availability where it belongs:
- prefer local mode when the server can live in the same app (a function call can’t time out across a
network);
- run the remote server highly available (multiple nodes, health checks);
- set a sensible cache TTL so transient blips are absorbed by recent decisions.
Use Iam::can() / ->granted() for the yes/no decision. Where a sensitive action is worth a re-auth prompt,
read requiresStepUp via check() and drive the step-up flow — don’t hard-fail the user
when you could let them elevate.
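The three outcomes described above (proceed, offer step-up, deny) can be sketched as follows. This is a minimal illustration, not the library's implementation: the Decision class is a hypothetical stand-in for whatever check() returns, its constructor shape is assumed, and granted() is assumed to mean "allowed and no step-up pending". The nextStep() helper is likewise invented for this example. It needs PHP 8.1 or later for readonly promoted properties.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the decision object returned by check().
// Only allowed, requiresStepUp and granted() are named on this page;
// the constructor shape is an assumption made for this sketch.
final class Decision
{
    public function __construct(
        public readonly bool $allowed,
        public readonly bool $requiresStepUp = false,
    ) {}

    // Assumed semantics: granted only when allowed with no step-up pending.
    public function granted(): bool
    {
        return $this->allowed && ! $this->requiresStepUp;
    }
}

// Hypothetical helper: map a decision to one of three outcomes.
function nextStep(Decision $decision): string
{
    if ($decision->granted()) {
        return 'proceed';
    }
    if ($decision->allowed && $decision->requiresStepUp) {
        // Offer re-authentication instead of a dead end.
        return 'step-up';
    }
    // Plain deny, which also covers an unreachable PDP.
    return 'deny';
}
```

In a controller, 'step-up' would typically redirect to the re-auth prompt and 'deny' would render a 403 or a feature-specific page; both of those responses are application choices.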
http.timeout (default 5s) is the cap on how long a decision can block a request. Set it from your latency
budget: long enough to avoid spurious denies under normal jitter, short enough that a hung PDP doesn’t hang
your pages.
Don’t
There is no fail_open key, and you should not synthesize one (e.g. catching a deny and allowing anyway). An
unavailable PDP that allows is an unbounded grant — the exact failure mode the design exists to prevent.
allowed === true can still mean “only after step-up”. Branching on allowed alone skips the assurance check. Gate on granted().
A long TTL hides revocations for its duration on each node. For actions that demand immediate revocation,
lower the TTL or check with explain => true (which bypasses the cache).
Degrading gracefully — the right way
If a feature genuinely needs to keep working during a PDP outage, make that an explicit, scoped application
decision — never a transport default:
    $decision = Iam::check($user, 'reports:view');
    if (! $decision->granted()) {
        // The PDP denied (possibly because it's unreachable). Decide, per feature,
        // what "denied right now" should mean for the user experience:
        return response()->view('reports.unavailable', status: 503);
    }
You can layer your own cautious fallback for a specific low-risk, read-only action (e.g. show stale,
clearly-labelled data) — but you do it consciously, at one call site, with the risk visible. You never flip a
global switch that turns every deny into an allow.
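One way such a scoped fallback might look is sketched below. It is an assumption-laden illustration: renderReport() and its array result are hypothetical, $granted stands for the result of Iam::check(...)->granted(), and $stale stands for data the application itself saved from an earlier render that this same user was authorized to see. Because a deny during an outage looks the same as a genuine deny at this point, the fallback is only reasonable for data whose exposure risk you have accepted.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for ONE low-risk, read-only call site.
// $granted: result of the authorization decision for this request.
// $loadFresh: loads live data; only called when the action is granted.
// $stale: previously authorized copy for this user, or null if none exists.
function renderReport(bool $granted, callable $loadFresh, ?array $stale): array
{
    if ($granted) {
        return ['status' => 200, 'stale' => false, 'rows' => $loadFresh()];
    }
    if ($stale !== null) {
        // Conscious, scoped fallback: read-only data, clearly labelled stale.
        return ['status' => 200, 'stale' => true, 'rows' => $stale];
    }
    // No safe fallback available: stay closed.
    return ['status' => 503, 'stale' => false, 'rows' => []];
}
```

The fallback lives in one function for one feature, so a reviewer can see exactly which deny is being softened and why; nothing here changes how any other call site treats a deny.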
A pre-flight checklist
- Mode chosen on purpose: local where you can, http where you must, documented either way.
- Timeout set from a latency budget: not left implicit.
- Cache TTL set from a revocation budget: the max staleness you accept.
- Shared cache store for multi-node: so the fleet caches consistently.
- Step-up handled where it matters: sensitive actions offer elevation, not a dead end.
- No home-grown fail-open: verified in review.
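As a rough illustration of the first four checklist items, a configuration file might look like the sketch below. Only http.timeout (default 5s) is named on this page; the mode, cache.ttl, and cache.store key names are assumptions, the values are placeholders rather than recommendations, and the timeout is assumed to be in seconds. Check the package's actual configuration reference before copying anything.

```php
<?php
// Hypothetical config sketch. Only http.timeout is named on this page;
// every other key name here is an assumption made for illustration.
return [
    // Assumed key: 'local' where you can, 'http' where you must.
    'mode' => 'http',

    'http' => [
        // Explicit, derived from the request latency budget (assumed seconds).
        'timeout' => 2,
    ],

    'cache' => [
        // Assumed key: the maximum revocation staleness you accept (seconds).
        'ttl' => 30,
        // Assumed key: a shared store so every node sees the same entries.
        'store' => 'redis',
    ],
];
```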
See also
- Fail-closed authorization — the formal argument.
- Cache decisions
- Deployment topologies