Test Quarantine
A flaky test — one that passes sometimes and fails sometimes against the same code — is one of the few ways CI can legitimately lie to you. Without a quarantine, the team's options are:
(1) Block PRs until the flake is fixed (kills velocity).
(2) Ignore the failure (trains everyone to ignore CI; real regressions slip through).
(3) Delete the test (loses real coverage).
Option (4) is the quarantine — keep running the test (so we know if it stabilises) but exempt it from blocking the build. The team gets honest signal on the rest of the suite, and the flake stays visible in the PR summary as nagging proof that the work isn't done.
The mechanism lives in three files:
Policy: .test-quarantine.yaml (repo root)
Processor: scripts/process-test-results.mjs
CI gate: the "Process test results against quarantine" step in .github/workflows/ci.yaml's test job
When to quarantine
Symptoms of a flake (quarantine candidate):
The test passes on rerun without any code change.
It fails differently on different runs (different assertion, different timeout, different stack).
It involves timing, parallelism, network, randomness, or a shared filesystem.
Symptoms of a regression (do NOT quarantine — fix or revert):
Same assertion fails consistently.
The test was passing on main and fails the moment a specific PR branches off.
The failure narrative matches a recent code change in the same area.
If unsure, run the test 10× locally:
Mixed results → flake. Consistent fail → regression.
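The rule of thumb above can be sketched as a small classifier. This is an illustration only, not code from the repo: classifyRuns is a hypothetical helper, and it assumes you have already recorded one pass/fail boolean per local run.

```javascript
// Hypothetical helper: classify repeated runs of a single test.
// `outcomes` holds one boolean per run (true = pass, false = fail).
function classifyRuns(outcomes) {
  if (outcomes.length === 0) {
    throw new Error("need at least one run");
  }
  const passes = outcomes.filter(Boolean).length;
  if (passes === outcomes.length) return "stable"; // every run passed
  if (passes === 0) return "regression"; // consistent fail
  return "flake"; // mixed results
}

// Ten local runs with two failures: mixed, so treat it as a flake.
const tenRuns = [true, false, true, true, true, true, false, true, true, true];
console.log(classifyRuns(tenRuns));
```

A "stable" result after ten runs does not prove the test is healthy; it only means the flake, if any, is rarer than this sample can show.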
How to add an entry
Find the test's fullName. vitest formats it as <describe path> > <it title>. The scripts/flake-tracker.sh report uses the same format, so you can copy-paste the name directly from a nightly flake-tracker output.
Edit .test-quarantine.yaml and add an entry for the test.
Commit with a descriptive message.
Push. The CI summary on the PR will now show the test failure under "Quarantined Failures" if it fails again. The build doesn't block.
File the underlying issue. A quarantine entry without an open issue is just hiding a bug. The runbook's 30-day rule (below) treats any entry without an issue link in reason as a code smell.
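Before pushing, it can help to sanity-check a new entry. The sketch below is an assumption-heavy illustration: the entry shape ({ fullName, reason, since, owner }) is inferred from the field names this runbook mentions, the issue-link pattern is a guess, and buildFullName and validateEntry are hypothetical helpers, not functions from scripts/process-test-results.mjs.

```javascript
// Per this runbook, the describe path and the test title are joined with " > ".
function buildFullName(describePath, title) {
  return [...describePath, title].join(" > ");
}

// Return a list of problems; an empty list means the entry looks complete.
// Assumed entry shape: { fullName, reason, since, owner }.
function validateEntry(entry) {
  const problems = [];
  if (typeof entry.fullName !== "string" || entry.fullName.trim() === "") {
    problems.push("fullName is missing");
  }
  // Assumption: an issue link is either a URL or a "#123"-style reference.
  if (typeof entry.reason !== "string" || !/https?:\/\/\S+|#\d+/.test(entry.reason)) {
    problems.push("reason has no issue link");
  }
  if (typeof entry.since !== "string" || Number.isNaN(Date.parse(entry.since))) {
    problems.push("since is not a parseable date");
  }
  return problems;
}

// Illustrative entry; the test name and issue number are made up.
const entry = {
  fullName: buildFullName(["cache", "eviction"], "drops the oldest key"),
  reason: "Timing-dependent assertion, tracked in #123",
  since: "2026-05-12",
  owner: "example-owner",
};
console.log(validateEntry(entry)); // empty array when nothing is wrong
```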
The 30-day stale-entry rule
Every entry's since field is checked on each CI run. Entries older than 30 days emit ::warning:: annotations to the workflow run.
The stale-entry warnings also surface in the PR sticky comment under a "Stale quarantine entries" subtable.
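A minimal sketch of the age check, assuming each entry carries fullName and an ISO-formatted since date. The staleWarnings helper and the warning wording are hypothetical; only the ::warning:: prefix (the GitHub Actions workflow command) and the 30-day threshold come from this runbook.

```javascript
const STALE_AFTER_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Build one "::warning::" line per entry older than the threshold.
// `entries` is an assumed array of { fullName, since }; `now` is a Date.
function staleWarnings(entries, now) {
  return entries
    .map((entry) => ({
      entry,
      ageDays: Math.floor((now.getTime() - Date.parse(entry.since)) / MS_PER_DAY),
    }))
    .filter(({ ageDays }) => ageDays > STALE_AFTER_DAYS)
    .map(
      ({ entry, ageDays }) =>
        `::warning::Quarantine entry "${entry.fullName}" is ${ageDays} days old (limit ${STALE_AFTER_DAYS})`
    );
}

// Illustrative data; the test names are made up.
const sampleEntries = [
  { fullName: "cache > drops the oldest key", since: "2026-04-01" },
  { fullName: "upload > retries on timeout", since: "2026-05-01" },
];
for (const line of staleWarnings(sampleEntries, new Date("2026-05-12T00:00:00Z"))) {
  console.log(line); // printing to stdout is how a workflow command is emitted
}
```

Passing now as a parameter rather than reading the clock inside the function keeps the check deterministic, which matters for a tool whose whole purpose is policing nondeterministic tests.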
The expectation: 30 days is the soft deadline. Within that window, either:
Fix the underlying flake (preferred). Remove the entry.
Move the test under vitest.config.ts's exclude if the flake is inherent to the test design and a rewrite is out of scope. Add a follow-up issue tracking the rewrite.
Escalate to a team-wide call if the test is fundamentally racing against infra outside our control.
A stale-entry warning that lingers more than 60 days is grounds for auto-escalation: open an issue assigned to the entry's owner and tag the dev infra lead.
How to remove an entry
Verify the underlying flake is fixed:
Watch the nightly flake-tracker workflow run for a week. The test should NOT appear in its "tests with mixed pass/fail status" list.
Delete the entry from .test-quarantine.yaml. Commit with a message linking the original issue.
On the next CI run, the Test job loses the quarantine cover for that test. If it fails again, you've just regressed: revert the un-quarantine commit and reopen the issue.
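The "watch for a week" check can be expressed as a simple predicate. This is a sketch under assumptions: cleanForNights is a hypothetical helper, and nightlyMixedLists stands in for however you collect the flake-tracker's mixed pass/fail lists (one array of fullNames per nightly run, oldest first).

```javascript
// Has the test stayed out of the mixed pass/fail list for the last
// `requiredNights` nightly reports?
function cleanForNights(fullName, nightlyMixedLists, requiredNights = 7) {
  if (nightlyMixedLists.length < requiredNights) {
    return false; // not enough history yet
  }
  return nightlyMixedLists
    .slice(-requiredNights)
    .every((mixed) => !mixed.includes(fullName));
}

// Illustrative history: flagged eight nights ago, clean for the last seven.
const history = [["cache > drops the oldest key"], [], [], [], [], [], [], []];
console.log(cleanForNights("cache > drops the oldest key", history));
```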
Relationship to flake-tracker.sh
| | flake-tracker.sh | Quarantine |
| --- | --- | --- |
| Role | Detect | Suppress |
| Run | Nightly (cron via .github/workflows/flake-tracker.yaml) | Every CI run |
| Action | Reports tests with mixed pass/fail across last 10 runs | Splits failures into real vs quarantined; gates build |
| Workflow | Surfaces candidates | Accepts them after human review |
The flake-tracker finds candidates; the quarantine accepts the human decision. The two are complementary — neither alone is enough.
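The quarantine side ("splits failures into real vs quarantined; gates build") can be sketched as follows. This is not the contents of scripts/process-test-results.mjs; splitFailures is a hypothetical function that assumes failures and quarantine entries are both available as lists of fullName strings.

```javascript
// Partition failed tests by quarantine membership and decide whether the
// build should block. Only non-quarantined ("real") failures gate the build.
function splitFailures(failedNames, quarantinedNames) {
  const quarantine = new Set(quarantinedNames);
  const real = [];
  const quarantined = [];
  for (const name of failedNames) {
    (quarantine.has(name) ? quarantined : real).push(name);
  }
  return { real, quarantined, blocksBuild: real.length > 0 };
}

// Illustrative run; the test names are made up.
const result = splitFailures(
  ["cache > drops the oldest key", "auth > rejects expired tokens"],
  ["cache > drops the oldest key"]
);
console.log(result.blocksBuild); // the auth failure is not quarantined
```

Matching on the exact fullName string is the simplest choice; it also means that renaming a quarantined test silently drops its cover, which is arguably the right failure mode.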
Current quarantine state
As of 2026-05-12, the allowlist is empty. The mechanism ships ready-to-use; the first real quarantine entry will be filed as flakes are observed in CI.
Why not just retry the test?
vitest supports retries through its retry option (for example, retry: 3, set per test or in the test config). We considered this and rejected it for the standard SLO-style reasons:
A retried test that passes on attempt 3 still ran 3× the work, burning CI minutes and confusing flake-tracker statistics.
Retries hide the flake. The quarantine surfaces it (PR summary + stale-entry warnings).
Retries are a per-test author decision; the quarantine is a team-level policy with explicit owner + deadline.
Either mechanism works in isolation; quarantining is the more visible one and was chosen for that reason.
Related
scripts/flake-tracker.sh: flake detector script
.github/workflows/flake-tracker.yaml: daily cron
Phase 6 completion report: test strategy context
License Policy runbook: same hand-parsed YAML pattern