Supply Chain Hardening

Every container image Lithosphere publishes to GHCR carries three verifiable artifacts produced by .github/workflows/publish-images.yaml:

  1. Cosign signature — keyless, identity-bound to the GitHub Actions workflow that built the image. Proves the image was produced by this repository, not a typo-squatter.

  2. SLSA build-provenance attestation — in-toto SLSA Provenance v1.0 statement linking the image digest to the specific workflow run, source commit, and build invocation. Proves how the image was produced.

  3. SBOM (SPDX) — Software Bill of Materials enumerating every package in the image. Required for CVE response and license-policy audits.

Together they satisfy SLSA Build Level 2 (signed provenance from a hosted build platform). Level 3 (non-falsifiable provenance) would require a hardened isolation layer beyond GitHub-hosted runners — out of scope for testnet posture.

Verifying a published image

Pick any tag on any image at ghcr.io/kajlabs/lithosphere-*. Three checks, in order of increasing strength:

1. Cosign signature

cosign verify ghcr.io/kajlabs/lithosphere-api:sha-<short> \
  --certificate-identity-regexp 'https://github.com/KaJLabs/Lithosphere/.+' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com

Success output includes the Fulcio cert subject (workflow ref + actor + SHA) and the Rekor transparency log entry. A mismatch means either the image wasn't built by this repo, or the workflow was tampered with.
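To make the identity binding concrete, here is a minimal sketch of how the `--certificate-identity-regexp` pattern separates certificate subjects. The three identity strings are hypothetical examples of the workflow-ref form a GitHub Actions certificate carries, and the tighter `workflowPattern` is an illustrative variant, not something the repository is known to use. JavaScript's `RegExp` stands in for cosign's own matcher; for a pattern this simple the two should agree, but treat that as an assumption.

```typescript
// Pattern copied from the cosign command above.
const repoPattern = new RegExp("https://github.com/KaJLabs/Lithosphere/.+");

// Hypothetical tighter variant that also pins the workflow file.
const workflowPattern = new RegExp(
  "^https://github\\.com/KaJLabs/Lithosphere/\\.github/workflows/publish-images\\.yaml@.+$"
);

// Hypothetical certificate identities, for illustration only.
const identities: string[] = [
  "https://github.com/KaJLabs/Lithosphere/.github/workflows/publish-images.yaml@refs/heads/main",
  "https://github.com/KaJLabs/Lithosphere/.github/workflows/other.yaml@refs/heads/main",
  "https://github.com/SomeoneElse/Lithosphere/.github/workflows/publish-images.yaml@refs/heads/main",
];

function matchesIdentity(pattern: RegExp, identity: string): boolean {
  return pattern.test(identity);
}

// Repo-level pattern: accepts any workflow in the repo, rejects the fork.
const repoMatches = identities.map((id) => matchesIdentity(repoPattern, id));
// Workflow-level pattern: accepts only the publishing workflow.
const workflowMatches = identities.map((id) => matchesIdentity(workflowPattern, id));
```

The repo-level pattern trusts every workflow in the repository; pinning the workflow file narrows that trust at the cost of updating the pattern whenever the file is renamed.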

2. SLSA build provenance

gh attestation verify oci://ghcr.io/kajlabs/lithosphere-api:sha-<short> \
  --owner KaJLabs

This validates the in-toto attestation pushed to the registry alongside the image. The output includes the build parameters, source repo and commit, and runner platform. This check is stricter than the Cosign check because the attestation includes a structured description of how the build happened, not just "this org signed it."
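Once an attestation has passed verification, its contents can be compared against what a deploy expects. The sketch below assumes the statement follows the SLSA Provenance v1 layout as emitted for GitHub Actions workflows (`externalParameters.workflow`, `resolvedDependencies[].digest.gitCommit`); the digest and commit values are placeholders, and `checkProvenance` is a hypothetical helper, not part of this repository. It inspects content only and does not replace signature verification.

```typescript
// Minimal slice of an in-toto statement carrying a SLSA Provenance v1 predicate.
interface ProvenanceStatement {
  predicateType: string;
  subject: { name: string; digest: Record<string, string> }[];
  predicate: {
    buildDefinition: {
      externalParameters: { workflow?: { repository?: string; path?: string; ref?: string } };
      resolvedDependencies?: { uri?: string; digest?: Record<string, string> }[];
    };
  };
}

interface Expected {
  imageSha256: string;
  repository: string;
  commit: string;
}

// Returns a list of mismatches; an empty list means the statement matches.
function checkProvenance(stmt: ProvenanceStatement, want: Expected): string[] {
  const problems: string[] = [];
  if (stmt.predicateType !== "https://slsa.dev/provenance/v1") {
    problems.push("unexpected predicate type");
  }
  if (!stmt.subject.some((s) => s.digest["sha256"] === want.imageSha256)) {
    problems.push("image digest not listed as a subject");
  }
  const def = stmt.predicate.buildDefinition;
  if (def.externalParameters.workflow?.repository !== want.repository) {
    problems.push("source repository mismatch");
  }
  const deps = def.resolvedDependencies ?? [];
  if (!deps.some((d) => d.digest?.["gitCommit"] === want.commit)) {
    problems.push("source commit mismatch");
  }
  return problems;
}

// Placeholder values for illustration only.
const sampleStatement: ProvenanceStatement = {
  predicateType: "https://slsa.dev/provenance/v1",
  subject: [{ name: "ghcr.io/kajlabs/lithosphere-api", digest: { sha256: "placeholder-image-digest" } }],
  predicate: {
    buildDefinition: {
      externalParameters: {
        workflow: {
          repository: "https://github.com/KaJLabs/Lithosphere",
          path: ".github/workflows/publish-images.yaml",
          ref: "refs/heads/main",
        },
      },
      resolvedDependencies: [
        { uri: "git+https://github.com/KaJLabs/Lithosphere@refs/heads/main", digest: { gitCommit: "placeholder-commit" } },
      ],
    },
  },
};

const expected: Expected = {
  imageSha256: "placeholder-image-digest",
  repository: "https://github.com/KaJLabs/Lithosphere",
  commit: "placeholder-commit",
};

const matching = checkProvenance(sampleStatement, expected);
const otherCommit = checkProvenance(sampleStatement, {
  imageSha256: expected.imageSha256,
  repository: expected.repository,
  commit: "some-other-commit",
});
```

Here `matching` is empty, while `otherCommit` reports a source commit mismatch: the "same Dockerfile, different source commit" case that a signature alone cannot distinguish.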

3. SBOM

SBOMs are uploaded as workflow artifacts (90-day retention). For CVE response, run jq '.packages[] | select(.name == "<dep>")' against the downloaded SBOM to check whether a known-vulnerable package is in your image.
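The same lookup can be scripted when several images need checking at once. This is a minimal sketch assuming the SBOM is SPDX JSON with a top-level `packages` array whose entries carry `name` and `versionInfo`; the inline document and package names are made up for illustration, and `findPackage` is a hypothetical helper.

```typescript
interface SpdxPackage {
  name: string;
  versionInfo?: string;
}

interface SpdxDocument {
  packages?: SpdxPackage[];
}

// Equivalent of: jq '.packages[] | select(.name == "<dep>")'
function findPackage(sbom: SpdxDocument, dep: string): SpdxPackage[] {
  return (sbom.packages ?? []).filter((pkg) => pkg.name === dep);
}

// Made-up SBOM content; a real one would be read from the downloaded artifact.
const sbomJson =
  '{"spdxVersion":"SPDX-2.3","packages":[' +
  '{"name":"openssl","versionInfo":"3.0.13"},' +
  '{"name":"zlib","versionInfo":"1.3.1"}]}';

const sbom = JSON.parse(sbomJson) as SpdxDocument;
const hits = findPackage(sbom, "openssl"); // one entry, versionInfo "3.0.13"
const misses = findPackage(sbom, "log4j"); // empty: package not in this image
```

An empty result only says the package is absent under that exact name; ecosystems that prefix or scope names may need a looser match.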

What each verification rules out

| Threat | Cosign sig | SLSA provenance | SBOM |
| --- | --- | --- | --- |
| Typo-squatted image (lithosphere-apl instead of -api) | | | |
| Image rebuilt offline + force-pushed to GHCR with a stolen token | | | |
| Workflow modified to inject malicious code into the build | | | |
| Same Dockerfile, different source commit | | | |
| Same source commit, but pulled-in dep CVEd post-build | | | |

The three are complementary; a serious compliance review checks all three.

How the gates layer

publish-images.yaml pipeline order:

Trivy runs before signing so a CVE-laden image never gets an attestation in the first place. The CRITICAL gate is hard-fail; HIGH findings upload to the GitHub Security tab for triage (see license policy for the parallel dependency-side gate).
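The severity policy described above can be sketched as a small decision function. This is an illustration of the policy only, not the workflow's actual implementation: the `Finding` shape, the `applyScanGate` name, and the placeholder vulnerability IDs are all assumptions.

```typescript
type Severity = "UNKNOWN" | "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface Finding {
  id: string;
  severity: Severity;
}

interface GateDecision {
  failBuild: boolean;        // hard-fail: nothing gets signed or attested
  proceedToSigning: boolean; // signing runs only after a clean gate
  triage: Finding[];         // HIGH findings surfaced for review
}

function applyScanGate(findings: Finding[]): GateDecision {
  const hasCritical = findings.some((f) => f.severity === "CRITICAL");
  const high = findings.filter((f) => f.severity === "HIGH");
  return { failBuild: hasCritical, proceedToSigning: !hasCritical, triage: high };
}

// Placeholder IDs, not real advisories.
const cleanEnough = applyScanGate([
  { id: "EXAMPLE-0001", severity: "HIGH" },
  { id: "EXAMPLE-0002", severity: "LOW" },
]);
const blocked = applyScanGate([{ id: "EXAMPLE-0003", severity: "CRITICAL" }]);
```

In this sketch `cleanEnough` proceeds to signing with one finding queued for triage, while `blocked` fails the build before any signature or attestation exists.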

Deployment-side verification

The Verify GHCR Image Signatures step in deploy-simple.yaml runs Cosign + gh attestation verify against each published image for the deploy's commit SHA. Today the bastion still builds from source, so the check is advisory (continue-on-error: true) — a failed or skipped row doesn't block the deploy. Results land in the deployment summary's ## Supply Chain Verification table next to the ## Build SHA Verification table.

Outcomes the step reports:

| Status | Meaning |
| --- | --- |
| ✅ verified | Cosign keyless signature AND SLSA Build L2 attestation both match https://github.com/KaJLabs/Lithosphere/.+ identity. |
| ⚠️ partial (sig-only) | Image is signed but lacks (or has an invalid) SLSA attestation. Treat as suspicious. |
| ⚠️ partial (att-only) | Attestation present but the Cosign signature didn't verify. Usually means a stale registry mirror or a key-rotation in progress. |
| ❌ failed | Neither artifact verified. Investigate before promoting. |
| — (skipped) | No image for this SHA. Normal when the commit only touched workflows / docs / chain, because publish-images.yaml runs only on Makalu/{api,indexer,explorer}/** path changes. |
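The status outcomes above reduce to a small decision function over three facts per image. The sketch below is one way to express that mapping; the `ImageCheck` shape and the `classify` name are hypothetical, and the real step may derive the statuses differently.

```typescript
type VerificationStatus =
  | "verified"
  | "partial (sig-only)"
  | "partial (att-only)"
  | "failed"
  | "skipped";

interface ImageCheck {
  imageExists: boolean;   // was an image published for this commit SHA?
  signatureOk: boolean;   // did the Cosign check pass?
  attestationOk: boolean; // did the attestation check pass?
}

function classify(check: ImageCheck): VerificationStatus {
  if (!check.imageExists) return "skipped";
  if (check.signatureOk && check.attestationOk) return "verified";
  if (check.signatureOk) return "partial (sig-only)";
  if (check.attestationOk) return "partial (att-only)";
  return "failed";
}

const sigOnly = classify({ imageExists: true, signatureOk: true, attestationOk: false });
const noImage = classify({ imageExists: false, signatureOk: false, attestationOk: false });
```

Checking `imageExists` first keeps a docs-only commit from being reported as a failure.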

Promotion path to a blocking gate

The day deployments switch from build-from-source to pull-published-image:

  1. Remove continue-on-error: true from Verify GHCR Image Signatures.

  2. Change the bastion deploy script to docker compose pull then docker compose up -d --no-build.

  3. The verification step now blocks any deploy whose images don't carry matching Cosign + SLSA artifacts. Tampered, typo-squatted, or offline-rebuilt images can't reach production.

Until then, advisory mode is the right posture: it surfaces the signal in every deploy summary without inheriting publish-images.yaml's schedule constraints.
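The difference between the advisory and blocking postures can be sketched as a single flag on the gate. Everything here is illustrative: `evaluateGate` is a hypothetical helper, the indexer and explorer image names are inferred from the path filter rather than confirmed, and treating skipped rows as non-blocking is an assumption the real gate may not share.

```typescript
type RowStatus =
  | "verified"
  | "partial (sig-only)"
  | "partial (att-only)"
  | "failed"
  | "skipped";

interface VerificationRow {
  image: string;
  status: RowStatus;
}

interface GateResult {
  allowed: boolean;
  offenders: string[];
}

// advisory = true mirrors continue-on-error: true (report, never block).
function evaluateGate(rows: VerificationRow[], advisory: boolean): GateResult {
  const offenders = rows
    // Assumption: a skipped row (no image for this SHA) does not block.
    .filter((row) => row.status !== "verified" && row.status !== "skipped")
    .map((row) => row.image);
  return { allowed: advisory || offenders.length === 0, offenders: offenders };
}

const rows: VerificationRow[] = [
  { image: "lithosphere-api", status: "verified" },
  { image: "lithosphere-indexer", status: "partial (sig-only)" },
  { image: "lithosphere-explorer", status: "skipped" },
];

const today = evaluateGate(rows, true);           // reported, deploy continues
const afterPromotion = evaluateGate(rows, false); // same rows now block the deploy
```

Both calls report the same offender, so the deploy summary reads identically before and after promotion; only the `allowed` outcome changes.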

Rotation & key management

Cosign keyless avoids holding a long-lived signing key — every signature is bound to a short-lived Fulcio certificate issued during the workflow run. The trust anchor is the GitHub Actions OIDC issuer + the KaJLabs/Lithosphere repo identity. No key to rotate, no key to leak.

For npm package publishes (SDK), the equivalent identity binding is npm provenance attestations — wired in release.yaml via --provenance. See release-process.md.

Code-level static analysis

Image-layer attestations and dependency-license enforcement don't catch bugs in the source we own — SQL injection, SSRF, path traversal, weak crypto, prototype pollution. That's what codeql.yaml covers: GitHub's CodeQL runs on every push to main, every PR, and weekly via cron. The JS/TS extractor (build-mode none, query suite security-and-quality) indexes Makalu/{api,indexer,explorer,packages,templates,contracts/scripts,tooling} plus repo-level scripts/. Findings post to the Security tab under the codeql-javascript-typescript category, alongside Trivy's trivy-{api,indexer,explorer} entries.

The three layers compose:

| Layer | Catches | Workflow |
| --- | --- | --- |
| Source SAST | bugs in code we wrote | codeql.yaml |
| Container scan | OS/library CVEs | publish-images.yaml (Trivy) |
| Supply chain | image tampering / typo-squat | publish-images.yaml (Cosign + SLSA + SBOM) |

Triage workflow for CodeQL findings

CodeQL's first-scan baseline often contains a long tail of style-level notes plus a handful of legitimate-but-false-positive flow alerts (e.g. router.push(`/blocks/${userInput}`) flagged as DOM-XSS because the extractor can't prove the destination route doesn't innerHTML the segment). The expectation is not "zero open alerts"; it is "every open alert has been triaged":

  1. Fix at source when the alert points at a genuine issue. Recent examples (commit landing this section): js/log-injection from console.warn with raw user input → sanitizeForLog() helper that strips ASCII control chars; js/file-system-race from an existsSync/appendFileSync pair → drop the pre-check (the append creates on demand); js/polynomial-redos on /=+$/ → manual trailing-char strip with no regex.

  2. Dismiss with a comment when the alert is a false positive. Use the GitHub Security UI ("Dismiss alert" → "False positive" / "Used in tests" / "Won't fix") and include the reason. Don't leave open alerts indefinitely without dismissal — they create noise that masks real findings.

  3. Track as work when the alert is real but the fix needs design (e.g. SSRF in a controlled-base proxy endpoint — needs an explicit URL allow-list). Create an issue, link the alert, leave the alert open until the issue closes.
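Two of the fix-at-source examples in step 1 are small enough to sketch. The repository's actual helpers are not shown in this document, so the bodies below are assumptions about what such fixes could look like: `sanitizeForLog` takes its name from the text, while `stripTrailingEquals` is a hypothetical name for the regex-free replacement of /=+$/.

```typescript
// Drop ASCII control characters (0x00-0x1F and 0x7F) so user input
// cannot inject line breaks or escape sequences into a log entry.
function sanitizeForLog(value: string): string {
  let out = "";
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    if (code >= 0x20 && code !== 0x7f) {
      out += value.charAt(i);
    }
  }
  return out;
}

// Regex-free equivalent of value.replace(/=+$/, ""): walk back from the
// end while the last character is "=" (0x3D), then slice once.
function stripTrailingEquals(value: string): string {
  let end = value.length;
  while (end > 0 && value.charCodeAt(end - 1) === 0x3d) {
    end--;
  }
  return value.slice(0, end);
}

const safeLine = sanitizeForLog("alice\r\nWARN forged entry"); // "aliceWARN forged entry"
const unpadded = stripTrailingEquals("YWJj==");                // "YWJj"
```

The backward scan touches each trailing character once, so its cost stays linear in the input length regardless of how the string is shaped.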

Cadence: triage every Monday alongside the weekly CodeQL cron run.
