feat: degrade gracefully in offline / enterprise environments #337

Merged
gtsiolis merged 8 commits into main from devx-703-offline-graceful-degradation
Jun 29, 2026

Conversation

@gtsiolis

@gtsiolis gtsiolis commented Jun 23, 2026

Member

Stacked on #325. This PR's base is the image-override branch, so its diff shows offline changes only; it auto-retargets to main when #325 merges. The local-image fallback integration test relies on the image= field introduced in #325.

What

Makes lstk start degrade gracefully in offline / enterprise environments that cannot reach Docker Hub or the license server (offline/air-gapped, corporate proxy, or TLS interception) — without an explicit flag.

  • Image pull: if PullImage fails but the image is already present locally (new runtime.ImageExists), lstk warns "using the local image" and starts the local image instead of failing.
  • License pre-flight: validateLicense now distinguishes a definitive server rejection (*api.LicenseError, e.g. HTTP 403/400 — still fatal) from a transport-level failure (any other error — offline/proxy/cert). On a transport failure it skips the pre-flight check and lets the container validate its own bundled license at startup.
  • runtime.PullImage always closes its progress channel, even when ImagePull fails early, so the local-image fallback path does not leak the progress goroutine.
  • Context cancellation is propagated during the license pre-flight so Ctrl+C aborts cleanly.

Scope

Second of the two PRs split out of DEVX-703. The custom image override half ships separately in #325. This PR is offline graceful-degradation only.

Note: this addresses the truly offline case (network requests fail). The separate slow-network / demo case raised in #team-cli (a working-but-slow link where the pull succeeds but takes 20 min) is a different problem — graceful degradation never triggers there because nothing fails. That is being tracked as its own piece of work (--offline to skip the pull when the image is local + a cancellable pull).

Tests

  • Unit (internal/container): TestPullImages_FallsBackToLocalImageWhenPullFails, TestPullImages_FailsWhenPullFailsAndImageMissing, TestValidateLicense_ContinuesWhenServerUnreachable, TestValidateLicense_FailsOnServerRejection.
  • Integration: local-image fallback and license-server-unreachable E2E coverage (added on this branch).

Refs DEVX-703

@gtsiolis gtsiolis self-assigned this Jun 23, 2026
@gtsiolis gtsiolis added the semver: patch and docs: needed labels Jun 23, 2026
@gtsiolis gtsiolis force-pushed the devx-703-offline-graceful-degradation branch from 5428042 to 79b63e1 Compare June 23, 2026 19:28
@gtsiolis gtsiolis changed the base branch from main to devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a June 23, 2026 19:28
@gtsiolis gtsiolis force-pushed the devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a branch 2 times, most recently from fd896c8 to 696f298 Compare June 25, 2026 10:52
@gtsiolis gtsiolis force-pushed the devx-703-offline-graceful-degradation branch 2 times, most recently from 658808b to 70643a3 Compare June 25, 2026 11:07
@gtsiolis gtsiolis marked this pull request as ready for review June 25, 2026 11:07
@gtsiolis gtsiolis requested a review from a team as a code owner June 25, 2026 11:07
@gtsiolis

gtsiolis commented Jun 25, 2026

Member Author

Note to reviewers: this is a stacked PR on top of #325. cc @anisaoshafi

@gtsiolis gtsiolis force-pushed the devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a branch from 696f298 to d059449 Compare June 26, 2026 11:22
@gtsiolis gtsiolis force-pushed the devx-703-offline-graceful-degradation branch from 70643a3 to 61c278d Compare June 26, 2026 11:22
@gtsiolis gtsiolis force-pushed the devx-703-offline-graceful-degradation branch from 9c427be to 8d0cfb3 Compare June 29, 2026 09:52

@anisaoshafi anisaoshafi left a comment

Collaborator

LGTM, it works well in my testing 🙌🏼

Super small nit: the web app link is irrelevant for offline users.
[Image: screenshot]

@gtsiolis

Member Author

Thanks for taking a closer look, @anisaoshafi! 🙏

@carole-lavillonniere carole-lavillonniere left a comment

Collaborator

Pushed commit f51aeaf to:

  • fix one comment
  • document the feature in the README
  • fix 2 integration tests that were using the actual home directory (which would break test parallelism and determinism)

@gtsiolis please make sure this looks fine to you

Base automatically changed from devx-703-support-enterprise-environments-that-cannot-pull-from-ac7a to main June 29, 2026 13:07
gtsiolis and others added 8 commits June 29, 2026 16:09
Enterprise environments that cannot reach Docker Hub or the license
server (offline/air-gapped, proxy or TLS interception) hit two hard
failures: image pulls and license validation. Rather than gate this
behind an explicit flag, container.Start now degrades gracefully when an
internet request fails:

- Image pull: if PullImage fails but the image is already available
  locally (via the new runtime.ImageExists), lstk warns and uses the
  local image instead of failing.
- License pre-flight: validateLicense distinguishes a definitive server
  rejection (*api.LicenseError, still fatal) from a transport-level
  failure (offline/proxy/cert), skipping the check on transport failure
  and letting the container validate its own bundled license.

runtime.PullImage always closes its progress channel, even when
ImagePull fails early, so the local-image fallback does not leak the
progress goroutine. Context cancellation is propagated during the
license pre-flight so Ctrl+C aborts cleanly.

Refs DEVX-703

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
Adds end-to-end coverage for the two graceful-degradation paths that were
previously only exercised by mock-based unit tests:

- TestStartFallsBackToLocalImageWhenPullFails: tags a real LocalStack
  image under an unpullable name so the pull fails but the image exists
  locally, and asserts lstk warns and starts the local image.
- TestStartContinuesWhenLicenseServerUnreachable: points the license
  endpoint at a closed server so the pre-flight fails at the transport
  level, and asserts lstk skips the check and the container still starts.

Both require Docker and a valid LOCALSTACK_AUTH_TOKEN (the container must
activate to become healthy), mirroring TestStartCommandSucceedsWithValidToken.

Refs DEVX-703

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
On Ctrl+C mid-pull, the cancelled context made the rt.ImageExists
local-image probe fail, so the start surfaced a misleading "Failed to
pull" error and emitted a spurious start-error telemetry event. Guard the
pull-failure path with ctx.Err() so a user cancel propagates cleanly,
mirroring the existing license pre-flight handling.

Also documents the known limitation that an HTTP error response from the
license server (5xx, or 407 from a proxy) is still treated as a definitive
verdict rather than degrading.

Refs DEVX-703

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
A pinned image that is already present locally is not pulled (pullImages),
so the CLI license pre-flight is now skipped for it too: the redundant
network round-trip would otherwise block an entirely offline start, and the
container validates its own bundled license at startup. This is symmetric
with the existing skip-pull behaviour for local pinned images.

tryPrePullLicenseValidation gains the runtime so it can probe ImageExists;
a probe error is non-fatal and falls through to the pre-flight check.
… path

The two offline start tests started a real container under an isolated
t.TempDir() HOME, so the container's root-owned volume files (e.g.
server.test.pem.key) could not be removed by t.TempDir cleanup, failing in
CI. Use the real (inherited) HOME like every other fresh-start container
test; config stays isolated via --config.

Adds TestStartSkipsPullAndLicenseCheckWhenImageIsLocal covering the #325
review's success path: a pinned configured image found locally starts with
no pull and no CLI license check (asserted via a license server that fails
the test if contacted).
The pinned tag "1.0.0" makes lstk name the container "localstack-aws-1.0.0", but the test inspected the bare "localstack-aws" and relied on cleanup() for that name too. The inspect failed and the real container leaked, holding port 4566 and cascading failures across the shard (Snowflake, Terraform E2E, version-resolution, first-run).

Derive the actual container name and remove it explicitly so it cannot leak.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The previous TestStartSkipsPullAndLicenseCheckWhenImageIsLocal needed a real LOCALSTACK_AUTH_TOKEN and started a real container. Replace it with a small token-free test that tags a lightweight stand-in image locally and asserts the offline success path at the decision level: the configured custom image is reused (no "Pulling"), the CLI never contacts the license server (license hit count 0), and the container lstk creates uses the local image.

A real container reaching a healthy state from a local image stays covered by TestStartFallsBackToLocalImageWhenPullFails.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@gtsiolis gtsiolis force-pushed the devx-703-offline-graceful-degradation branch from f51aeaf to 2bc0bae Compare June 29, 2026 13:12
@gtsiolis gtsiolis enabled auto-merge (squash) June 29, 2026 13:14
@gtsiolis

gtsiolis commented Jun 29, 2026

Member Author

Thanks, @carole-lavillonniere!

@gtsiolis gtsiolis disabled auto-merge June 29, 2026 13:17
@gtsiolis gtsiolis merged commit 5e4f291 into main Jun 29, 2026
13 checks passed
@gtsiolis gtsiolis deleted the devx-703-offline-graceful-degradation branch June 29, 2026 13:24
Labels

docs: needed, semver: patch

3 participants