Skip to content

[docs] Add GitOps configuration surfaces design proposal#16

Open
myasnikovdaniil wants to merge 2 commits into
mainfrom
daniil/gitops-surfaces-proposal
Open

[docs] Add GitOps configuration surfaces design proposal#16
myasnikovdaniil wants to merge 2 commits into
mainfrom
daniil/gitops-surfaces-proposal

Conversation

@myasnikovdaniil

@myasnikovdaniil myasnikovdaniil commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a design proposal — GitOps Configuration Surfaces — under design-proposals/gitops-surfaces/.

It defines a single GitOps primitive (GitRepository + Kustomization + a scoped ServiceAccount) and instantiates it at two trust tiers:

  • Platform GitOps (admin, cluster scope) — generalizes feat(platform): add cozystack.customizer system package cozystack#2731, with a two-class Package ownership model (hand-owned bootstrap vs chart-owned downstream) and a packageOverrides admin override layer so component-parameter changes don't race helm-controller for field ownership.
  • Tenant GitOps (tenant, namespace scope) — a catalog gitops app bound to the tenant ServiceAccount, isolated via a dedicated tenant kustomize-controller shard.

Covers admin package/parameter/golden-image configuration, tenant self-service GitOps, and arbitrary cluster resources outside the core API, plus cross-cutting secrets-as-code, DR/reproducibility, and policy-as-code.

Status

Draft — opening for community feedback per the design-proposal process. The first increment (admin surface) is already prototyped in cozystack/cozystack#2731.

Summary by CodeRabbit

  • Documentation
    • Added design proposal for GitOps Configuration Surfaces feature, detailing admin and tenant-level configuration management capabilities
    • Includes package override mechanisms, security boundaries, upgrade/rollback compatibility, and comprehensive testing scope

Proposes a single GitOps primitive (GitRepository + Kustomization + a
scoped ServiceAccount) instantiated at two trust tiers: platform GitOps
for admins (cluster scope) and tenant GitOps for tenants (namespace
scope), plus a packageOverrides admin override layer. Generalizes
cozystack/cozystack#2731 as the first increment.

Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a design proposal for GitOps Configuration Surfaces in Cozystack, defining a unified primitive to enable platform GitOps for admins and tenant GitOps for users. The review feedback highlights several critical areas for improvement: first, Helm's default list-merging behavior could inadvertently overwrite default lists (like VM images) when applying packageOverrides; second, a potential security bypass exists where tenants could omit or modify the sharding label on Kustomization resources to run them on the unrestricted global controller; and third, restricting the customizer's write access to cozy-system to protect the supply chain would conflict with the need for admins to manage system-level configurations (like the cozystack-values Secret) in that namespace.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +113 to +121
packageOverrides:
linstor:
autoDiskful:
minutes: 10
vm-default-images:
images:
- name: ubuntu-24.04
url: https://example/custom.img
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When implementing the packageOverrides merge logic in the platform chart, keep in mind that standard Helm merge functions (like merge or mustMerge) overwrite entire lists/arrays (such as the images list under vm-default-images) rather than merging them by key or appending to them. If an admin only wants to override or add a single image, they might inadvertently wipe out all other default images. It would be beneficial to specify in the design how list-type overrides are handled or if custom template merging logic will be used.


A tenant must not be able to (a) reconcile into another namespace, or (b) borrow a more privileged SA. Two options, recommendation first:

1. **Dedicated tenant kustomize-controller shard** (recommended) — mirror the existing `flux-tenants` helm-controller shard. Run a kustomize-controller selecting `sharding.fluxcd.io/key: tenants`, started with `--no-cross-namespace-refs` and a default-service-account lockdown, so the restrictions apply *only* to tenant reconciliation and platform Kustomizations are unaffected. This matches a pattern Cozystack already uses.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Using a dedicated tenant kustomize-controller shard that filters by a sharding label (e.g., sharding.fluxcd.io/key: tenants) introduces a potential security bypass. If a tenant can create or modify Kustomization resources directly, they could omit or change this label to have their Kustomization reconciled by the default, unrestricted global kustomize-controller instead. To make this isolation robust, the design should include an admission policy (such as a ValidatingAdmissionPolicy or webhook) that enforces the sharding label on all tenant-created Kustomization resources, or configure the global controller to explicitly ignore resources in tenant namespaces.

Comment on lines +161 to +163
- **Tool 1 (admin):** repo write == cluster-admin, by design and by documentation. The curated ClusterRole in the prototype (patch-not-delete on Packages, no CRDs, etc.) is *defense against accident, not against intent* — anyone with repo write can escalate, so the docs must not over-claim a boundary. New trust surface introduced: an admin git repo + its credentials (protect like a kubeconfig), and a standing, self-healing reconciling credential whose blast radius equals an admin's.
- **Tool 1 footguns to narrow** (they cost nothing functionally): source access should not let the customizer rewrite the platform's *own* supply chain (the `cozystack-packages` OCIRepository in `cozy-system`), so scope write away from `cozy-system`; and field ownership on downstream Packages should eventually be enforced by a `ValidatingAdmissionPolicy` scoped to the customizer SA that rejects changes to fields currently managed by helm-controller (detected via `managedFields`), leaving the hand-owned bootstrap Package free. Until then, the `packageOverrides` layer keeps admins off chart-owned objects entirely, so the contract is mostly avoided rather than merely documented.
- **Tool 2 (tenant):** the surface runs as the tenant's own SA, so it can do nothing the tenant cannot already do via the API. The new risk is **cross-namespace / SA-borrowing**, closed by the tenant shard lockdown (`--no-cross-namespace-refs`, forced `serviceAccountName`). Tenant-supplied input = the contents of a tenant-owned repo, bounded by `cozy:tenant:*`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

There is a potential conflict between scoping the customizer's write access away from cozy-system (to protect the platform's supply chain like cozystack-packages) and the need for admins to manage system-level configurations. As noted in the Context section, critical admin configurations (such as cert-manager issuers or other platform values) are driven by resources residing in cozy-system (e.g., the cozystack-values Secret). If write access to cozy-system is completely blocked, admins won't be able to manage these via GitOps. Consider refining this to use fine-grained RBAC or a ValidatingAdmissionPolicy that specifically protects the supply chain resources (like OCIRepository or specific HelmReleases) while still allowing the customizer to manage other configurations in cozy-system.

@myasnikovdaniil

Copy link
Copy Markdown
Contributor Author

Revising Tool 1: override layer as a PackageValues resource

Updating the Tool 1 design. The two-class Package ownership model (chart-owned downstream Packages + a hand-owned admin override layer) stays — but I'm changing how the override layer is realized: a dedicated resource rather than a sub-key on the bootstrap Package. It's fully additive / non-breaking. Recording the rationale here; the proposal edit follows.

The problem. Downstream Packages are emitted by the platform chart (packages/core/platform/templates/_helpers.tpl, the cozystack.platform.package helper) with helm.sh/resource-policy: keep, so helm-controller owns their fields. Any out-of-band edit to a chart-owned Package's spec.components[].values races helm-controller for field ownership (kustomize-controller hardcodes force-ownership, so it silently wins — reproducible). Whatever the override layer is, it must not lose that race.

Solution — extend the API with a third resource:

PackageSource   (charts + variants — unchanged)
Package         (the reconciled, system/bundle-owned layer)
PackageValues   (NEW — the human/GitOps-owned override layer)

The operator merges PackageValues over the Package's values when it builds the HelmRelease. The chart never writes PackageValues; humans never write Package. There is no field to fight over — the race is removed structurally. This is not a second override Secret (the operator hardcodes a single valuesFrom: cozystack-values and resets others — internal/operator/package_reconciler.go — so a second Secret loses); PackageValues is a CR the operator itself merges, so it works with the reconciler.

No breaking changes. PackageValues is additive: with none present, the operator path is byte-identical to today (hr.Spec.Values = pkg.Spec.Components[c].Values, with cozystack-values still merged under). effective := deepMerge(packageValues, packageValuesCR) → absent CR ⇒ no change. The one discipline it needs to pay off: humans edit only PackageValues, enforced by an RBAC split (write PackageValues, read-only Package).

Why a resource rather than packageOverrides on the bootstrap Package:

PackageValues (proposed) packageOverrides (alternative)
API change new optional CRD + reconciler merge none (chart-only)
Field-ownership race eliminated (separate object) sidestepped (funnel through chart)
RBAC delegation per-package (write PackageValues, read Package) all-or-nothing (sub-key of one object)
List/map merge (e.g. vm-default-images.images is a 16-entry list) one tested merge in the operator per-package Go-template logic, untested
Override blob one small reviewable resource per package one god-object for all packages
Schema validation validatable per package later free-form JSON

packageOverrides is a reasonable lighter-weight alternative — no Go, ships purely as chart templating — and is essentially PackageValues crammed into one chart-rendered blob. The tradeoff is that it sidesteps the field-ownership race instead of removing it, can't do per-package RBAC delegation, and pushes list-merge correctness into per-package templates. Since PackageValues is additive, shipping packageOverrides first wouldn't block it — but it would make the packageOverrides key throwaway.

Feedback welcome on the precedence chain (chart defaults < cozystack-values < Package < PackageValues) and the RBAC split before I flesh out the CRD.

…layer

Recast Tool 1's override layer from a `packageOverrides` map on the bootstrap
Package (merged into downstream Packages by the platform chart) to a dedicated
`PackageValues` API kind that the operator merges into each HelmRelease.
`packageOverrides` is kept as a lighter-weight chart-only alternative/interim.

PackageValues removes the helm-controller force-ownership race structurally
(the chart never writes PackageValues; the admin never writes the chart-owned
Package), makes ownership enforceable by object-level RBAC, and is fully
additive — absent a PackageValues, the operator builds HelmReleases as today.

Updates the admin override layer, user-facing / upgrade / security / failure /
testing / rollout sections, open question 3 (now PackageValues precedence and
binding), and alternatives considered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds design-proposals/gitops-surfaces/README.md, a new design proposal document for Cozystack defining two reconciled GitOps surface primitives (admin/platform and tenant scopes) built from GitRepository + Kustomization + scoped ServiceAccount, a new PackageValues CR for component override layering, tenant isolation via a dedicated kustomize-controller shard, and associated security, testing, rollout, and alternatives documentation.

Changes

GitOps Configuration Surfaces Design Proposal

Layer / File(s) Summary
Metadata, context, goals, and non-goals
design-proposals/gitops-surfaces/README.md
Document header, overview, related proposals, current Flux usage context, stated goals, and explicit non-goals bounding the proposal.
Shared primitive, Platform GitOps, and PackageValues CR
design-proposals/gitops-surfaces/README.md
Defines the shared GitRepository + Kustomization + ServiceAccount surface primitive, specifies Tool 1 (admin trust model, bootstrap vs downstream Package lifecycle/field-ownership risk), and introduces the PackageValues CR for additive, backward-compatible component parameter overrides merged into rendered HelmRelease values.
Tenant GitOps tool and cross-cutting capabilities
design-proposals/gitops-surfaces/README.md
Describes Tool 2 as a catalog-app delivering a tenant-namespace GitRepository + Kustomization, specifies tenant isolation via a dedicated kustomize-controller shard or global Flux multitenancy flags, and lists secrets-as-code, status surfacing, and user-facing documentation changes.
Upgrade/rollback compatibility and security trust boundaries
design-proposals/gitops-surfaces/README.md
Documents additive opt-in upgrade path, rollback teardown behavior tied to helm.sh/resource-policy: keep, admin and tenant trust boundary definitions, risk footguns, and enumerated failure/edge cases.
Testing plan, rollout phases, open questions, alternatives, and attribution
design-proposals/gitops-surfaces/README.md
Specifies test categories (Helm unit, operator unit, e2e BATS, manual auth matrices), outlines four rollout phases, lists open questions on Flux embedding and PackageValues semantics, enumerates alternatives considered, and adds attribution footer.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Poem

🐇 A rabbit hopped through YAML fields,
dreaming of repos that clusters wield.
With Kustomizations neatly stacked,
and PackageValues cleanly tracked,
two tiers of trust, both admin and tenant—
GitOps surfaces, fluffy and pleasant! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[docs] Add GitOps configuration surfaces design proposal' directly and accurately summarizes the main change: adding a design proposal document about GitOps configuration surfaces.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch daniil/gitops-surfaces-proposal

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@design-proposals/gitops-surfaces/README.md`:
- Around line 130-131: The proposal currently presents RBAC enforcement as a
mechanism that prevents field-level edits, but this is actually a
human-discipline contract with no API-level enforcement. Revise lines 130-131 to
clarify that the split design structurally avoids the contract violation (by
separating concerns into PackageValues and Package) rather than enforcing it via
RBAC alone. Additionally, update the text to explicitly state that the admission
policy is a prerequisite for production use, not an optional enhancement. Make
clear that without field-ownership enforcement at the API level, admins must
manually follow the documented protocol, and any deviation recreates the
helm-controller race conditions the PackageValues design was meant to prevent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d7a0a413-b6cb-49ef-ba12-c1bd9336da65

📥 Commits

Reviewing files that changed from the base of the PR and between abb40d0 and 9faf7e0.

📒 Files selected for processing (1)
  • design-proposals/gitops-surfaces/README.md

Comment on lines +130 to +131
The split is **additive**: with no `PackageValues` present, the operator path is byte-identical to today, so existing clusters are unaffected (see Upgrade compatibility). It also makes ownership enforceable by ordinary object-level RBAC — the admin GitOps SA gets write on `PackageValues` and read-only on `Package` — which is the discipline that keeps admins off chart-owned objects entirely, the foundation for whole-cluster reproducibility (UC5).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

RBAC enforcement strategy relies on discipline until admission policy lands.

Lines 130–131 assert that object-level RBAC (write PackageValues, read-only Package) can enforce the field-level contract and prevent admins from patching chart-owned Package fields. However, this is a human-discipline contract — nothing in the CR syntax or API prevents an admin from editing a chart-owned Package directly.

The security section (line 171) correctly acknowledges this as a footgun: "until the admission policy lands, the PackageValues resource keeps admins off chart-owned objects entirely" — they write PackageValues, never the chart-owned Package — so the contract is structurally avoided rather than merely documented.

Recommendation: Ensure the admission policy (or equivalent validation) is treated as a prerequisite for production use of this surface, not a "nice-to-have." The proposal should explicitly state that without field-ownership enforcement, admins must follow the documented protocol manually, and any deviation creates the same helm-controller race the PackageValues design was meant to avoid.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-proposals/gitops-surfaces/README.md` around lines 130 - 131, The
proposal currently presents RBAC enforcement as a mechanism that prevents
field-level edits, but this is actually a human-discipline contract with no
API-level enforcement. Revise lines 130-131 to clarify that the split design
structurally avoids the contract violation (by separating concerns into
PackageValues and Package) rather than enforcing it via RBAC alone.
Additionally, update the text to explicitly state that the admission policy is a
prerequisite for production use, not an optional enhancement. Make clear that
without field-ownership enforcement at the API level, admins must manually
follow the documented protocol, and any deviation recreates the helm-controller
race conditions the PackageValues design was meant to prevent.

@myasnikovdaniil myasnikovdaniil self-assigned this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant