diff --git a/Netgrimoire/Gremlin-Grimoire/CICD_UserGuide.md b/Netgrimoire/Gremlin-Grimoire/CICD_UserGuide.md
new file mode 100644
index 0000000..83ca1c2
--- /dev/null
+++ b/Netgrimoire/Gremlin-Grimoire/CICD_UserGuide.md
@@ -0,0 +1,407 @@
+---
+title: Gremlin CI/CD User Guide
+description:
+published: true
+date: 2026-04-28T20:56:45.863Z
+tags:
+editor: markdown
+dateCreated: 2026-04-28T20:56:45.863Z
+---
+
+# Gremlin CI/CD — Operator Guide
+
+> **NetGrimoire Infrastructure Reference**
+> How to write, structure, and manage Swarm stacks for the Gremlin CI/CD pipeline.
+> For pipeline architecture, see [Gremlin CI/CD Pipeline](gremlin-cicd-wiki.md).
+
+---
+
+## How It Works
+
+Push any `.yml` or `.yaml` file under `swarm/` to `traveler/services` and Gremlin takes over:
+
+1. Fetches the file and classifies it (Swarm, Pocket, or plain Compose)
+2. Runs all schema checkers
+3. If issues found and all are fixable — auto-fixes and recommits
+4. If issues found and unfixable — sends ntfy alert, stops
+5. If all checks pass — runs Ollama audit, then deploys
+6. After deploy — updates Gatus monitoring config
+
+You get ntfy notifications at every stage. A clean push produces one notification: ✅ Deploy Complete.
+
+---
+
+## Required Stack Structure
+
+Every Swarm service must have these elements. Missing any will block deployment.
+
+```yaml
+services:
+  myservice:
+    image: vendor/image:tag
+    environment:
+      PUID: "1964"
+      PGID: "1964"
+      TZ: America/Chicago
+    volumes:
+      - /DockerVol/myservice:/data  # pinned — requires node.hostname
+      # or
+      - /data/nfs/znas/Docker/myservice:/data  # floating — no hostname needed
+    networks:
+      - netgrimoire
+    deploy:
+      restart_policy:
+        condition: any
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+      placement:
+        constraints:
+          - node.platform.arch != aarch64
+          - node.platform.arch != arm
+          - node.hostname == znas  # required when using /DockerVol/
+      labels:
+        caddy: myservice.netgrimoire.com
+        caddy.reverse_proxy: myservice:8080
+        caddy.import_1: crowdsec
+        caddy.import_2: authentik
+
+        monitor.name: MyService
+        monitor.url: https://myservice.netgrimoire.com
+
+        homepage.group: NetGrimoire
+        homepage.name: MyService
+        homepage.icon: myservice.png
+        homepage.href: https://myservice.netgrimoire.com
+        homepage.description: My service description
+
+        diun.enable: "true"
+
+networks:
+  netgrimoire:
+    external: true
+```
+
+---
+
+## Volume Path Rules
+
+| Path type | Example | Placement constraint |
+|---|---|---|
+| `/DockerVol/` | `/DockerVol/myservice:/data` | `node.hostname` **required** |
+| `/data/nfs/znas/` | `/data/nfs/znas/Docker/myservice:/data` | `node.hostname` **forbidden** |
+
+Valid hostnames for `node.hostname`: `docker3`, `docker4`, `docker5`, `znas`, `dockerpi1`
+
+---
+
+## Identity Rules
+
+**Method 1** — LinuxServer.io and homelab images (preferred):
+```yaml
+environment:
+  PUID: "1964"
+  PGID: "1964"
+```
+
+**Method 2** — Official Docker Hub images:
+```yaml
+user: "1964:1964"
+```
+
+**Exemption** — Images that manage their own users (Authentik, MailCow):
+```yaml
+labels:
+  gremlin.uid.exempt: "true"
+  gremlin.uid.reason: "Authentik manages its own internal user context"
+```
+
+---
+
+## Caddy Label Rules
+
+```yaml
+caddy: myservice.netgrimoire.com  # hostname only — no https:// prefix
+caddy.reverse_proxy: myservice:8080  # service name and port — no IP addresses
+caddy.import_1: crowdsec  # mandatory
+caddy.import_2: authentik  # mandatory
+```
+
+Services without a public URL (internal sidecars, databases):
+```yaml
+gremlin.caddy.skip: "true"
+```
+
+---
+
+## Monitor Labels
+
+Gremlin writes monitor endpoints to Gatus after each successful deploy.
+
+```yaml
+monitor.name: MyService  # display name in Gatus
+monitor.url: https://myservice.netgrimoire.com
+monitor.type: http  # optional: http | tcp | ping | dns (default: http)
+monitor.interval: "60"  # optional: seconds, minimum 20 (default: 60)
+```
+
+Services that should not be monitored:
+```yaml
+gremlin.monitor.skip: "true"
+```
+
+TCP example (for non-HTTP services):
+```yaml
+monitor.type: tcp
+monitor.url: myservice:5432
+```
+
+---
+
+## Homepage Labels
+
+```yaml
+homepage.group: Media  # dashboard group
+homepage.name: MyService  # display name
+homepage.icon: myservice.png  # icon filename
+homepage.href: https://myservice.netgrimoire.com
+homepage.description: Brief description
+```
+
+Services that should not appear on Homepage:
+```yaml
+gremlin.homepage.skip: "true"
+```
+
+> **Auto-fix note:** If homepage labels are missing, Gremlin derives them from the caddy: label and service name. Group defaults to "New", icon defaults to "servicename.png". Review and correct after auto-fix.
+
+---
+
+## Gremlin Directives
+
+All directives go inside `deploy.labels`. All are opt-out — a stack with no `gremlin.*` labels gets full treatment.
+
+### Pipeline Control
+
+```yaml
+gremlin.enable: "true"
+# Set false to have Gremlin ignore this file entirely on push.
+# Default: true
+
+gremlin.checks: "all"
+# Comma-separated checker IDs to run, or "all".
+# Example: "swarm-syntax,identity,caddy"
+# Default: all
+
+gremlin.checks.skip: ""
+# Comma-separated checker IDs to skip.
+# Example: "homepage,monitor"
+# Default: (none)
+```
+
+### Auto-fix Control
+
+```yaml
+gremlin.autofix: "true"
+# Set false to disable all auto-fixing.
+# Default: true
+
+gremlin.autofix.skip: "false"
+# Set true to notify but never attempt to fix.
+# Default: false
+
+gremlin.autofix.skip_fields: ""
+# Comma-separated fields to skip during fix.
+# Example: "hostname,uid"
+# Default: (none)
+```
+
+### Deploy Control
+
+```yaml
+gremlin.deploy: "true"
+# Set false to run checks and fixes but never deploy.
+# Use for test stacks or stacks managed manually.
+# Default: true
+
+gremlin.deploy.strategy: "stack"
+# Deployment method. Values: stack | helm | kubectl
+# Default: stack
+```
+
+### Identity Exemptions
+
+```yaml
+gremlin.uid.exempt: "false"
+# Set true to skip PUID/PGID/user checks.
+# Use for images that manage their own users.
+# Default: false
+
+gremlin.uid.reason: ""
+# Documents why uid.exempt is set.
+# Required when uid.exempt is true.
+```
+
+### Placement Control
+
+```yaml
+gremlin.arm.allow: "false"
+# Set true to allow ARM/Pi deployment.
+# Removes ARM exclusion constraints.
+# Default: false
+```
+
+### Service-level Skip Labels
+
+```yaml
+gremlin.caddy.skip: "false"  # skip Caddy label validation
+gremlin.homepage.skip: "false"  # skip Homepage label validation
+gremlin.monitor.skip: "false"  # skip monitor label validation
+gremlin.diun.skip: "false"  # skip Diun label validation
+gremlin.network.skip: "false"  # skip network validation (whole stack)
+```
+
+### Ollama Context
+
+```yaml
+gremlin.context: ""
+# Free text passed to Ollama audit as ground truth.
+# Ollama will not flag anything the context explains.
+# Example: "OIDC_CLIENT_SECRET in plain text is intentional — no secrets manager in use"
+```
+
+### Notification Control
+
+```yaml
+gremlin.notify: "true"  # false = suppress all ntfy for this stack
+gremlin.notify.level: "all"  # all | failures | none
+```
+
+---
+
+## Checker IDs
+
+Use these IDs with `gremlin.checks` and `gremlin.checks.skip`:
+
+| ID | What it checks |
+|---|---|
+| `swarm-syntax` | Forbidden fields: version, container_name, hostname, restart, depends_on, dnsrr |
+| `identity` | PUID/PGID 1964 or user: "1964:1964" |
+| `network` | netgrimoire overlay network |
+| `placement` | ARM exclusions, DockerVol/hostname rules, restart_policy |
+| `caddy` | caddy: label, reverse_proxy, import_1/import_2 |
+| `homepage` | group, name, icon, href, description |
+| `monitor` | monitor.name, monitor.url, optional type/interval |
+| `legacy-labels` | Flags kuma.* labels for removal |
+| `diun` | diun.enable: "true" |
+
+---
+
+## Common Patterns
+
+### Internal sidecar (database, cache)
+
+```yaml
+  postgres:
+    image: postgres:15
+    environment:
+      POSTGRES_USER: myapp
+      POSTGRES_PASSWORD: secret
+    volumes:
+      - /DockerVol/myapp/postgres:/var/lib/postgresql/data
+    networks:
+      - netgrimoire
+    deploy:
+      restart_policy:
+        condition: any
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+      placement:
+        constraints:
+          - node.platform.arch != aarch64
+          - node.platform.arch != arm
+          - node.hostname == docker4
+      labels:
+        gremlin.caddy.skip: "true"
+        gremlin.homepage.skip: "true"
+        gremlin.monitor.skip: "true"
+        diun.enable: "true"
+```
+
+### Test stack (never deployed)
+
+```yaml
+      labels:
+        gremlin.deploy: "false"
+        # ... other labels
+```
+
+### ARM/Pi service
+
+```yaml
+      labels:
+        gremlin.arm.allow: "true"
+        # ... other labels
+      placement:
+        constraints:
+          - node.hostname == dockerpi1
+```
+
+### Image requiring root
+
+```yaml
+      labels:
+        gremlin.uid.exempt: "true"
+        gremlin.uid.reason: "Image requires root — does not support PUID/PGID"
+        # ... other labels
+```
+
+---
+
+## Forbidden Fields
+
+These fields are automatically removed by Gremlin:
+
+| Field | Reason |
+|---|---|
+| `version:` (top-level) | Obsolete in Compose v3 |
+| `container_name:` | Conflicts with Swarm service naming |
+| `hostname:` (service-level) | Conflicts with Swarm DNS |
+| `restart:` (service-level) | Use `deploy.restart_policy` instead |
+| `depends_on:` | Not supported in Swarm mode |
+| `links:` | Not supported in Swarm mode |
+
+These fields cause an **unfixable** block:
+
+| Field | Reason |
+|---|---|
+| `endpoint_mode: dnsrr` | Breaks internal DNS resolution |
+| Missing `deploy:` block | File treated as plain Compose, not Swarm |
+
+---
+
+## Troubleshooting
+
+**"Missing deploy: block" — file skipped as non-Swarm**
+Your compose file has no `deploy:` section. Add a `deploy:` block to each service for Swarm compatibility.
+
+**"uses /DockerVol/ but has no node.hostname constraint" — unfixable**
+Add a `node.hostname` constraint to your `deploy.placement.constraints`. Gremlin cannot guess which node to pin it to.
+
+**Ollama keeps blocking on legitimate config**
+Add `gremlin.context` to explain the situation. Ollama treats context as ground truth and will not flag it.
+
+**Auto-fix loop — fixes applied but same issues keep appearing**
+The fixer is finding the labels but the checker isn't recognizing them after insertion. Check label indentation — labels inside `deploy.labels` must be indented 8 spaces.
+
+**Deploy skipped every time**
+Check `gremlin.deploy` — if set to `"false"` the pipeline validates and fixes but never deploys.
+
+---
+
+## Related
+
+- [Gremlin CI/CD Pipeline](gremlin-cicd-wiki.md)
+- [NetGrimoire Stack Standards](stack-standards.md)
+- [Gatus](gatus.md)
\ No newline at end of file
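The Volume Path Rules table in the guide can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not Gremlin's actual checker: the function names `pinned_hostname` and `check_volume_placement` are hypothetical, and the handling of a service that mounts both path types (the pinned rule wins) is an assumption the guide does not state.

```python
from typing import Iterable, List, Optional

# Hostnames and path prefixes are taken from the guide; everything else is illustrative.
VALID_HOSTNAMES = {"docker3", "docker4", "docker5", "znas", "dockerpi1"}
PINNED_PREFIX = "/DockerVol/"
FLOATING_PREFIX = "/data/nfs/znas/"


def pinned_hostname(constraints: Iterable[str]) -> Optional[str]:
    """Return the host named by a `node.hostname == X` constraint, if any."""
    for constraint in constraints:
        left, sep, right = constraint.partition("==")
        if sep and left.strip() == "node.hostname":
            return right.strip()
    return None


def check_volume_placement(volumes: Iterable[str], constraints: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the rules hold."""
    problems = []
    host = pinned_hostname(constraints)
    # The host path is everything before the first colon of "src:dst".
    sources = [volume.split(":", 1)[0] for volume in volumes]
    uses_pinned = any(src.startswith(PINNED_PREFIX) for src in sources)
    uses_floating = any(src.startswith(FLOATING_PREFIX) for src in sources)

    if uses_pinned and host is None:
        problems.append("uses /DockerVol/ but has no node.hostname constraint")
    # Assumption: when both path types are mounted, the pinned rule takes precedence.
    if uses_floating and not uses_pinned and host is not None:
        problems.append("uses /data/nfs/znas/ but pins node.hostname")
    if host is not None and host not in VALID_HOSTNAMES:
        problems.append(f"unknown hostname: {host}")
    return problems
```

Calling `check_volume_placement(["/DockerVol/myservice:/data"], ["node.hostname == znas"])` returns an empty list, while dropping the hostname constraint yields the same message quoted in the Troubleshooting section.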