---
title: Gremlin CI/CD Pipeline
description: n8n workflow with Ollama audit
published: true
date: 2026-04-28T20:55:22.848Z
tags:
editor: markdown
dateCreated: 2026-04-28T20:55:22.848Z
---

# Gremlin CI/CD Pipeline

> **NetGrimoire Infrastructure Reference**
> Automated validation, auto-fix, and deployment pipeline for Docker Swarm stacks.
> Runs on n8n (docker4). Triggered by Forgejo push webhooks on `traveler/services`.

---

## Overview

The Gremlin CI/CD pipeline is an n8n workflow that intercepts every push to `traveler/services`. It validates changed Swarm compose files against NetGrimoire standards, automatically fixes common issues, and deploys clean stacks to the Swarm cluster. It is the enforcement layer for stack consistency across the homelab.

The pipeline is modular: each check and each fix is a discrete node. Adding a new rule means adding one checker node and one fixer node; nothing else changes.

---

## Architecture

### File Detection

Every push to `traveler/services` fires a webhook to n8n. The pipeline detects changed files in three passes:

1. **Standard arrays:** `commit.added` and `commit.modified` from the Forgejo payload.
2. **Gremlin commit messages:** extracts the file path from `gremlin: auto-fix swarm/foo.yaml (N issues fixed)` commit messages. This covers Forgejo's habit of sending empty file arrays for programmatic commits.
3. **Compare API fallback:** calls Forgejo's `/api/v1/repos/traveler/services/compare/before...after` if the first two passes find nothing.

Only files under `swarm/` with a `.yml` or `.yaml` extension are processed.

### File Classification

After fetching the file content, the Build Envelope node classifies it:

| Type | Detection | Route |
|---|---|---|
| **Pocket** | `pocket.include: "true"` label | Silent exit |
| **Swarm** | Any `deploy:` block present | Full checker chain |
| **Compose** | No `deploy:` block | ntfy warning, skip |

### Pipeline Flow

```
Forgejo Push
└─ Parse Push Payload
   └─ Build Envelope
      └─ Switch (Pocket / Swarm / Compose)
         ├─ Pocket → (silent exit)
         ├─ Swarm → Checker Chain
         │  └─ Evaluate Checks
         │     └─ Switch (hasFailed)
         │        ├─ Failed → ntfy: Blocked
         │        │  └─ Switch (canFix)
         │        │     ├─ Yes → Fixer Chain → Commit → ntfy: Auto-fixed
         │        │     └─ No → (stop)
         │        └─ Passed → Ollama Audit
         │           └─ Switch (ollamaVerdict)
         │              ├─ Fail → ntfy: Blocked — Ollama
         │              └─ Pass → Deploy Gate
         │                 └─ Deploy Enabled?
         │                    ├─ No → ntfy: Deploy Skipped
         │                    └─ Yes → Prepare Volumes
         │                       └─ Git Pull + Deploy
         │                          └─ Gatus Sync
         │                             └─ ntfy: Deploy Complete
         └─ Compose → ntfy: Non-Swarm
```

### The Envelope

Every checker and fixer operates on a single shared object called the **envelope**. The envelope is built once per file and passed through the entire chain. Along the way it accumulates issues and fixes.
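
The following is a minimal Python sketch of the path filter, the envelope, and the Pocket / Swarm / Compose routing. It is not the workflow's actual node code, which runs inside n8n. The sketch makes three assumptions:

- The compose file has already been parsed into a dict, for example by a YAML library (not shown).
- Labels can sit at service level or under `deploy.labels`, in either mapping or `key=value` list form.
- `stackName` is the file's base name.

The helper names `is_candidate`, `service_labels`, `build_envelope`, and `route` are hypothetical. The envelope fields follow the key-fields table.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional


def is_candidate(path: str) -> bool:
    """Keep only .yml/.yaml files under the repo's swarm/ directory."""
    p = PurePosixPath(path)
    return p.parts[:1] == ("swarm",) and p.suffix in (".yml", ".yaml")


def service_labels(service: dict) -> dict:
    """Merge service-level and deploy-level labels.

    Compose accepts labels as a mapping or as a list of "key=value" strings.
    """
    merged = {}
    deploy = service.get("deploy") or {}
    for raw in (service.get("labels"), deploy.get("labels")):
        if isinstance(raw, dict):
            merged.update({str(k): str(v) for k, v in raw.items()})
        elif isinstance(raw, list):
            for item in raw:
                key, _, value = str(item).partition("=")
                merged[key] = value
    return merged


@dataclass
class Envelope:
    stackName: str
    filePath: str
    composeRaw: str                 # original content, never modified
    fixedRaw: Optional[str] = None  # stays None until the first fixer runs
    issues: list = field(default_factory=list)
    fixes: list = field(default_factory=list)
    checkResults: dict = field(default_factory=dict)
    hasFailed: bool = False         # set later by Evaluate Checks
    canFix: bool = False            # set later by Evaluate Checks
    isPocket: bool = False
    isSwarm: bool = False
    directives: dict = field(default_factory=dict)


def build_envelope(file_path: str, compose_raw: str, compose: dict) -> Envelope:
    """Build the envelope from an already-parsed compose dict."""
    services = compose.get("services") or {}
    label_sets = [service_labels(svc or {}) for svc in services.values()]
    env = Envelope(
        stackName=PurePosixPath(file_path).stem,  # assumption: swarm/foo.yaml -> "foo"
        filePath=file_path,
        composeRaw=compose_raw,
    )
    env.isPocket = any(labels.get("pocket.include") == "true" for labels in label_sets)
    env.isSwarm = any("deploy" in (svc or {}) for svc in services.values())
    for labels in label_sets:
        env.directives.update({k: v for k, v in labels.items() if k.startswith("gremlin.")})
    return env


def route(env: Envelope) -> str:
    """Mirror the Pocket / Swarm / Compose switch: Pocket wins, then deploy: presence."""
    if env.isPocket:
        return "pocket"
    return "swarm" if env.isSwarm else "compose"
```

For example, `build_envelope("swarm/foo.yaml", raw, parsed)` yields `stackName == "foo"`. `route()` then returns `"pocket"`, `"swarm"`, or `"compose"` to select the branch.
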

Key fields:

| Field | Description |
|---|---|
| `stackName` | Derived from the file path |
| `filePath` | Relative path in the repo |
| `composeRaw` | Original file content; never modified |
| `fixedRaw` | Accumulates fixer changes; null until the first fixer runs |
| `issues[]` | All checker findings |
| `fixes[]` | All fixer actions taken |
| `checkResults{}` | Pass/fail per checker ID |
| `hasFailed` | True if any checker failed |
| `canFix` | True if there is at least one issue and every issue is fixable |
| `isPocket` | True if a `pocket.include: "true"` label is found |
| `isSwarm` | True if any `deploy:` block is found |
| `directives{}` | Parsed `gremlin.*` label values |

---

## Checker Chain

Checkers run in the order below. Each checker appends its findings to `envelope.issues` and sets `envelope.checkResults[id]`.

| Order | ID | What it checks |
|---|---|---|
| 1 | `swarm-syntax` | Forbidden fields: `version`, `container_name`, `hostname` (service-level), `restart`, `depends_on`, `dnsrr` endpoint mode |
| 2 | `identity` | `PUID`/`PGID` must be 1964, or `user: "1964:1964"` |
| 3 | `network` | `netgrimoire` overlay network declared and attached |
| 4 | `placement` | ARM exclusions, DockerVol/hostname rules, `restart_policy` |
| 5 | `caddy` | `caddy:` label, `reverse_proxy` format, `import_1`/`import_2` |
| 6 | `homepage` | `homepage.*` labels: `group`, `name`, `icon`, `href`, `description` |
| 7 | `monitor` | `monitor.name`, `monitor.url`, optional `type`/`interval` |
| 8 | `legacy-labels` | Flags any `kuma.*` labels for removal |
| 9 | `diun` | `diun.enable: "true"` present |

### Fixable vs Unfixable

Auto-fix runs only when **all** issues in the file are fixable. A single unfixable issue blocks the entire fix chain.
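
The gate that chooses between blocking, auto-fixing, and moving on to the Ollama audit reduces to two booleans. The sketch below is a minimal Python illustration, not the workflow's code. Its assumptions:

- Only two of the nine checkers are implemented (`swarm-syntax` and `identity`).
- The compose file is already parsed into a dict.
- Each issue is a hypothetical dict carrying a `fixable` flag.

All function names are illustrative.

```python
from typing import Any

FORBIDDEN_SERVICE_KEYS = ("container_name", "hostname", "restart", "depends_on")


def issue(check_id: str, msg: str, fixable: bool) -> dict:
    return {"check": check_id, "msg": msg, "fixable": fixable}


def env_dict(svc: dict) -> dict:
    """Normalise environment: mapping or "KEY=value" list -> {KEY: str(value)}."""
    raw = svc.get("environment") or {}
    if isinstance(raw, list):
        pairs = (str(item).partition("=") for item in raw)
        return {key: value for key, _, value in pairs}
    return {str(k): str(v) for k, v in raw.items()}


def check_swarm_syntax(compose: dict[str, Any]) -> list:
    found = []
    if "version" in compose:
        found.append(issue("swarm-syntax", "version: key", True))
    for name, svc in (compose.get("services") or {}).items():
        svc = svc or {}
        for key in FORBIDDEN_SERVICE_KEYS:
            if key in svc:
                found.append(issue("swarm-syntax", f"{key} on {name}", True))
        if (svc.get("deploy") or {}).get("endpoint_mode") == "dnsrr":
            # dnsrr is the one swarm-syntax finding that cannot be auto-fixed
            found.append(issue("swarm-syntax", f"dnsrr endpoint mode on {name}", False))
    return found


def check_identity(compose: dict[str, Any]) -> list:
    found = []
    for name, svc in (compose.get("services") or {}).items():
        svc = svc or {}
        if str(svc.get("user", "")) == "1964:1964":
            continue  # user: "1964:1964" satisfies the rule on its own
        env = env_dict(svc)
        if env.get("PUID") != "1964" or env.get("PGID") != "1964":
            found.append(issue("identity", f"PUID/PGID not 1964 on {name}", True))
    return found


CHECKERS = {"swarm-syntax": check_swarm_syntax, "identity": check_identity}


def evaluate(compose: dict[str, Any]) -> dict:
    issues, check_results = [], {}
    for check_id, checker in CHECKERS.items():  # dict order = chain order
        found = checker(compose)
        issues.extend(found)
        check_results[check_id] = not found      # True means the check passed
    return {
        "issues": issues,
        "checkResults": check_results,
        "hasFailed": not all(check_results.values()),
        "canFix": bool(issues) and all(i["fixable"] for i in issues),
    }
```

Suppose a file contains one `dnsrr` finding. `canFix` is then false even if every other issue is fixable, which is how a single unfixable issue blocks the whole fix chain.
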

| Fixable | Unfixable |
|---|---|
| `version:` key | `dnsrr` endpoint mode |
| `container_name`, `hostname` (service-level), `restart`, `depends_on` | Missing `deploy:` block |
| Wrong or missing `PUID`/`PGID` | Invalid `node.hostname` value |
| Missing `netgrimoire` network | Hostname missing when DockerVol is present |
| ARM exclusion issues | — |
| Hostname present without DockerVol | — |
| Missing `restart_policy` | — |
| `caddy:` protocol prefix | — |
| Missing `caddy.import_1`/`caddy.import_2` | — |
| Missing `homepage.*` labels (derived) | — |
| Missing `monitor.*` labels (derived) | — |
| Legacy `kuma.*` labels (removed) | — |
| Missing `diun.enable` | — |

---

## Fixer Chain

Fixers run in the same order as checkers. Each fixer reads from `fixedRaw`, or from `composeRaw` if no fixer has run yet. It then writes its changes back to `fixedRaw`, so changes accumulate across the chain.

When all fixers complete, the pipeline commits `fixedRaw` back to Forgejo with a message of this form:

```
gremlin: auto-fix swarm/foo.yaml (N issues fixed)

- Removed version: key
- Added PUID/PGID 1964 to "app"
- ...
```

This commit re-triggers the webhook, and the pipeline runs again on the now-fixed file.

### Smart Fix Derivation

Where possible, homepage and monitor labels are derived from existing labels instead of being filled with placeholders:

- `homepage.name` / `monitor.name` → `capitalize(serviceName)`
- `homepage.href` / `monitor.url` → `https://` + the `caddy:` hostname (falls back to `https://servicename.netgrimoire.com`)
- `homepage.group` → `"New"` when missing
- `homepage.icon` → `servicename.png`
- `homepage.description` → `"Servicename service"`

---

## Ollama Audit

After all checkers pass, the file is sent to Ollama (`qwen2.5-coder:7b`) for a semantic audit.
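
The audit request can be sketched as a single non-streaming call to Ollama's `/api/generate` endpoint. The `model`, `prompt`, `stream`, and `format` request fields and the `response` reply field belong to Ollama's REST API. Everything else in this sketch is an assumption:

- the Ollama URL
- the prompt wording, paraphrased from the rules listed in this section
- the `{"verdict": ...}` reply schema
- how the `gremlin.context` text is passed in

Treating an unparseable reply as a pass is this sketch's own reading of "conservative by design", not documented pipeline behavior.

```python
import json
import urllib.request

# Assumptions: Ollama host/port (11434 is Ollama's default) and the prompt wording.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5-coder:7b"

PROMPT_TEMPLATE = """You are auditing a Docker Swarm compose file.
Ignore: environment variables, volume paths, port mappings, OIDC/OAuth config,
secrets, and application-specific settings.
Check only: clearly wrong image names, structural errors that prevent startup,
and obviously broken network config. When in doubt, pass.
Reply with JSON only: {{"verdict": "pass" or "fail", "reason": "..."}}
{context}
--- compose file ---
{compose}
"""


def build_request(compose_raw: str, context: str = "") -> dict:
    """Build the /api/generate body; context is the gremlin.context label, if any."""
    extra = f"Operator context: {context}" if context else ""
    return {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(context=extra, compose=compose_raw),
        "stream": False,   # one JSON object instead of a token stream
        "format": "json",  # ask Ollama to constrain output to JSON
    }


def parse_verdict(api_reply: dict) -> str:
    """Return "fail" only on an explicit fail verdict; anything else passes."""
    try:
        verdict = json.loads(api_reply.get("response", "")).get("verdict", "")
    except (json.JSONDecodeError, AttributeError, TypeError):
        return "pass"
    return "fail" if str(verdict).lower() == "fail" else "pass"


def audit(compose_raw: str, context: str = "") -> str:
    """POST to Ollama and map the reply to the ollamaVerdict switch value."""
    body = json.dumps(build_request(compose_raw, context)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_verdict(json.load(resp))
```

In the workflow, the `Switch (ollamaVerdict)` node would route `"fail"` to the blocked notification and `"pass"` on to the deploy gate.
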
The prompt explicitly instructs Ollama to:

- **Ignore:** environment variables, volume paths, port mappings, OIDC/OAuth config, secrets, and application-specific settings
- **Check only:** clearly wrong image names, structural errors that prevent startup, and obviously broken network config

Ollama is conservative by design: when in doubt, it passes. False positives can be suppressed with the `gremlin.context` label.

---

## Gatus Sync

After a successful deploy, Gatus Sync reads the `monitor.*` labels from the deployed compose file. It then upserts the matching endpoints into `/DockerVol/gatus/config/config.yaml` on znas, writing the file over SSH as base64-encoded content. Gatus hot-reloads the config automatically.

Alerts from Gatus go to the `gremlin-watch` ntfy topic.

---

## Infrastructure

| Component | Value |
|---|---|
| n8n host | `docker4` (192.168.5.16) |
| Swarm manager | `znas` (192.168.5.10) |
| Service account | `gremlin` |
| SSH key | `/home/gremlin/.ssh/id_ed25519` |
| Repo path on znas | `/home/gremlin/services` |
| Webhook path | `gremlin-cicd` |
| ntfy pipeline alerts | `gremlin-alerts` |
| ntfy monitoring alerts | `gremlin-watch` |
| Gatus config | `/DockerVol/gatus/config/config.yaml` |

---

## ntfy Notifications

| Event | Topic | Priority |
|---|---|---|
| Schema blocked | `gremlin-alerts` | 4 (high) |
| Ollama blocked | `gremlin-alerts` | 4 (high) |
| Auto-fixed | `gremlin-alerts` | 3 (default) |
| Deploy complete | `gremlin-alerts` | 3 (default) |
| Deploy skipped | `gremlin-alerts` | 2 (low) |
| Non-Swarm file | `gremlin-alerts` | 2 (low) |
| Service down/up | `gremlin-watch` | 3 (default) |

---

## Related

- [Gremlin CI/CD — Operator Guide](gremlin-cicd-guide.md)
- [NetGrimoire Stack Standards](stack-standards.md)
- [Gatus](gatus.md)
- [n8n](n8n.md)