New Grimoire

traveler 2026-04-12 09:53:51 -05:00
parent 77d589a13d
commit cc574f8aed
157 changed files with 29420 additions and 0 deletions

---
title: Forgejo Audit Workflow
description: Weekly automated YAML compliance audit via n8n + Ollama
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, n8n, audit, forgejo
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Forgejo Audit Workflow
**Status:** ✅ Live and confirmed working
Runs every Monday at 06:00. Walks all compose YAML files in `services/swarm/` and `services/swarm/stack/*/`, audits each one against the Swarm template standard using `qwen2.5-coder:7b`, and commits full reports to Forgejo + sends a summary to ntfy.

---
## What It Audits
Each file is checked for:
- Homepage labels on all services
- Uptime Kuma labels on all services
- Caddy labels on exposed services
- `node.platform.arch` exclusion constraints (ARM default)
- Volume paths following the `/DockerVol/` or `/data/nfs/znas/Docker/` convention
- No forbidden fields (`version:`, `container_name:`, `restart:`, `depends_on:`)
- `endpoint_mode: dnsrr` not used
- `diun.enable: "true"` present
- Network references `netgrimoire` external overlay
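
The shape of one of these checks can be sketched as a plain string scan. This is a minimal illustration only; the function name is hypothetical, and the real workflow delegates the full audit to the model rather than hard-coding rules:

```javascript
// Sketch of the forbidden-fields rule from the checklist above.
// findForbiddenFields is an illustrative name, not the workflow's actual code.
const FORBIDDEN = ['version:', 'container_name:', 'restart:', 'depends_on:'];

function findForbiddenFields(yamlText) {
  const hits = [];
  yamlText.split('\n').forEach((line, i) => {
    const trimmed = line.trim();
    for (const field of FORBIDDEN) {
      // Flag any line that introduces a forbidden key.
      if (trimmed.startsWith(field)) hits.push({ line: i + 1, field });
    }
  });
  return hits;
}
```
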
---
## Scope
~67 files total across `swarm/` (flat single-service YAMLs) and `swarm/stack/*/` (grouped stacks).

---
## Outputs
| Output | Where | Content |
|--------|-------|---------|
| ntfy notification | `gremlin-audits` topic | Short FAIL summary per file |
| Forgejo commit | `Netgrimoire/Audits/AUDIT-<name>-<date>.md` | Full audit report (POST new / PUT+SHA update) |
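
The report path in the table can be assembled in a Code node along these lines. The helper name and date format are assumptions; the `Netgrimoire` repo prefix comes from the API URL rather than the path itself:

```javascript
// Build the AUDIT-<name>-<date>.md path for the Forgejo commit.
// auditReportPath is an illustrative name; date format assumed to be YYYY-MM-DD.
function auditReportPath(fileName, when = new Date()) {
  const base = fileName.replace(/\.ya?ml$/, ''); // strip .yml/.yaml extension
  const date = when.toISOString().slice(0, 10);  // YYYY-MM-DD
  return `Audits/AUDIT-${base}-${date}.md`;
}
```
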
---
## n8n Architecture
```
Schedule Trigger (Mon 06:00)
→ Forgejo API: list all files in swarm/ and swarm/stack/*/
→ Loop Over Items (splitInBatches, batch=1)
→ Code node: fetch file content via Forgejo API
→ Code node: build Ollama prompt
→ Code node: POST to Ollama (qwen2.5-coder:7b)
→ Code node: parse result, build report markdown
→ Code node: commit report to Forgejo (POST or PUT+SHA)
→ Code node: send ntfy summary if FAIL
→ Loop feedback connection drives iteration
```
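
The "build Ollama prompt" step might look like the sketch below. The prompt wording is an assumption, not the workflow's actual prompt, and the Ollama host in the comment is a placeholder:

```javascript
// Hypothetical sketch of the prompt-builder Code node.
function buildAuditPrompt(filePath, yamlContent) {
  return (
    `Audit this Docker Swarm compose file (${filePath}) against the Swarm ` +
    `template standard. Answer PASS or FAIL per rule, with a short reason.\n\n` +
    yamlContent
  );
}

// A following Code node would then send it to Ollama, e.g.:
// const res = await this.helpers.httpRequest({
//   method: 'POST',
//   url: 'http://ollama:11434/api/generate',   // host/port assumed
//   body: { model: 'qwen2.5-coder:7b', prompt, stream: false },
//   json: true,
// });
```
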
---
## Critical Patterns
- All Forgejo and Ollama API calls use `this.helpers.httpRequest()` inside Code nodes, **not** HTTP Request nodes: HTTP Request nodes hit body expression limits on large prompts.
- Code nodes in "Run Once for Each Item" mode must return `{ json: ... }`, not `[{ json: ... }]`.
- A Loop Over Items node (splitInBatches, batch=1) with a feedback connection from the last node back to the loop drives iteration over the file list.
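
The POST-new / PUT+SHA-update decision can be isolated in a small helper. The endpoint shape follows the Forgejo contents API; the host and helper name are assumptions:

```javascript
// Pick create (POST) vs update (PUT + SHA) for the Forgejo contents API.
// buildCommitRequest is an illustrative name; forgejo.example is a placeholder host.
function buildCommitRequest(owner, repo, filePath, contentB64, existingSha) {
  const url = `https://forgejo.example/api/v1/repos/${owner}/${repo}/contents/${filePath}`;
  const body = { content: contentB64, message: `audit: ${filePath}` };
  if (existingSha) {
    body.sha = existingSha; // updating requires the current blob SHA
    return { method: 'PUT', url, body };
  }
  return { method: 'POST', url, body }; // file does not exist yet: create it
}
```
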

---
## Critical Environment Variables
| Variable | Value | Why |
|----------|-------|-----|
| `N8N_BLOCK_ENV_ACCESS_IN_NODE` | `false` | Allows env var access inside Code nodes |
| `N8N_RUNNERS_TASK_TIMEOUT` | `3600` | Prevents timeouts on 67-file audit runs (value is in seconds) |
---
## Forgejo API Tokens
| Token | Scope |
|-------|-------|
| Read token | Fetch file content from `traveler/services` |
| Write token | Commit audit reports to `traveler/Netgrimoire` |
Tokens stored in n8n credentials, not in compose env vars.

---
## Forgejo Webhook Gotcha
If Forgejo webhooks fail to reach n8n, add to Forgejo `app.ini`:
```ini
[webhook]
ALLOWED_HOST_LIST = *
```
Required when `OFFLINE_MODE = true`. Restart Forgejo after editing.

---
title: Kuma Alert Triage Workflow
description: Uptime Kuma webhook → Ollama analysis → ntfy alert
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, n8n, kuma, alerts
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Kuma Alert Triage Workflow
**Status:** ✅ Live and confirmed working
Triggered by Uptime Kuma webhook on service DOWN or RECOVERED events. DOWN events are analyzed by `llama3.2:3b` before alerting. RECOVERED events skip AI and send a simple notification.

---
## Webhook URL
```
https://n8n.netgrimoire.com/webhook/gremlin-kuma-alert
```
Configure in Uptime Kuma: Settings → Notifications → Webhook → apply to all monitors.

---
## Flow
```
Kuma Webhook
├── DOWN path:
│ → Parse payload (service name, URL, error)
│ → Ollama (llama3.2:3b): triage prompt
│ → ntfy gremlin-alerts (urgent priority) with AI analysis
└── RECOVERED path:
→ ntfy gremlin-alerts (normal priority, no AI call)
```
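
The branch decision above can be sketched as a pure parsing step. The payload field names (`heartbeat.status`, `monitor.name`, etc.) follow Uptime Kuma's webhook format but are assumptions here, as is the function name:

```javascript
// Route a Kuma webhook payload to the DOWN or RECOVERED path.
// In Kuma's heartbeat, status 0 means down and 1 means up.
function routeKumaEvent(payload) {
  const down = payload.heartbeat && payload.heartbeat.status === 0;
  return {
    path: down ? 'DOWN' : 'RECOVERED',
    service: payload.monitor?.name ?? 'unknown',
    url: payload.monitor?.url ?? '',
    error: down ? (payload.heartbeat.msg || '') : '', // only DOWN carries an error
  };
}
```
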
---
## Why Two Paths
AI triage is only useful for DOWN events — there's nothing to analyze on a recovery. Skipping Ollama on RECOVERED keeps notification latency near-instant for good news.

---
## ntfy Output Format
DOWN alert includes:
- Service name and URL
- Kuma error message
- Ollama's triage assessment (probable cause, suggested first step)

The RECOVERED alert is a simple one-liner.
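
A sketch of how the two ntfy messages could be assembled. ntfy takes the message as the POST body with `Title` and `Priority` headers; the topic name comes from this page, while the host and helper name are placeholders:

```javascript
// Build the ntfy publish request for a triaged Kuma event.
// buildNtfyRequest is an illustrative name; ntfy.example is a placeholder host.
function buildNtfyRequest(event) {
  const down = event.path === 'DOWN';
  return {
    method: 'POST',
    url: 'https://ntfy.example/gremlin-alerts',
    headers: {
      Title: down ? `DOWN: ${event.service}` : `RECOVERED: ${event.service}`,
      Priority: down ? 'urgent' : 'default', // urgent only for outages
    },
    body: down
      ? `${event.service} (${event.url})\n${event.error}\n\n${event.analysis || ''}`
      : `${event.service} is back up.`, // recovery skips the AI analysis
  };
}
```
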

---
## Parked: Doc Generation Workflows
Two additional doc-generation workflows were built but are currently inactive: with CPU-only `llama3.2:3b`, the generated docs barely improve on a reformatted copy of the source compose file, so they are not useful enough to commit. They will be revisited when GPU support is added to the Gremlin stack.