docs: create Netgrimoire/Gremlin-Grimoire/CICD_UserGuide

Administrator 2026-04-28 20:56:55 +00:00 committed by John Smith
parent 5d0f756e66
commit 8a6320e9ca

@@ -0,0 +1,407 @@
---
title: Gremlin CI/CD User Guide
description:
published: true
date: 2026-04-28T20:56:45.863Z
tags:
editor: markdown
dateCreated: 2026-04-28T20:56:45.863Z
---
# Gremlin CI/CD Operator Guide

> **NetGrimoire Infrastructure Reference**
> How to write, structure, and manage Swarm stacks for the Gremlin CI/CD pipeline.
> For pipeline architecture, see [Gremlin CI/CD Pipeline](gremlin-cicd-wiki.md).

---

## How It Works

Push any `.yml` or `.yaml` file under `swarm/` to `traveler/services` and Gremlin takes over:

1. Fetches the file and classifies it (Swarm, Pocket, or plain Compose).
2. Runs all schema checkers.
3. If issues are found and all of them are fixable: auto-fixes the file and recommits it.
4. If any issue is unfixable: sends an ntfy alert and stops.
5. If all checks pass: runs the Ollama audit, then deploys.
6. After the deploy: updates the Gatus monitoring config.

You get ntfy notifications at every stage. A clean push produces a single notification: ✅ Deploy Complete.

---

## Required Stack Structure

Every Swarm service must have these elements. Missing any of them blocks deployment.

```yaml
services:
  myservice:
    image: vendor/image:tag
    environment:
      PUID: "1964"
      PGID: "1964"
      TZ: America/Chicago
    volumes:
      # Use ONE of the two paths below, not both.
      - /DockerVol/myservice:/data                # pinned: requires node.hostname
      # - /data/nfs/znas/Docker/myservice:/data   # floating: no hostname needed
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == znas   # required when using /DockerVol/
      labels:
        caddy: myservice.netgrimoire.com
        caddy.reverse_proxy: myservice:8080
        caddy.import_1: crowdsec
        caddy.import_2: authentik
        monitor.name: MyService
        monitor.url: https://myservice.netgrimoire.com
        homepage.group: NetGrimoire
        homepage.name: MyService
        homepage.icon: myservice.png
        homepage.href: https://myservice.netgrimoire.com
        homepage.description: My service description
        diun.enable: "true"

networks:
  netgrimoire:
    external: true
```

---

## Volume Path Rules

| Path type | Example | Placement constraint |
|---|---|---|
| `/DockerVol/` | `/DockerVol/myservice:/data` | `node.hostname` **required** |
| `/data/nfs/znas/` | `/data/nfs/znas/Docker/myservice:/data` | `node.hostname` **forbidden** |

Valid hostnames for `node.hostname`: `docker3`, `docker4`, `docker5`, `znas`, `dockerpi1`

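
As a sketch of the two rows in the table, the fragment below shows a pinned and a floating variant side by side. The service names `myservice-pinned` and `myservice-floating` and the choice of `docker4` are illustrative assumptions; only the volume paths and constraint rules come from this guide.

```yaml
services:
  # Pinned: a /DockerVol/ path, so a node.hostname constraint is required.
  myservice-pinned:
    image: vendor/image:tag
    volumes:
      - /DockerVol/myservice:/data
    deploy:
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == docker4   # illustrative; any valid hostname

  # Floating: a /data/nfs/znas/ path, so node.hostname must NOT be set.
  myservice-floating:
    image: vendor/image:tag
    volumes:
      - /data/nfs/znas/Docker/myservice:/data
    deploy:
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
```
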
---
## Identity Rules

**Method 1:** LinuxServer.io and homelab images (preferred):

```yaml
environment:
  PUID: "1964"
  PGID: "1964"
```

**Method 2:** Official Docker Hub images:

```yaml
user: "1964:1964"
```

**Exemption:** Images that manage their own users (Authentik, MailCow):

```yaml
labels:
  gremlin.uid.exempt: "true"
  gremlin.uid.reason: "Authentik manages its own internal user context"
```

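
The snippets above omit their surrounding context. The sketch below shows where each would sit in a stack file, assuming the layout from Required Stack Structure: `user:` is a service-level key, while the exemption labels go under `deploy.labels`. The service names are placeholders.

```yaml
services:
  # Method 2: user: is a sibling of image: and deploy:
  official-image-service:
    image: vendor/image:tag
    user: "1964:1964"

  # Exemption: the labels live under deploy.labels, and the reason is required
  self-managed-service:
    image: vendor/image:tag
    deploy:
      labels:
        gremlin.uid.exempt: "true"
        gremlin.uid.reason: "Image manages its own internal user context"
```
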
---
## Caddy Label Rules

```yaml
caddy: myservice.netgrimoire.com       # hostname only, no https:// prefix
caddy.reverse_proxy: myservice:8080    # service name and port, no IP addresses
caddy.import_1: crowdsec               # mandatory
caddy.import_2: authentik              # mandatory
```

Services without a public URL (internal sidecars, databases):

```yaml
gremlin.caddy.skip: "true"
```

---

## Monitor Labels

Gremlin writes monitor endpoints to Gatus after each successful deploy.

```yaml
monitor.name: MyService        # display name in Gatus
monitor.url: https://myservice.netgrimoire.com
monitor.type: http             # optional: http | tcp | ping | dns (default: http)
monitor.interval: "60"         # optional: seconds, minimum 20 (default: 60)
```

Services that should not be monitored:

```yaml
gremlin.monitor.skip: "true"
```

TCP example (for non-HTTP services):

```yaml
monitor.type: tcp
monitor.url: myservice:5432
```

---

## Homepage Labels

```yaml
homepage.group: Media               # dashboard group
homepage.name: MyService            # display name
homepage.icon: myservice.png        # icon filename
homepage.href: https://myservice.netgrimoire.com
homepage.description: Brief description
```

Services that should not appear on Homepage:

```yaml
gremlin.homepage.skip: "true"
```

> **Auto-fix note:** If homepage labels are missing, Gremlin derives them from the `caddy:` label and the service name. The group defaults to "New" and the icon defaults to "servicename.png". Review and correct the labels after an auto-fix.

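
To make the note concrete, here is a sketch of what derived labels might look like for a service named `myservice`. Only the `homepage.group` and `homepage.icon` defaults are stated in this guide; the derived `name`, `href`, and `description` values are assumptions for illustration and may differ from what Gremlin actually writes.

```yaml
# Existing label before the auto-fix (service key: myservice)
caddy: myservice.netgrimoire.com

# Labels an auto-fix might add
homepage.group: New                                # documented default
homepage.icon: myservice.png                       # documented default: servicename.png
homepage.name: myservice                           # assumed: taken from the service name
homepage.href: https://myservice.netgrimoire.com   # assumed: https:// plus the caddy host
homepage.description: myservice                    # assumed placeholder text
```

Whatever the exact output, treat it as a starting point and replace the group, name, and description with meaningful values.
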
---
## Gremlin Directives

All directives go inside `deploy.labels`. Every directive is optional: a stack with no `gremlin.*` labels gets the full treatment (all checks, auto-fix, deploy, and notifications).

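
As a minimal sketch of that placement, the fragment below sets two directives on a placeholder service. The service name and the chosen values are illustrative; the directive names and allowed values are the ones documented in this section.

```yaml
services:
  myservice:
    image: vendor/image:tag
    deploy:
      labels:
        gremlin.checks.skip: "homepage,monitor"   # checker IDs to skip
        gremlin.notify.level: "failures"          # one of: all | failures | none
```

Directives sit alongside the regular `caddy.*`, `monitor.*`, and `homepage.*` labels in the same `labels:` map.
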
### Pipeline Control

```yaml
gremlin.enable: "true"
# Set false to have Gremlin ignore this file entirely on push.
# Default: true

gremlin.checks: "all"
# Comma-separated checker IDs to run, or "all".
# Example: "swarm-syntax,identity,caddy"
# Default: all

gremlin.checks.skip: ""
# Comma-separated checker IDs to skip.
# Example: "homepage,monitor"
# Default: (none)
```

### Auto-fix Control

```yaml
gremlin.autofix: "true"
# Set false to disable all auto-fixing.
# Default: true

gremlin.autofix.skip: "false"
# Set true to notify but never attempt to fix.
# Default: false

gremlin.autofix.skip_fields: ""
# Comma-separated fields to skip during fix.
# Example: "hostname,uid"
# Default: (none)
```

### Deploy Control

```yaml
gremlin.deploy: "true"
# Set false to run checks and fixes but never deploy.
# Use for test stacks or stacks managed manually.
# Default: true

gremlin.deploy.strategy: "stack"
# Deployment method. Values: stack | helm | kubectl
# Default: stack
```

### Identity Exemptions

```yaml
gremlin.uid.exempt: "false"
# Set true to skip PUID/PGID/user checks.
# Use for images that manage their own users.
# Default: false

gremlin.uid.reason: ""
# Documents why uid.exempt is set.
# Required when uid.exempt is true.
```

### Placement Control

```yaml
gremlin.arm.allow: "false"
# Set true to allow ARM/Pi deployment.
# Removes ARM exclusion constraints.
# Default: false
```

### Service-level Skip Labels

```yaml
# Defaults shown. Set a label to "true" to skip that validation.
gremlin.caddy.skip: "false"       # skip Caddy label validation
gremlin.homepage.skip: "false"    # skip Homepage label validation
gremlin.monitor.skip: "false"     # skip monitor label validation
gremlin.diun.skip: "false"        # skip Diun label validation
gremlin.network.skip: "false"     # skip network validation (whole stack)
```

### Ollama Context

```yaml
gremlin.context: ""
# Free text passed to the Ollama audit as ground truth.
# Ollama will not flag anything the context explains.
# Example: "OIDC_CLIENT_SECRET in plain text is intentional; no secrets manager in use"
```

### Notification Control

```yaml
gremlin.notify: "true"         # false = suppress all ntfy for this stack
gremlin.notify.level: "all"    # all | failures | none
```

---

## Checker IDs

Use these IDs with `gremlin.checks` and `gremlin.checks.skip`:

| ID | What it checks |
|---|---|
| `swarm-syntax` | Forbidden fields: `version`, `container_name`, `hostname`, `restart`, `depends_on`, `dnsrr` |
| `identity` | PUID/PGID `1964` or `user: "1964:1964"` |
| `network` | `netgrimoire` overlay network |
| `placement` | ARM exclusions, DockerVol/hostname rules, `restart_policy` |
| `caddy` | `caddy:` label, `reverse_proxy`, `import_1`/`import_2` |
| `homepage` | `group`, `name`, `icon`, `href`, `description` |
| `monitor` | `monitor.name`, `monitor.url`, optional `type`/`interval` |
| `legacy-labels` | Flags `kuma.*` labels for removal |
| `diun` | `diun.enable: "true"` |

---

## Common Patterns

### Internal sidecar (database, cache)

```yaml
  postgres:
    image: postgres:15
    user: "1964:1964"   # Method 2 identity (official Docker Hub image)
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: secret
    volumes:
      - /DockerVol/myapp/postgres:/var/lib/postgresql/data
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == docker4   # required: volume uses /DockerVol/
      labels:
        gremlin.caddy.skip: "true"
        gremlin.homepage.skip: "true"
        gremlin.monitor.skip: "true"
        diun.enable: "true"
```

### Test stack (never deployed)

```yaml
labels:
  gremlin.deploy: "false"
  # ... other labels
```

### ARM/Pi service

```yaml
deploy:
  labels:
    gremlin.arm.allow: "true"
    # ... other labels
  placement:
    constraints:
      - node.hostname == dockerpi1
```

### Image requiring root

```yaml
labels:
  gremlin.uid.exempt: "true"
  gremlin.uid.reason: "Image requires root; does not support PUID/PGID"
  # ... other labels
```

---

## Forbidden Fields

Gremlin removes these fields automatically:

| Field | Reason |
|---|---|
| `version:` (top-level) | Obsolete under the Compose Specification |
| `container_name:` | Conflicts with Swarm service naming |
| `hostname:` (service-level) | Conflicts with Swarm DNS |
| `restart:` (service-level) | Use `deploy.restart_policy` instead |
| `depends_on:` | Not supported in Swarm mode |
| `links:` | Not supported in Swarm mode |

These conditions cause an **unfixable** block:

| Field | Reason |
|---|---|
| `endpoint_mode: dnsrr` | Breaks internal DNS resolution |
| Missing `deploy:` block | File treated as plain Compose, not Swarm |

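
For orientation, the fragment below shows each forbidden field in the position where it typically appears in a Compose file, annotated with the outcome described in the tables above. The service names and field values are placeholders, not taken from a real stack.

```yaml
version: "3.8"                  # removed automatically (top-level)
services:
  myservice:
    image: vendor/image:tag
    container_name: myservice   # removed automatically
    hostname: myservice         # removed automatically
    restart: unless-stopped     # removed automatically; use deploy.restart_policy
    depends_on:                 # removed automatically
      - postgres
    links:                      # removed automatically
      - postgres
    deploy:
      endpoint_mode: dnsrr      # NOT removed: unfixable, blocks the pipeline
```
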
---
## Troubleshooting

**"Missing deploy: block" (file skipped as non-Swarm)**

Your compose file has no `deploy:` section. Add a `deploy:` block to each service for Swarm compatibility.

**"uses /DockerVol/ but has no node.hostname constraint" (unfixable)**

Add a `node.hostname` constraint to your `deploy.placement.constraints`. Gremlin cannot guess which node to pin the service to.

**Ollama keeps blocking on legitimate config**

Add `gremlin.context` to explain the situation. Ollama treats the context as ground truth and will not flag what it explains.

**Auto-fix loop: fixes are applied but the same issues keep appearing**

The fixer is inserting the labels, but the checker is not recognizing them after insertion. Check label indentation: labels inside `deploy.labels` must be indented 8 spaces.

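
The 8-space figure follows from two-space indentation at each nesting level, as sketched below. The service name and the single label are placeholders.

```yaml
services:                      # 0 spaces
  myservice:                   # 2 spaces
    deploy:                    # 4 spaces
      labels:                  # 6 spaces
        diun.enable: "true"    # 8 spaces: label keys belong at this depth
```
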
**Deploy skipped every time**

Check `gremlin.deploy`. If it is set to `"false"`, the pipeline validates and fixes but never deploys.

---

## Related

- [Gremlin CI/CD Pipeline](gremlin-cicd-wiki.md)
- [NetGrimoire Stack Standards](stack-standards.md)
- [Gatus](gatus.md)