| title | description | published | date | tags | editor | dateCreated |
|---|---|---|---|---|---|---|
| Gremlin CI/CD User Guide | | true | 2026-04-30T18:33:09.881Z | | markdown | 2026-04-28T20:56:45.863Z |
Gremlin CI/CD — Operator Guide
NetGrimoire Infrastructure Reference

How to write, structure, and manage Swarm stacks for the Gremlin CI/CD pipeline. For pipeline architecture, see Gremlin CI/CD Pipeline.
How It Works
Push any .yml or .yaml file under swarm/ to traveler/services and Gremlin takes over:
- Fetches the file and classifies it (Swarm, Pocket, or plain Compose)
- Runs all schema checkers
- If issues are found and all are fixable: auto-fixes and recommits
- If any issue is unfixable: sends an ntfy alert and stops
- If all checks pass: runs the Ollama audit, then deploys
- After deploy: updates the Gatus monitoring config
You get ntfy notifications at every stage. A clean push produces one notification: ✅ Deploy Complete.
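The branching described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not Gremlin's actual code: the `Issue` class, the `next_action` function, and the returned step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    checker: str   # checker ID that raised the issue, e.g. "caddy"
    fixable: bool  # whether an auto-fix exists for this issue


def next_action(issues: list[Issue]) -> str:
    """Return the next pipeline step for a file classified as Swarm."""
    if not issues:
        return "audit_then_deploy"     # all checks passed
    if all(issue.fixable for issue in issues):
        return "autofix_and_recommit"  # every issue can be fixed
    return "alert_and_stop"            # at least one unfixable issue
```

Note that a single unfixable issue stops the run even when the remaining issues are fixable.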
Required Stack Structure
Every Swarm service must have these elements. Missing any will block deployment.
```yaml
services:
  myservice:
    image: vendor/image:tag
    environment:
      PUID: "1964"
      PGID: "1964"
      TZ: America/Chicago
    volumes:
      - /DockerVol/myservice:/data  # pinned: requires node.hostname
      # or
      - /data/nfs/znas/Docker/myservice:/data  # floating: no hostname needed
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == znas  # required when using /DockerVol/
      labels:
        caddy: myservice.netgrimoire.com
        caddy.reverse_proxy: myservice:8080
        caddy.import_1: crowdsec
        caddy.import_2: authentik
        monitor.name: MyService
        monitor.url: http://myservice:8080  # internal URL preferred
        homepage.group: NetGrimoire
        homepage.name: MyService
        homepage.icon: myservice.png
        homepage.href: https://myservice.netgrimoire.com
        homepage.description: My service description
        diun.enable: "true"

networks:
  netgrimoire:
    external: true
```
Volume Path Rules
| Path type | Example | Placement constraint |
|---|---|---|
| `/DockerVol/` | `/DockerVol/myservice:/data` | `node.hostname` required |
| `/data/nfs/znas/` | `/data/nfs/znas/Docker/myservice:/data` | `node.hostname` not required |
Valid hostnames for node.hostname: docker3, docker4, docker5, znas, dockerpi1
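As a rough illustration of the volume rule, the sketch below checks a service's volumes against its placement constraints. The function name and the second error message are hypothetical, and the real `placement` checker may behave differently; only the hostname list and the rule itself come from this guide.

```python
VALID_HOSTNAMES = {"docker3", "docker4", "docker5", "znas", "dockerpi1"}


def check_volume_placement(volumes: list[str], constraints: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the rule is satisfied."""
    # Volumes under /DockerVol/ live on one node, so the service must be pinned.
    pinned = [v for v in volumes if v.startswith("/DockerVol/")]
    # Collect the right-hand side of every "node.hostname == X" constraint.
    hostnames = [
        c.split("==", 1)[1].strip()
        for c in constraints
        if c.replace(" ", "").startswith("node.hostname==")
    ]
    problems = []
    if pinned and not hostnames:
        problems.append("uses /DockerVol/ but has no node.hostname constraint")
    for name in hostnames:
        if name not in VALID_HOSTNAMES:
            problems.append(f"unknown hostname: {name}")
    return problems
```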
Identity Rules
Method 1 — LinuxServer.io and homelab images (preferred):
```yaml
environment:
  PUID: "1964"
  PGID: "1964"
```
Method 2 — Official Docker Hub images:
```yaml
user: "1964:1964"
```
Exemption — Images that manage their own users (Authentik, MailCow, Postgres, Redis):
```yaml
labels:
  gremlin.uid.exempt: "true"
  gremlin.uid.reason: "Postgres manages its own user; requires UID 999"
```
When uid.exempt is set, Prepare Volumes will mkdir the service's volume paths but will not chown them. The image manages its own ownership.
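The three identity paths (PUID/PGID, `user:`, or an exemption) can be summarized as one predicate. This is a minimal sketch that assumes the service has already been parsed into a dict and that `environment` and `deploy.labels` use the mapping form shown in this guide. The `identity_ok` name is hypothetical.

```python
def identity_ok(service: dict) -> bool:
    """Return True when a parsed service satisfies one of the identity rules."""
    labels = service.get("deploy", {}).get("labels", {})
    if labels.get("gremlin.uid.exempt") == "true":
        return True  # exemption: the image manages its own user
    env = service.get("environment", {})
    if env.get("PUID") == "1964" and env.get("PGID") == "1964":
        return True  # method 1: PUID/PGID environment variables
    return service.get("user") == "1964:1964"  # method 2: user field
```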
Caddy Label Rules
```yaml
caddy: myservice.netgrimoire.com     # hostname only, no https:// prefix
caddy.reverse_proxy: myservice:8080  # service name and port, no IP addresses
caddy.import_1: crowdsec             # always required
caddy.import_2: authentik            # required unless gremlin.authentik.skip is set
```

Services without a public URL (internal sidecars, databases):

```yaml
gremlin.caddy.skip: "true"
```

Services that should bypass Authentik but still go through CrowdSec:

```yaml
gremlin.authentik.skip: "true"
```
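The Caddy rules above translate into a handful of string checks. The sketch below is illustrative only: the `check_caddy_labels` name and the messages are hypothetical, and the actual `caddy` checker may validate more or less than this.

```python
import ipaddress


def check_caddy_labels(labels: dict) -> list[str]:
    """Return a list of problems found in a service's deploy.labels mapping."""
    if labels.get("gremlin.caddy.skip") == "true":
        return []  # internal service: no Caddy labels expected
    problems = []
    host = labels.get("caddy", "")
    if not host:
        problems.append("missing caddy label")
    elif "://" in host:
        problems.append("caddy must be a hostname without a scheme")
    # Split "service:port" on the last colon.
    name, _, port = labels.get("caddy.reverse_proxy", "").rpartition(":")
    if not name or not port.isdigit():
        problems.append("caddy.reverse_proxy must be service:port")
    else:
        try:
            ipaddress.ip_address(name)
            problems.append("caddy.reverse_proxy must not use an IP address")
        except ValueError:
            pass  # not an IP address, so treat it as a service name
    if labels.get("caddy.import_1") != "crowdsec":
        problems.append("caddy.import_1 must be crowdsec")
    needs_authentik = labels.get("gremlin.authentik.skip") != "true"
    if needs_authentik and labels.get("caddy.import_2") != "authentik":
        problems.append("caddy.import_2 must be authentik")
    return problems
```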
Monitor Labels
Gremlin writes monitor endpoints to Gatus after each successful deploy. Monitor URLs should use the internal service name and port so Gatus checks the container directly without depending on Caddy or Authentik being up.
```yaml
monitor.name: MyService             # display name in Gatus
monitor.url: http://myservice:8080  # internal URL preferred
monitor.type: http                  # optional: http | tcp | ping | dns (default: http)
monitor.interval: "60"              # optional: seconds, minimum 20 (default: 60)
```

For non-HTTP services (mail, databases):

```yaml
monitor.type: tcp
monitor.url: tcp://myservice:5432
```

Services that should not be monitored:

```yaml
gremlin.monitor.skip: "true"
```
Gatus determines the check condition from the URL scheme:
- `http://` or `https://` → `[STATUS] == 200`
- `tcp://` or `type: tcp` → `[CONNECTED] == true`
- `type: ping` → `[CONNECTED] == true`
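The mapping from URL scheme and type to a Gatus condition could look roughly like this. The `gatus_condition` function is a hypothetical helper, not part of Gremlin or Gatus. Because this guide does not state a condition for `dns`, the sketch raises an error for it instead of guessing.

```python
def gatus_condition(url: str, monitor_type: str = "http") -> str:
    """Pick the Gatus condition string for a monitor.url / monitor.type pair."""
    if monitor_type == "dns":
        # The condition for dns monitors is not described in this guide.
        raise ValueError("no condition documented for monitor.type dns")
    if url.startswith("tcp://") or monitor_type in ("tcp", "ping"):
        return "[CONNECTED] == true"
    if url.startswith(("http://", "https://")):
        return "[STATUS] == 200"
    raise ValueError(f"cannot derive a condition for {url!r}")
```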
Homepage Labels
```yaml
homepage.group: Media          # dashboard group
homepage.name: MyService       # display name
homepage.icon: myservice.png   # icon filename
homepage.href: https://myservice.netgrimoire.com
homepage.description: Brief description
```

Services that should not appear on Homepage:

```yaml
gremlin.homepage.skip: "true"
```
Auto-fix note: If homepage labels are missing, Gremlin derives them from the caddy: label and service name. Group defaults to "New", icon defaults to "servicename.png". Review and correct after auto-fix.
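To make the auto-fix defaults concrete, here is a minimal sketch of how missing Homepage labels might be filled in. Only the group and icon defaults come from this guide. Using the service name for `homepage.name` and building `homepage.href` from the `caddy` label are assumptions, and `derive_homepage_labels` is a hypothetical name.

```python
def derive_homepage_labels(service_name: str, labels: dict) -> dict:
    """Return a copy of labels with missing Homepage labels filled in."""
    derived = dict(labels)  # existing labels are never overwritten
    derived.setdefault("homepage.group", "New")
    derived.setdefault("homepage.icon", f"{service_name}.png")
    derived.setdefault("homepage.name", service_name)  # assumption
    if "caddy" in labels:
        # Assumption: the public URL is the caddy hostname over HTTPS.
        derived.setdefault("homepage.href", f"https://{labels['caddy']}")
    return derived
```

Because the derived group is always "New", auto-fixed stacks are easy to spot on the dashboard and correct afterwards.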
Gremlin Directives Reference
All directives go inside deploy.labels. All are opt-out — a stack with no gremlin.* labels gets full treatment.
Pipeline Control
| Directive | Default | Description |
|---|---|---|
| `gremlin.enable` | `true` | Set false to have Gremlin ignore this file entirely on push |
| `gremlin.checks` | `all` | Comma-separated checker IDs to run, or all |
| `gremlin.checks.skip` | (none) | Comma-separated checker IDs to skip |
| `gremlin.version` | (auto) | Stamped automatically; do not set manually |
| `gremlin.context` | (none) | Free text passed to Ollama as ground truth; Ollama will not flag anything this explains |
Auto-fix Control
| Directive | Default | Description |
|---|---|---|
| `gremlin.autofix` | `true` | Set false to disable all auto-fixing |
| `gremlin.autofix.skip` | `false` | Set true to notify but never attempt to fix |
| `gremlin.autofix.skip_fields` | (none) | Comma-separated fields to skip during fix (e.g. uid,hostname) |
Deploy Control
| Directive | Default | Description |
|---|---|---|
| `gremlin.deploy` | `true` | Set false to run checks and fixes but never deploy |
| `gremlin.deploy.strategy` | `stack` | Deployment method; currently only stack is implemented |
| `gremlin.port` | (none) | Internal container port when no ports: mapping exists; used to derive caddy.reverse_proxy and monitor.url |
Identity
| Directive | Default | Description |
|---|---|---|
| `gremlin.uid.exempt` | `false` | Skip PUID/PGID/user checks and skip chown on volumes for this service |
| `gremlin.uid.reason` | (none) | Documents why uid.exempt is set; include with every exemption |
Placement
| Directive | Default | Description |
|---|---|---|
| `gremlin.arm.allow` | `false` | Allow ARM/Pi deployment; removes the ARM exclusion constraint requirement |
Caddy
| Directive | Default | Description |
|---|---|---|
| `gremlin.caddy.skip` | `false` | Skip all Caddy label checks for this service |
| `gremlin.authentik.skip` | `false` | Skip the caddy.import_2: authentik requirement only; CrowdSec still required |
Homepage
| Directive | Default | Description |
|---|---|---|
| `gremlin.homepage.skip` | `false` | Skip Homepage label checks for this service |
Monitor
| Directive | Default | Description |
|---|---|---|
| `gremlin.monitor.skip` | `false` | Skip monitor label checks for this service |
Network
| Directive | Default | Description |
|---|---|---|
| `gremlin.network.skip` | `false` | Skip netgrimoire network checks for this service |
Diun
| Directive | Default | Description |
|---|---|---|
| `gremlin.diun.skip` | `false` | Skip diun.enable check for this service |
Notifications
| Directive | Default | Description |
|---|---|---|
| `gremlin.notify` | `true` | Set false to suppress all ntfy notifications for this stack |
| `gremlin.notify.level` | `all` | One of `all`, `failures`, or `none` |
Checker IDs
Use these IDs with gremlin.checks and gremlin.checks.skip:
| ID | What it checks |
|---|---|
| `swarm-syntax` | Forbidden fields: version, container_name, hostname, restart, depends_on, dnsrr |
| `identity` | PUID/PGID 1964 or user: "1964:1964" |
| `network` | netgrimoire overlay network attached |
| `placement` | ARM exclusions, DockerVol/hostname rules, restart_policy |
| `caddy` | caddy: label, reverse_proxy format, import_1/import_2 |
| `homepage` | group, name, icon, href, description |
| `monitor` | monitor.name, monitor.url, optional type/interval |
| `legacy-labels` | Flags kuma.* labels for removal |
| `version` | gremlin.version stamp matches current config version |
| `diun` | diun.enable: "true" present |
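One plausible way to combine `gremlin.checks` and `gremlin.checks.skip` is sketched below. How Gremlin actually resolves the two labels is not documented here, so treat the precedence (skip wins over run) and the `selected_checkers` name as assumptions.

```python
ALL_CHECKERS = [
    "swarm-syntax", "identity", "network", "placement", "caddy",
    "homepage", "monitor", "legacy-labels", "version", "diun",
]


def selected_checkers(labels: dict) -> list[str]:
    """Return the checker IDs to run, in the order of the table above."""
    wanted = labels.get("gremlin.checks", "all").strip()
    if wanted == "all":
        requested = set(ALL_CHECKERS)
    else:
        requested = {c.strip() for c in wanted.split(",") if c.strip()}
    skip_raw = labels.get("gremlin.checks.skip", "")
    skipped = {c.strip() for c in skip_raw.split(",") if c.strip()}
    # Assumption: an ID listed in both labels is skipped.
    return [c for c in ALL_CHECKERS if c in requested and c not in skipped]
```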
Common Patterns
Internal sidecar (database, cache)
```yaml
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: secret
    volumes:
      - /DockerVol/myapp/postgres:/var/lib/postgresql/data
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == docker4
      labels:
        gremlin.uid.exempt: "true"
        gremlin.uid.reason: "Postgres requires UID 999"
        gremlin.caddy.skip: "true"
        gremlin.homepage.skip: "true"
        gremlin.monitor.skip: "true"
        diun.enable: "true"
```
Service without Authentik (remote browser, public endpoint)
```yaml
labels:
  caddy: firefox.netgrimoire.com
  caddy.reverse_proxy: firefox:5800
  caddy.import_1: crowdsec
  gremlin.authentik.skip: "true"
  # ... other labels
```
Service with no web UI and no public port
```yaml
labels:
  gremlin.caddy.skip: "true"
  gremlin.homepage.skip: "true"
  gremlin.monitor.skip: "true"
  diun.enable: "true"
```
Test stack (never deployed)
```yaml
labels:
  gremlin.deploy: "false"
  # ... other labels
```
ARM/Pi service
```yaml
labels:
  gremlin.arm.allow: "true"
  # ... other labels
placement:
  constraints:
    - node.hostname == dockerpi1
```
Service with no ports: mapping
```yaml
labels:
  gremlin.port: "8080"
  # tells Gremlin the internal port for caddy and monitor derivation
  # ... other labels
```
Ollama false positive suppression
```yaml
labels:
  gremlin.context: "shm_size is set to 1gb; required for this browser application"
  # ... other labels
```
Forbidden Fields
These fields are automatically removed by Gremlin:
| Field | Reason |
|---|---|
| `version:` (top-level) | Obsolete in Compose v3 |
| `container_name:` | Conflicts with Swarm service naming |
| `hostname:` (service-level) | Conflicts with Swarm DNS |
| `restart:` (service-level) | Use deploy.restart_policy instead |
| `depends_on:` | Not supported in Swarm mode |
These conditions cause an unfixable block; Gremlin cannot fix them automatically:
| Condition | Reason |
|---|---|
| `endpoint_mode: dnsrr` | Breaks internal DNS resolution; VIP mode required |
| Missing `deploy:` block | File treated as plain Compose, not Swarm |
| `/DockerVol/` without `node.hostname` | Gremlin cannot guess the target node |
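The difference between removable fields and an unfixable condition can be illustrated on a parsed compose file. This is a sketch under the assumption that the file is already loaded into nested dicts. `strip_forbidden` and `has_dnsrr` are hypothetical helpers, not Gremlin functions.

```python
import copy

# Service-level fields listed in the first table above.
SERVICE_FIELDS = ("container_name", "hostname", "restart", "depends_on")


def strip_forbidden(compose: dict) -> dict:
    """Return a copy of the compose mapping with removable fields dropped."""
    cleaned = copy.deepcopy(compose)  # leave the caller's data untouched
    cleaned.pop("version", None)      # top-level version field
    for service in cleaned.get("services", {}).values():
        for field in SERVICE_FIELDS:
            service.pop(field, None)
    return cleaned


def has_dnsrr(compose: dict) -> bool:
    """Detect endpoint_mode: dnsrr, which is reported rather than fixed."""
    return any(
        service.get("deploy", {}).get("endpoint_mode") == "dnsrr"
        for service in compose.get("services", {}).values()
    )
```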
Troubleshooting
"Missing deploy: block" — file skipped as non-Swarm
Your compose file has no deploy: section. Add a deploy: block to each service.
"uses /DockerVol/ but has no node.hostname constraint" — unfixable
Add a node.hostname constraint to deploy.placement.constraints. Gremlin cannot guess which node to pin it to.
PUID/PGID landing under volumes:
Your service has no environment: block. Gremlin now creates one before volumes: automatically. If it still happens, add an environment: block manually with at least one entry.
Ollama keeps blocking on legitimate config
Add gremlin.context explaining the situation. Ollama treats it as ground truth.
Auto-fix loop — same issues reappear after fix
Check label indentation — labels inside deploy.labels must be indented 8 spaces consistently.
Deploy skipped every time
Check gremlin.deploy in the stack labels and in gremlin/config.yaml. Global deploy: false overrides all stacks unless the stack explicitly sets gremlin.deploy: "true".
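The override rule can be read as "an explicit stack label wins, otherwise the global setting applies". A minimal sketch follows, with a hypothetical `should_deploy` function and the global value passed in as a plain boolean:

```python
def should_deploy(global_deploy: bool, labels: dict) -> bool:
    """Resolve the effective deploy setting for one stack."""
    stack_value = labels.get("gremlin.deploy")
    if stack_value is not None:
        return stack_value == "true"  # explicit stack label wins
    return global_deploy              # otherwise fall back to the global config
```

With a global `deploy: false`, a stack deploys only when it carries `gremlin.deploy: "true"`.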
Service shows up as "netgrimoire" in checker errors
The file has a blank line between services: and the first service name — this was a known bug fixed in pipeline v2026-04-30.