
| title | description | published | date | tags | editor | dateCreated |
|---|---|---|---|---|---|---|
| Gremlin CI/CD User Guide | | true | 2026-04-28T20:56:45.863Z | | markdown | 2026-04-28T20:56:45.863Z |

Gremlin CI/CD — Operator Guide

NetGrimoire Infrastructure Reference: how to write, structure, and manage Swarm stacks for the Gremlin CI/CD pipeline. For the pipeline architecture, see Gremlin CI/CD Pipeline.


How It Works

Push any .yml or .yaml file under swarm/ to traveler/services, and Gremlin takes over:

  1. Fetches the file and classifies it (Swarm, Pocket, or plain Compose).
  2. Runs the schema checkers (all of them, unless limited by gremlin.checks).
  3. If every issue found is fixable, auto-fixes the file and recommits it.
  4. If any issue is unfixable, sends an ntfy alert and stops.
  5. If all checks pass, runs the Ollama audit and, if the audit passes, deploys.
  6. After deploying, updates the Gatus monitoring config.

Gremlin sends ntfy notifications as the pipeline moves through these stages, but a clean push produces only one: Deploy Complete.
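The decision flow above can be sketched in Python. This is a minimal sketch, not Gremlin's code: the Issue and Outcome types, the run_pipeline function, and every notification text other than Deploy Complete are hypothetical, and the Ollama audit is reduced to a single pass/fail flag.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    checker: str   # checker ID, e.g. "caddy"
    message: str
    fixable: bool


@dataclass
class Outcome:
    action: str    # hypothetical values: "autofix", "blocked", "deployed"
    notifications: list = field(default_factory=list)


def run_pipeline(issues, audit_ok=True):
    """Decide what happens to one pushed file, given the checker results.

    `issues` is the combined output of the schema checkers (step 2).
    `audit_ok` stands in for the Ollama audit result (step 5).
    """
    if issues and all(i.fixable for i in issues):
        # Step 3: everything is fixable, so fix and recommit.
        return Outcome("autofix", ["Auto-fix applied"])
    if issues:
        # Step 4: at least one issue is unfixable; alert and stop.
        unfixable = "; ".join(i.message for i in issues if not i.fixable)
        return Outcome("blocked", ["Unfixable issues: " + unfixable])
    if not audit_ok:
        # The Ollama audit can also block a deploy.
        return Outcome("blocked", ["Ollama audit failed"])
    # Steps 5-6: deploy, then update Gatus. A clean push sends one message.
    return Outcome("deployed", ["Deploy Complete"])
```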


Required Stack Structure

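Every Swarm service must have these elements. If any are missing, Gremlin auto-fixes what it can (missing Homepage labels, for example) and blocks deployment on anything it cannot fix.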

```yaml
services:
  myservice:
    image: vendor/image:tag
    environment:
      PUID: "1964"
      PGID: "1964"
      TZ: America/Chicago
    volumes:
      - /DockerVol/myservice:/data              # pinned: requires node.hostname
      # or, instead:
      # - /data/nfs/znas/Docker/myservice:/data # floating: node.hostname must not be set
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == znas          # required when using /DockerVol/
      labels:
        caddy: myservice.netgrimoire.com
        caddy.reverse_proxy: myservice:8080
        caddy.import_1: crowdsec
        caddy.import_2: authentik

        monitor.name: MyService
        monitor.url: https://myservice.netgrimoire.com

        homepage.group: NetGrimoire
        homepage.name: MyService
        homepage.icon: myservice.png
        homepage.href: https://myservice.netgrimoire.com
        homepage.description: My service description

        diun.enable: "true"

networks:
  netgrimoire:
    external: true
```

Volume Path Rules

| Path type | Example | Placement constraint |
|---|---|---|
| /DockerVol/ | /DockerVol/myservice:/data | node.hostname required |
| /data/nfs/znas/ | /data/nfs/znas/Docker/myservice:/data | node.hostname forbidden |

Valid hostnames for node.hostname: docker3, docker4, docker5, znas, dockerpi1
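A rough Python sketch of the volume path rules, assuming the service has been parsed from YAML into a dict (for example with PyYAML's yaml.safe_load) and uses short-syntax volume strings. check_volume_placement is a hypothetical name. The two rules are applied independently, exactly as the table reads, so a service that mixes both path types cannot pass.

```python
VALID_HOSTNAMES = {"docker3", "docker4", "docker5", "znas", "dockerpi1"}


def check_volume_placement(service):
    """Return problems with a service's volume/placement combination (hypothetical checker)."""
    volumes = service.get("volumes", [])
    constraints = service.get("deploy", {}).get("placement", {}).get("constraints", [])
    # Extract X from any "node.hostname == X" constraint ("!=" is not matched).
    pinned = [c.split("==", 1)[1].strip() for c in constraints
              if c.replace(" ", "").startswith("node.hostname==")]
    # Assumption: short-syntax "host:container" strings only.
    uses_dockervol = any(str(v).startswith("/DockerVol/") for v in volumes)
    uses_nfs = any(str(v).startswith("/data/nfs/znas/") for v in volumes)

    problems = []
    if uses_dockervol and not pinned:
        problems.append("uses /DockerVol/ but has no node.hostname constraint")
    if uses_nfs and pinned:
        problems.append("uses /data/nfs/znas/ but is pinned with node.hostname")
    for host in pinned:
        if host not in VALID_HOSTNAMES:
            problems.append(f"unknown node.hostname: {host}")
    return problems
```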


Identity Rules

Method 1 — LinuxServer.io and homelab images (preferred):

```yaml
environment:
  PUID: "1964"
  PGID: "1964"
```

Method 2 — Official Docker Hub images:

```yaml
user: "1964:1964"
```

Exemption — Images that manage their own users (Authentik, MailCow):

```yaml
labels:
  gremlin.uid.exempt: "true"
  gremlin.uid.reason: "Authentik manages its own internal user context"
```
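The identity rules reduce to three accepted cases. A minimal sketch, assuming a service parsed into a dict; check_identity and its messages are hypothetical, and it accepts both the mapping form and the KEY=VALUE list form of environment.

```python
def _env_dict(env):
    """Normalize a Compose `environment` (mapping or KEY=VALUE list) to a dict of strings."""
    if isinstance(env, dict):
        return {k: str(v) for k, v in env.items()}
    out = {}
    for item in env or []:
        key, _, value = str(item).partition("=")
        out[key] = value
    return out


def check_identity(service):
    """Hypothetical identity checker: Method 1, Method 2, or a documented exemption."""
    labels = service.get("deploy", {}).get("labels", {})
    if str(labels.get("gremlin.uid.exempt", "false")).lower() == "true":
        # Exemption is only valid with a reason attached.
        if labels.get("gremlin.uid.reason"):
            return []
        return ["gremlin.uid.exempt set without gremlin.uid.reason"]
    env = _env_dict(service.get("environment"))
    if env.get("PUID") == "1964" and env.get("PGID") == "1964":
        return []  # Method 1: PUID/PGID
    if str(service.get("user", "")) == "1964:1964":
        return []  # Method 2: user
    return ['expected PUID/PGID 1964 or user: "1964:1964"']
```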

Caddy Label Rules

```yaml
caddy: myservice.netgrimoire.com      # hostname only: no https:// prefix
caddy.reverse_proxy: myservice:8080   # service name and port: no IP addresses
caddy.import_1: crowdsec              # mandatory
caddy.import_2: authentik             # mandatory
```

Services without a public URL (internal sidecars, databases):

```yaml
gremlin.caddy.skip: "true"
```
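A sketch of the Caddy label rules as a validator over a deploy.labels dict. check_caddy and its messages are hypothetical; it tests only the rules listed above: no scheme on caddy:, a service:port target without an IP address, and the two mandatory imports.

```python
import ipaddress

REQUIRED_IMPORTS = {"caddy.import_1": "crowdsec", "caddy.import_2": "authentik"}


def _is_ip(text):
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def check_caddy(labels):
    """Hypothetical Caddy checker over deploy.labels."""
    if str(labels.get("gremlin.caddy.skip", "false")).lower() == "true":
        return []
    problems = []
    host = str(labels.get("caddy", ""))
    if not host:
        problems.append("missing caddy label")
    elif "://" in host:
        problems.append("caddy: must be a hostname only, without https://")
    # Split "service:port" on the last colon.
    name, sep, port = str(labels.get("caddy.reverse_proxy", "")).rpartition(":")
    if not sep or not name or not port.isdigit():
        problems.append("caddy.reverse_proxy must be service:port")
    elif _is_ip(name):
        problems.append("caddy.reverse_proxy must use a service name, not an IP")
    for key, expected in REQUIRED_IMPORTS.items():
        if labels.get(key) != expected:
            problems.append(f"{key} must be {expected}")
    return problems
```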

Monitor Labels

Gremlin writes monitor endpoints to Gatus after each successful deploy.

```yaml
monitor.name: MyService               # display name in Gatus
monitor.url: https://myservice.netgrimoire.com
monitor.type: http                    # optional: http | tcp | ping | dns (default: http)
monitor.interval: "60"                # optional: seconds, minimum 20 (default: 60)
```

Services that should not be monitored:

```yaml
gremlin.monitor.skip: "true"
```

TCP example (for non-HTTP services):

```yaml
monitor.type: tcp
monitor.url: myservice:5432
```
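The monitor label rules, sketched as a validator over a deploy.labels dict. check_monitor is a hypothetical name; it checks only what this section documents (required name and URL, the four types, a minimum interval of 20 seconds) and does not build the Gatus entry itself.

```python
VALID_TYPES = ("http", "tcp", "ping", "dns")


def check_monitor(labels):
    """Hypothetical monitor checker over deploy.labels."""
    if str(labels.get("gremlin.monitor.skip", "false")).lower() == "true":
        return []
    problems = [f"missing {key}" for key in ("monitor.name", "monitor.url")
                if not labels.get(key)]
    if labels.get("monitor.type", "http") not in VALID_TYPES:
        problems.append("monitor.type must be one of: " + ", ".join(VALID_TYPES))
    # Interval is a string of whole seconds; default 60, minimum 20.
    interval = str(labels.get("monitor.interval", "60"))
    if not interval.isdigit() or int(interval) < 20:
        problems.append("monitor.interval must be whole seconds, minimum 20")
    return problems
```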

Homepage Labels

```yaml
homepage.group: Media                 # dashboard group
homepage.name: MyService              # display name
homepage.icon: myservice.png          # icon filename
homepage.href: https://myservice.netgrimoire.com
homepage.description: Brief description
```

Services that should not appear on Homepage:

```yaml
gremlin.homepage.skip: "true"
```

Auto-fix note: If homepage labels are missing, Gremlin derives them from the caddy: label and the service name. The group defaults to "New" and the icon to the service name plus .png (for example, myservice.png). Review and correct these values after an auto-fix.
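What the auto-fix derivation could look like, as a minimal sketch. derive_homepage_labels is hypothetical; using the service name verbatim for homepage.name and prefixing https:// to the caddy: host for homepage.href are assumptions, and homepage.description is left alone because this guide does not say how Gremlin fills it.

```python
def derive_homepage_labels(service_name, labels):
    """Return only the homepage.* labels that are missing (hypothetical auto-fix)."""
    if str(labels.get("gremlin.homepage.skip", "false")).lower() == "true":
        return {}
    derived = {
        "homepage.group": "New",                 # documented default
        "homepage.name": service_name,           # assumption: used verbatim
        "homepage.icon": f"{service_name}.png",  # documented default
    }
    host = labels.get("caddy")
    if host:
        derived["homepage.href"] = f"https://{host}"  # assumption: https scheme
    # homepage.description is not derived: its source is not documented here.
    return {k: v for k, v in derived.items() if k not in labels}
```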


Gremlin Directives

All directives go inside deploy.labels. All of them are opt-out: a stack with no gremlin.* labels gets the full pipeline (every check, auto-fix, and deploy).

Pipeline Control

```yaml
gremlin.enable: "true"
# Set false to have Gremlin ignore this file entirely on push.
# Default: true

gremlin.checks: "all"
# Comma-separated checker IDs to run, or "all".
# Example: "swarm-syntax,identity,caddy"
# Default: all

gremlin.checks.skip: ""
# Comma-separated checker IDs to skip.
# Example: "homepage,monitor"
# Default: (none)
```
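How gremlin.enable, gremlin.checks, and gremlin.checks.skip combine, sketched in Python. select_checkers is hypothetical, and two behaviors are assumptions: the skip list is applied after the include list, and unknown IDs are silently ignored. The IDs are the ones in the Checker IDs table.

```python
ALL_CHECKERS = ["swarm-syntax", "identity", "network", "placement", "caddy",
                "homepage", "monitor", "legacy-labels", "diun"]


def _split(value):
    """Split a comma-separated label value, dropping blanks."""
    return [part.strip() for part in str(value).split(",") if part.strip()]


def select_checkers(labels):
    """Resolve the pipeline-control labels into an ordered list of checker IDs."""
    if str(labels.get("gremlin.enable", "true")).lower() == "false":
        return []  # file ignored entirely
    requested = str(labels.get("gremlin.checks", "all")).strip()
    if requested == "all":
        chosen = list(ALL_CHECKERS)
    else:
        wanted = set(_split(requested))
        chosen = [c for c in ALL_CHECKERS if c in wanted]  # unknown IDs ignored
    skipped = set(_split(labels.get("gremlin.checks.skip", "")))
    return [c for c in chosen if c not in skipped]
```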

Auto-fix Control

```yaml
gremlin.autofix: "true"
# Set false to disable all auto-fixing.
# Default: true

gremlin.autofix.skip: "false"
# Set true to notify but never attempt to fix.
# Default: false

gremlin.autofix.skip_fields: ""
# Comma-separated fields to skip during fix.
# Example: "hostname,uid"
# Default: (none)
```
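A sketch of how the three auto-fix labels could gate a set of proposed fixes. fixes_to_apply and the {field: value} shape of a proposed fix are hypothetical; field names such as hostname and uid follow the example above, and the full set of field names Gremlin uses is not documented here.

```python
def _truthy(value):
    return str(value).strip().lower() == "true"


def fixes_to_apply(labels, proposed):
    """Filter proposed fixes ({field: new_value}) through the auto-fix labels."""
    if not _truthy(labels.get("gremlin.autofix", "true")):
        return {}  # auto-fixing disabled
    if _truthy(labels.get("gremlin.autofix.skip", "false")):
        return {}  # notify only, never fix
    skip = {f.strip() for f in str(labels.get("gremlin.autofix.skip_fields", "")).split(",")
            if f.strip()}
    return {name: value for name, value in proposed.items() if name not in skip}
```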

Deploy Control

```yaml
gremlin.deploy: "true"
# Set false to run checks and fixes but never deploy.
# Use for test stacks or stacks managed manually.
# Default: true

gremlin.deploy.strategy: "stack"
# Deployment method. Values: stack | helm | kubectl
# Default: stack
```

Identity Exemptions

```yaml
gremlin.uid.exempt: "false"
# Set true to skip PUID/PGID/user checks.
# Use for images that manage their own users.
# Default: false

gremlin.uid.reason: ""
# Documents why uid.exempt is set.
# Required when uid.exempt is true.
```

Placement Control

```yaml
gremlin.arm.allow: "false"
# Set true to allow ARM/Pi deployment.
# Removes ARM exclusion constraints.
# Default: false
```
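A sketch of the ARM placement rule, assuming constraints are compared as exact strings in the form used in the Required Stack Structure example. placement_constraints is hypothetical, and whether Gremlin adds missing exclusions itself or only reports them is not stated in this guide.

```python
ARM_EXCLUSIONS = ["node.platform.arch != aarch64", "node.platform.arch != arm"]


def placement_constraints(service):
    """Return the constraints a service should end up with, honoring gremlin.arm.allow."""
    deploy = service.get("deploy", {})
    labels = deploy.get("labels", {})
    current = list(deploy.get("placement", {}).get("constraints", []))
    if str(labels.get("gremlin.arm.allow", "false")).lower() == "true":
        # ARM allowed: drop the exclusions so the service can land on a Pi.
        return [c for c in current if c not in ARM_EXCLUSIONS]
    # Assumption: missing exclusions are appended (Gremlin may only report them).
    return current + [c for c in ARM_EXCLUSIONS if c not in current]
```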

Service-level Skip Labels

```yaml
gremlin.caddy.skip: "false"      # skip Caddy label validation
gremlin.homepage.skip: "false"   # skip Homepage label validation
gremlin.monitor.skip: "false"    # skip monitor label validation
gremlin.diun.skip: "false"       # skip Diun label validation
gremlin.network.skip: "false"    # skip network validation (whole stack)
```

Ollama Context

```yaml
gremlin.context: ""
# Free text passed to the Ollama audit as ground truth.
# Ollama will not flag anything the context explains.
# Example: "OIDC_CLIENT_SECRET in plain text is intentional; no secrets manager in use"
```

Notification Control

```yaml
gremlin.notify: "true"           # false = suppress all ntfy for this stack
gremlin.notify.level: "all"      # all | failures | none
```
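How the two notification labels could combine, as a minimal sketch. should_notify is hypothetical, and treating gremlin.notify: "false" as overriding any level is an assumption.

```python
def should_notify(labels, is_failure):
    """Decide whether an ntfy message goes out for this stack (hypothetical)."""
    if str(labels.get("gremlin.notify", "true")).lower() == "false":
        return False  # all ntfy suppressed
    level = str(labels.get("gremlin.notify.level", "all")).lower()
    if level == "none":
        return False
    if level == "failures":
        return is_failure  # only failures get through
    return True  # "all"
```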

Checker IDs

Use these IDs with gremlin.checks and gremlin.checks.skip:

| ID | What it checks |
|---|---|
| swarm-syntax | Forbidden fields: version, container_name, hostname, restart, depends_on, endpoint_mode: dnsrr |
| identity | PUID/PGID 1964 or user: "1964:1964" |
| network | netgrimoire overlay network |
| placement | ARM exclusions, /DockerVol/ and node.hostname rules, restart_policy |
| caddy | caddy: label, reverse_proxy, import_1/import_2 |
| homepage | group, name, icon, href, description |
| monitor | monitor.name, monitor.url, optional type/interval |
| legacy-labels | Flags kuma.* labels for removal |
| diun | diun.enable: "true" |

Common Patterns

Internal sidecar (database, cache)

```yaml
  postgres:
    image: postgres:15
    user: "1964:1964"          # Method 2: official Docker Hub image
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: secret
    volumes:
      - /DockerVol/myapp/postgres:/var/lib/postgresql/data
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == docker4
      labels:
        gremlin.caddy.skip: "true"
        gremlin.homepage.skip: "true"
        gremlin.monitor.skip: "true"
        diun.enable: "true"
```

Test stack (never deployed)

```yaml
      labels:
        gremlin.deploy: "false"
        # ... other labels
```

ARM/Pi service

```yaml
      labels:
        gremlin.arm.allow: "true"
        # ... other labels
      placement:
        constraints:
          - node.hostname == dockerpi1
```

Image requiring root

```yaml
      labels:
        gremlin.uid.exempt: "true"
        gremlin.uid.reason: "Image requires root; does not support PUID/PGID"
        # ... other labels
```

Forbidden Fields

These fields are automatically removed by Gremlin:

| Field | Reason |
|---|---|
| version: (top-level) | Obsolete under the Compose Specification |
| container_name: | Conflicts with Swarm service naming |
| hostname: (service-level) | Conflicts with Swarm DNS |
| restart: (service-level) | Use deploy.restart_policy instead |
| depends_on: | Not supported in Swarm mode |
| links: | Not supported in Swarm mode |

These fields cause an unfixable block:

| Field or condition | Reason |
|---|---|
| endpoint_mode: dnsrr | Breaks internal DNS resolution |
| Missing deploy: block | File treated as plain Compose, not Swarm |
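The forbidden-field rules, sketched in Python over a compose file parsed into a dict. strip_forbidden is hypothetical; it removes the auto-fixable fields and reports the two blocking conditions. It removes only the service-level hostname key, so node.hostname placement constraints are left untouched.

```python
import copy

REMOVABLE_SERVICE_FIELDS = ("container_name", "hostname", "restart", "depends_on", "links")


def strip_forbidden(stack):
    """Return (fixed_stack, blockers); the input dict is not modified."""
    fixed = copy.deepcopy(stack)
    fixed.pop("version", None)  # top-level only
    blockers = []
    for name, service in fixed.get("services", {}).items():
        for key in REMOVABLE_SERVICE_FIELDS:
            service.pop(key, None)  # service-level keys only
        if "deploy" not in service:
            blockers.append(f"{name}: missing deploy: block")
        elif service["deploy"].get("endpoint_mode") == "dnsrr":
            blockers.append(f"{name}: endpoint_mode: dnsrr")
    return fixed, blockers
```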

Troubleshooting

"Missing deploy: block" (file skipped as non-Swarm)

Your compose file has no deploy: section. Add a deploy: block to each service so Gremlin treats the file as a Swarm stack.

"uses /DockerVol/ but has no node.hostname constraint" (unfixable)

Add a node.hostname constraint under deploy.placement.constraints. Gremlin cannot guess which node to pin the service to.

Ollama keeps blocking on legitimate config

Add gremlin.context to explain the situation. Ollama treats the context as ground truth and will not flag anything it explains.

Auto-fix loop (fixes are applied, but the same issues keep appearing)

The fixer inserts the labels, but the checker does not recognize them afterwards. Check the label indentation: entries under deploy.labels must be indented 8 spaces.

Deploy skipped every time

Check gremlin.deploy. If it is set to "false", the pipeline validates and fixes the file but never deploys it.