title: Gremlin CI/CD User Guide
description:
published: true
date: 2026-04-30T18:33:09.881Z
tags:
editor: markdown
dateCreated: 2026-04-28T20:56:45.863Z

Gremlin CI/CD — Operator Guide

NetGrimoire Infrastructure Reference

How to write, structure, and manage Swarm stacks for the Gremlin CI/CD pipeline. For pipeline architecture, see Gremlin CI/CD Pipeline.


How It Works

Push any .yml or .yaml file under swarm/ to traveler/services and Gremlin takes over:

  1. Fetches the file and classifies it (Swarm, Pocket, or plain Compose)
  2. Runs all schema checkers
  3. If issues found and all are fixable — auto-fixes and recommits
  4. If issues found and unfixable — sends ntfy alert, stops
  5. If all checks pass — runs Ollama audit, then deploys
  6. After deploy — updates Gatus monitoring config

You get ntfy notifications at every stage. A clean push produces one notification: Deploy Complete.


Required Stack Structure

Every Swarm service must have these elements. A missing element fails its checker: Gremlin auto-fixes what it can, and anything unfixable blocks deployment.

services:
  myservice:
    image: vendor/image:tag
    environment:
      PUID: "1964"
      PGID: "1964"
      TZ: America/Chicago
    volumes:
      - /DockerVol/myservice:/data        # pinned — requires node.hostname
      # or
      - /data/nfs/znas/Docker/myservice:/data  # floating — no hostname needed
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == znas          # required when using /DockerVol/
      labels:
        caddy: myservice.netgrimoire.com
        caddy.reverse_proxy: myservice:8080
        caddy.import_1: crowdsec
        caddy.import_2: authentik

        monitor.name: MyService
        monitor.url: http://myservice:8080    # internal URL preferred

        homepage.group: NetGrimoire
        homepage.name: MyService
        homepage.icon: myservice.png
        homepage.href: https://myservice.netgrimoire.com
        homepage.description: My service description

        diun.enable: "true"

networks:
  netgrimoire:
    external: true

Volume Path Rules

| Path type | Example | Placement constraint |
| --- | --- | --- |
| /DockerVol/ | /DockerVol/myservice:/data | node.hostname required |
| /data/nfs/znas/ | /data/nfs/znas/Docker/myservice:/data | node.hostname not required |

Valid hostnames for node.hostname: docker3, docker4, docker5, znas, dockerpi1
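
For contrast with the pinned example under Required Stack Structure, a floating service might look like the fragment below. It is a sketch only: the service name and image are placeholders, the other required elements (identity, network, labels, restart_policy) are omitted for brevity, and it assumes the NFS path is reachable from every eligible node, which this guide implies but does not state.

```yaml
services:
  myservice:
    image: vendor/image:tag
    volumes:
      # Floating volume: lives on NFS, so no node.hostname pin is needed
      - /data/nfs/znas/Docker/myservice:/data
    deploy:
      placement:
        constraints:
          # The ARM exclusions still apply; only the hostname constraint is dropped
          - node.platform.arch != aarch64
          - node.platform.arch != arm
```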


Identity Rules

Method 1 — LinuxServer.io and homelab images (preferred):

environment:
  PUID: "1964"
  PGID: "1964"

Method 2 — Official Docker Hub images:

user: "1964:1964"

Exemption — Images that manage their own users (Authentik, MailCow, Postgres, Redis):

labels:
  gremlin.uid.exempt: "true"
  gremlin.uid.reason: "Postgres manages its own user — requires UID 999"

When gremlin.uid.exempt is set, the Prepare Volumes step still creates the service's volume paths (mkdir) but does not chown them. The image manages its own ownership.


Caddy Label Rules

caddy: myservice.netgrimoire.com      # hostname only — no https:// prefix
caddy.reverse_proxy: myservice:8080   # service name and port — no IP addresses
caddy.import_1: crowdsec              # always required
caddy.import_2: authentik             # required unless gremlin.authentik.skip is set

Services without a public URL (internal sidecars, databases):

gremlin.caddy.skip: "true"

Services that should bypass Authentik but still go through CrowdSec:

gremlin.authentik.skip: "true"

Monitor Labels

Gremlin writes monitor endpoints to Gatus after each successful deploy. Monitor URLs should use the internal service name and port so Gatus checks the container directly without depending on Caddy or Authentik being up.

monitor.name: MyService               # display name in Gatus
monitor.url: http://myservice:8080    # internal URL preferred
monitor.type: http                    # optional: http | tcp | ping | dns (default: http)
monitor.interval: "60"                # optional: seconds, minimum 20 (default: 60)

For non-HTTP services (mail, databases):

monitor.type: tcp
monitor.url: tcp://myservice:5432

Services that should not be monitored:

gremlin.monitor.skip: "true"

The Gatus check condition depends on the URL scheme or on monitor.type:

  • http:// or https:// → [STATUS] == 200
  • tcp:// or type: tcp → [CONNECTED] == true
  • type: ping → [CONNECTED] == true
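
To make the mapping concrete, here is roughly what the resulting Gatus entries could look like. Gatus declares endpoints under an endpoints: list with name, url, interval, and conditions keys; what Gremlin actually writes is not shown in this guide, so treat the fragment as an illustration of the label-to-condition mapping rather than real pipeline output. The Postgres entry name is a placeholder.

```yaml
endpoints:
  # HTTP service: monitor.url uses http://, so the condition checks the status code
  - name: MyService                 # from monitor.name
    url: http://myservice:8080      # from monitor.url
    interval: 60s                   # from monitor.interval ("60" seconds)
    conditions:
      - "[STATUS] == 200"

  # TCP service: monitor.type is tcp, so the condition checks connectivity
  - name: MyService Postgres        # placeholder name
    url: tcp://myservice:5432
    interval: 60s
    conditions:
      - "[CONNECTED] == true"
```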

Homepage Labels

homepage.group: Media                 # dashboard group
homepage.name: MyService              # display name
homepage.icon: myservice.png          # icon filename
homepage.href: https://myservice.netgrimoire.com
homepage.description: Brief description

Services that should not appear on Homepage:

gremlin.homepage.skip: "true"

Auto-fix note: If homepage labels are missing, Gremlin derives them from the caddy: label and service name. Group defaults to "New", icon defaults to "servicename.png". Review and correct after auto-fix.
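
As a rough illustration of that auto-fix, a service named myservice whose only routing label is caddy: myservice.netgrimoire.com might end up with labels like the ones below. Only the group and icon defaults are stated in this guide; the name and href values are guesses and are marked as such, and homepage.description is left out because the guide does not say what it defaults to.

```yaml
deploy:
  labels:
    caddy: myservice.netgrimoire.com
    # Added by auto-fix (sketch)
    homepage.group: New                               # documented default
    homepage.icon: myservice.png                      # documented default: <servicename>.png
    homepage.name: myservice                          # assumption: taken from the service name
    homepage.href: https://myservice.netgrimoire.com  # assumption: built from the caddy: hostname
```

Whatever the pipeline actually writes, the group in particular is worth correcting by hand so the service does not stay under "New".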


Gremlin Directives Reference

All directives go inside deploy.labels. All are opt-out — a stack with no gremlin.* labels gets full treatment.

Pipeline Control

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.enable | true | Set false to have Gremlin ignore this file entirely on push |
| gremlin.checks | all | Comma-separated checker IDs to run, or all |
| gremlin.checks.skip | (none) | Comma-separated checker IDs to skip |
| gremlin.version | (auto) | Stamped automatically; do not set manually |
| gremlin.context | (none) | Free text passed to Ollama as ground truth; Ollama will not flag anything this explains |

Auto-fix Control

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.autofix | true | Set false to disable all auto-fixing |
| gremlin.autofix.skip | false | Set true to notify but never attempt to fix |
| gremlin.autofix.skip_fields | (none) | Comma-separated fields to skip during fix (e.g. uid,hostname) |
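
For example, a stack that wants auto-fix in general but not for certain fields could carry the label below. The field names uid and hostname come from the table's own example; this guide does not list the other accepted field names, so none are guessed here.

```yaml
deploy:
  labels:
    # Auto-fix stays enabled, but these fields are left for a human to correct
    gremlin.autofix.skip_fields: "uid,hostname"
```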

Deploy Control

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.deploy | true | Set false to run checks and fixes but never deploy |
| gremlin.deploy.strategy | stack | Deployment method; currently only stack is implemented |
| gremlin.port | (none) | Internal container port when no ports: mapping exists; used to derive caddy.reverse_proxy and monitor.url |

Identity

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.uid.exempt | false | Skip PUID/PGID/user checks and skip chown on volumes for this service |
| gremlin.uid.reason | (none) | Documents why uid.exempt is set; include with every exemption |

Placement

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.arm.allow | false | Allow ARM/Pi deployment; removes the ARM exclusion constraint requirement |

Caddy

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.caddy.skip | false | Skip all Caddy label checks for this service |
| gremlin.authentik.skip | false | Skip the caddy.import_2: authentik requirement only; CrowdSec is still required |

Homepage

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.homepage.skip | false | Skip Homepage label checks for this service |

Monitor

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.monitor.skip | false | Skip monitor label checks for this service |

Network

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.network.skip | false | Skip netgrimoire network checks for this service |

Diun

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.diun.skip | false | Skip diun.enable check for this service |

Notifications

| Directive | Default | Description |
| --- | --- | --- |
| gremlin.notify | true | Set false to suppress all ntfy notifications for this stack |
| gremlin.notify.level | all | One of: all, failures, none |
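
A stack that should stay quiet unless something goes wrong could set the level as sketched below. Exactly which pipeline stages count as failures is not defined in this guide, so check the behavior on a test push before relying on it.

```yaml
deploy:
  labels:
    # Accepted values per the table above: all, failures, none
    gremlin.notify.level: "failures"
```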

Checker IDs

Use these IDs with gremlin.checks and gremlin.checks.skip:

| ID | What it checks |
| --- | --- |
| swarm-syntax | Forbidden fields: version, container_name, hostname, restart, depends_on, dnsrr |
| identity | PUID/PGID 1964 or user: "1964:1964" |
| network | netgrimoire overlay network attached |
| placement | ARM exclusions, DockerVol/hostname rules, restart_policy |
| caddy | caddy: label, reverse_proxy format, import_1/import_2 |
| homepage | group, name, icon, href, description |
| monitor | monitor.name, monitor.url, optional type/interval |
| legacy-labels | Flags kuma.* labels for removal |
| version | gremlin.version stamp matches current config version |
| diun | diun.enable: "true" present |
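
The two directives can be used as sketched below. The particular selection of IDs is arbitrary and only meant to show the comma-separated format; how Gremlin resolves a stack that sets both labels at once is not covered in this guide, so the second form is shown commented out.

```yaml
deploy:
  labels:
    # Form A: run every checker except the listed ones
    gremlin.checks.skip: "homepage,legacy-labels"

    # Form B (alternative to A): run only the listed checkers
    # gremlin.checks: "swarm-syntax,identity,network,placement"
```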

Common Patterns

Internal sidecar (database, cache)

  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: secret
    volumes:
      - /DockerVol/myapp/postgres:/var/lib/postgresql/data
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == docker4
      labels:
        gremlin.uid.exempt: "true"
        gremlin.uid.reason: "Postgres requires UID 999"
        gremlin.caddy.skip: "true"
        gremlin.homepage.skip: "true"
        gremlin.monitor.skip: "true"
        diun.enable: "true"

Service without Authentik (remote browser, public endpoint)

      labels:
        caddy: firefox.netgrimoire.com
        caddy.reverse_proxy: firefox:5800
        caddy.import_1: crowdsec
        gremlin.authentik.skip: "true"
        # ... other labels

Service with no web UI and no public port

      labels:
        gremlin.caddy.skip: "true"
        gremlin.homepage.skip: "true"
        gremlin.monitor.skip: "true"
        diun.enable: "true"

Test stack (never deployed)

      labels:
        gremlin.deploy: "false"
        # ... other labels

ARM/Pi service

      labels:
        gremlin.arm.allow: "true"
        # ... other labels
      placement:
        constraints:
          - node.hostname == dockerpi1

Service with no ports: mapping

      labels:
        gremlin.port: "8080"
        # tells Gremlin the internal port for caddy and monitor derivation
        # ... other labels
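
Assuming a service named myservice, the labels derived from gremlin.port would presumably follow the formats shown under Caddy Label Rules and Monitor Labels. The fragment below is an inference from those sections, not captured pipeline output.

```yaml
deploy:
  labels:
    gremlin.port: "8080"
    # Assumed derivations, using the <service>:<port> formats documented above
    caddy.reverse_proxy: myservice:8080
    monitor.url: http://myservice:8080
```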

Ollama false positive suppression

      labels:
        gremlin.context: "shm_size is set to 1gb — required for this browser application"
        # ... other labels

Forbidden Fields

These fields are automatically removed by Gremlin:

| Field | Reason |
| --- | --- |
| version: (top-level) | Obsolete in Compose v3 |
| container_name: | Conflicts with Swarm service naming |
| hostname: (service-level) | Conflicts with Swarm DNS |
| restart: (service-level) | Use deploy.restart_policy instead |
| depends_on: | Not supported in Swarm mode |

These fields cause an unfixable block — Gremlin cannot fix them automatically:

| Field | Reason |
| --- | --- |
| endpoint_mode: dnsrr | Breaks internal DNS resolution; VIP mode required |
| Missing deploy: block | File treated as plain Compose, not Swarm |
| /DockerVol/ without node.hostname | Gremlin cannot guess the target node |
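
The annotated fragment below puts both groups side by side in a made-up stack. The service, image, and version number are placeholders; the comments restate the two lists above and do not describe any additional behavior.

```yaml
version: "3.8"                  # auto-removed: top-level version
services:
  myservice:
    image: vendor/image:tag
    container_name: myservice   # auto-removed: conflicts with Swarm service naming
    hostname: myservice         # auto-removed: conflicts with Swarm DNS
    restart: unless-stopped     # auto-removed: use deploy.restart_policy instead
    depends_on:                 # auto-removed: not supported in Swarm mode
      - postgres
    volumes:
      - /DockerVol/myservice:/data
    deploy:
      endpoint_mode: dnsrr      # unfixable: change to vip (the Swarm default) or delete
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          # unfixable: /DockerVol/ is used above but no node.hostname constraint is set
```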

Troubleshooting

"Missing deploy: block" (file skipped as non-Swarm)
Your compose file has no deploy: section. Add a deploy: block to each service.

"uses /DockerVol/ but has no node.hostname constraint" (unfixable)
Add a node.hostname constraint to deploy.placement.constraints. Gremlin cannot guess which node to pin it to.

PUID/PGID landing under volumes:
Your service has no environment: block. Gremlin now creates one before volumes: automatically. If it still happens, add an environment: block manually with at least one entry.

Ollama keeps blocking on legitimate config
Add gremlin.context explaining the situation. Ollama treats it as ground truth.

Auto-fix loop (same issues reappear after fix)
Check label indentation. Labels inside deploy.labels must be indented 8 spaces consistently.

Deploy skipped every time
Check gremlin.deploy in the stack labels and in gremlin/config.yaml. Global deploy: false overrides all stacks unless the stack explicitly sets gremlin.deploy: "true".

Service shows up as "netgrimoire" in checker errors
The file has a blank line between services: and the first service name. This was a known bug, fixed in pipeline v2026-04-30.