---
title: Gremlin CI/CD User Guide
description:
published: true
date: 2026-04-28T20:56:45.863Z
tags:
editor: markdown
dateCreated: 2026-04-28T20:56:45.863Z
---

# Gremlin CI/CD: Operator Guide

> **NetGrimoire Infrastructure Reference**
> How to write, structure, and manage Swarm stacks for the Gremlin CI/CD pipeline.
> For pipeline architecture, see [Gremlin CI/CD Pipeline](gremlin-cicd-wiki.md).

---

## How It Works

Push any `.yml` or `.yaml` file under `swarm/` to `traveler/services` and Gremlin takes over:

1. Fetches the file and classifies it (Swarm, Pocket, or plain Compose).
2. Runs the schema checkers.
3. If issues are found and all of them are fixable, auto-fixes them and recommits.
4. If any issue is unfixable, sends an ntfy alert and stops.
5. If all checks pass, runs the Ollama audit, then deploys.
6. After the deploy, updates the Gatus monitoring config.

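The branching in steps 3 to 5 can be read as a small decision function. This is a minimal sketch, not Gremlin's code: the `Issue` type and `next_action` function are hypothetical, and each checker is assumed to report whether its finding is fixable.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """One checker finding (hypothetical shape; Gremlin's real type is not documented)."""
    checker: str
    message: str
    fixable: bool


def next_action(issues: list[Issue]) -> str:
    """Map checker results to the pipeline step that follows (steps 3-5)."""
    if not issues:
        return "audit-then-deploy"       # step 5: Ollama audit, then deploy
    if all(issue.fixable for issue in issues):
        return "autofix-and-recommit"    # step 3: fix and push a new commit
    return "alert-and-stop"              # step 4: any unfixable issue blocks the stack
```

Under this reading, a single unfixable finding stops the run even when every other finding could have been fixed.
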
Gremlin reports through ntfy as the pipeline runs. A clean push that needs no fixes produces a single notification: ✅ Deploy Complete.

---

## Required Stack Structure

Every Swarm service must have these elements. Missing any of them will block deployment.

```yaml
services:
  myservice:
    image: vendor/image:tag
    environment:
      PUID: "1964"
      PGID: "1964"
      TZ: America/Chicago
    volumes:
      - /DockerVol/myservice:/data                # pinned: requires node.hostname
      # or, instead of the line above:
      # - /data/nfs/znas/Docker/myservice:/data   # floating: no hostname needed
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == znas                 # required when using /DockerVol/
      labels:
        caddy: myservice.netgrimoire.com
        caddy.reverse_proxy: myservice:8080
        caddy.import_1: crowdsec
        caddy.import_2: authentik

        monitor.name: MyService
        monitor.url: https://myservice.netgrimoire.com

        homepage.group: NetGrimoire
        homepage.name: MyService
        homepage.icon: myservice.png
        homepage.href: https://myservice.netgrimoire.com
        homepage.description: My service description

        diun.enable: "true"

networks:
  netgrimoire:
    external: true
```

---

## Volume Path Rules

| Path type | Example | Placement constraint |
|---|---|---|
| `/DockerVol/` | `/DockerVol/myservice:/data` | `node.hostname` **required** |
| `/data/nfs/znas/` | `/data/nfs/znas/Docker/myservice:/data` | `node.hostname` **forbidden** |

Valid hostnames for `node.hostname`: `docker3`, `docker4`, `docker5`, `znas`, `dockerpi1`.

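The table reduces to two checks on each service's bind-mount sources plus a hostname allow-list. A minimal sketch, assuming the service is already parsed into a dict and volumes use the short `source:target` string form; `pinned_hostnames` and `check_volume_placement` are hypothetical names, not part of Gremlin's documented interface.

```python
VALID_HOSTNAMES = {"docker3", "docker4", "docker5", "znas", "dockerpi1"}


def pinned_hostnames(service: dict) -> list:
    """Return hostnames from `node.hostname == X` placement constraints."""
    constraints = service.get("deploy", {}).get("placement", {}).get("constraints", [])
    return [c.split("==", 1)[1].strip() for c in constraints
            if c.replace(" ", "").startswith("node.hostname==")]


def check_volume_placement(service: dict) -> list:
    """Apply the /DockerVol/ vs /data/nfs/znas/ placement rules to one service."""
    # Only short-syntax "source:target" strings are inspected in this sketch.
    sources = [v.split(":", 1)[0] for v in service.get("volumes", []) if isinstance(v, str)]
    hostnames = pinned_hostnames(service)
    problems = []
    if any(s.startswith("/DockerVol/") for s in sources) and not hostnames:
        problems.append("uses /DockerVol/ but has no node.hostname constraint")
    if any(s.startswith("/data/nfs/znas/") for s in sources) and hostnames:
        problems.append("uses /data/nfs/znas/ but pins node.hostname")
    problems += [f"unknown node.hostname: {h}" for h in hostnames if h not in VALID_HOSTNAMES]
    return problems
```
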
---

## Identity Rules

**Method 1 (preferred):** LinuxServer.io and homelab images

```yaml
environment:
  PUID: "1964"
  PGID: "1964"
```

**Method 2:** Official Docker Hub images

```yaml
user: "1964:1964"
```

**Exemption:** Images that manage their own users (Authentik, MailCow)

```yaml
labels:
  gremlin.uid.exempt: "true"
  gremlin.uid.reason: "Authentik manages its own internal user context"
```

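The three options above can be expressed as one check. This is a sketch under stated assumptions, not Gremlin's `identity` checker: `check_identity` is a hypothetical name, the exemption labels are read from `deploy.labels` in mapping form, and values are compared as strings so that both quoted (`"1964"`) and unquoted (`1964`) YAML values pass.

```python
def _env_as_dict(env) -> dict:
    """Normalize `environment:` from list form ("PUID=1964") or mapping form."""
    if isinstance(env, list):
        return dict(item.split("=", 1) for item in env if "=" in item)
    return env or {}


def check_identity(service: dict) -> list:
    """Accept Method 1, Method 2, or a documented exemption."""
    labels = service.get("deploy", {}).get("labels", {})
    if str(labels.get("gremlin.uid.exempt", "false")).lower() == "true":
        if not labels.get("gremlin.uid.reason"):
            return ["gremlin.uid.exempt is set without gremlin.uid.reason"]
        return []
    env = _env_as_dict(service.get("environment"))
    if str(env.get("PUID")) == "1964" and str(env.get("PGID")) == "1964":
        return []   # Method 1
    if str(service.get("user", "")) == "1964:1964":
        return []   # Method 2
    return ['set PUID/PGID to "1964", or user: "1964:1964"']
```
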
---

## Caddy Label Rules

```yaml
caddy: myservice.netgrimoire.com        # hostname only, no https:// prefix
caddy.reverse_proxy: myservice:8080     # service name and port, no IP addresses
caddy.import_1: crowdsec                # mandatory
caddy.import_2: authentik               # mandatory
```

Services without a public URL (internal sidecars, databases):

```yaml
gremlin.caddy.skip: "true"
```

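The rules in the comments above translate into a few string checks. A minimal sketch, assuming the labels have been parsed into a dict; `check_caddy` is hypothetical, and this version expects exactly `crowdsec` in `import_1` and `authentik` in `import_2`, which may be stricter than the real checker.

```python
import ipaddress

# Mandatory imports as listed above (this sketch expects this exact slot/value pairing).
REQUIRED_IMPORTS = {"caddy.import_1": "crowdsec", "caddy.import_2": "authentik"}


def check_caddy(labels: dict) -> list:
    """Validate the Caddy labels on one service's deploy.labels mapping."""
    if str(labels.get("gremlin.caddy.skip", "false")).lower() == "true":
        return []
    problems = []
    host = labels.get("caddy", "")
    if not host:
        problems.append("missing caddy: label")
    elif "://" in host:
        problems.append("caddy: must be a bare hostname, not a URL")
    upstream = labels.get("caddy.reverse_proxy", "")
    if not upstream:
        problems.append("missing caddy.reverse_proxy")
    else:
        target = upstream.rsplit(":", 1)[0]   # drop the port
        try:
            ipaddress.ip_address(target)      # succeeds only for an IP literal
            problems.append("caddy.reverse_proxy must target a service name, not an IP")
        except ValueError:
            pass                              # not an IP, so treat it as a service name
    for key, value in REQUIRED_IMPORTS.items():
        if labels.get(key) != value:
            problems.append(f"{key} must be {value}")
    return problems
```
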
---

## Monitor Labels

Gremlin writes monitor endpoints to Gatus after each successful deploy.

```yaml
monitor.name: MyService                          # display name in Gatus
monitor.url: https://myservice.netgrimoire.com
monitor.type: http                               # optional: http | tcp | ping | dns (default: http)
monitor.interval: "60"                           # optional: seconds, minimum 20 (default: 60)
```

Services that should not be monitored:

```yaml
gremlin.monitor.skip: "true"
```

TCP example (for non-HTTP services):

```yaml
monitor.type: tcp
monitor.url: myservice:5432
```

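The required labels, allowed types, and interval floor can be validated before anything is written to Gatus. A minimal sketch with a hypothetical `resolve_monitor` function: it only validates and fills in the documented defaults, and deliberately does not build the Gatus endpoint entry, because the exact mapping Gremlin writes is not described in this guide.

```python
from typing import Optional

ALLOWED_TYPES = {"http", "tcp", "ping", "dns"}


def resolve_monitor(labels: dict) -> Optional[dict]:
    """Validate monitor.* labels and fill the documented defaults; None means skipped."""
    if str(labels.get("gremlin.monitor.skip", "false")).lower() == "true":
        return None
    for key in ("monitor.name", "monitor.url"):
        if not labels.get(key):
            raise ValueError(f"missing {key}")
    kind = str(labels.get("monitor.type", "http")).lower()
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported monitor.type: {kind}")
    interval = int(labels.get("monitor.interval", "60"))  # label values arrive as strings
    if interval < 20:
        raise ValueError("monitor.interval must be at least 20 seconds")
    return {"name": labels["monitor.name"], "url": labels["monitor.url"],
            "type": kind, "interval": interval}
```
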
---

## Homepage Labels

```yaml
homepage.group: Media                    # dashboard group
homepage.name: MyService                 # display name
homepage.icon: myservice.png             # icon filename
homepage.href: https://myservice.netgrimoire.com
homepage.description: Brief description
```

Services that should not appear on Homepage:

```yaml
gremlin.homepage.skip: "true"
```

> **Auto-fix note:** If homepage labels are missing, Gremlin derives them from the `caddy:` label and the service name. The group defaults to `New` and the icon to `<servicename>.png`. Review and correct these values after an auto-fix.

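The auto-fix note describes a derivation from the `caddy:` label and the service name. The sketch below is a guess at that behavior, not Gremlin's fixer: the `New` group and `<servicename>.png` icon come from the note, while using the service name verbatim for `homepage.name` and building `homepage.href` as `https://` plus the `caddy:` hostname are assumptions. `homepage.description` is left out because the note does not say how it is derived.

```python
def derive_homepage_labels(service_name: str, labels: dict) -> dict:
    """Return only the homepage.* labels that are missing, derived per the auto-fix note."""
    host = labels.get("caddy", "")
    candidates = {
        "homepage.group": "New",                  # documented default
        "homepage.name": service_name,            # assumption: service name used as-is
        "homepage.icon": f"{service_name}.png",   # documented default
        "homepage.href": f"https://{host}" if host else "",  # assumption: built from caddy:
    }
    # Never overwrite labels the author already set; drop values that could not be derived.
    return {k: v for k, v in candidates.items() if v and k not in labels}
```
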
---

## Gremlin Directives

All directives go inside `deploy.labels`. All are opt-out: a stack with no `gremlin.*` labels gets the full treatment.

### Pipeline Control

```yaml
gremlin.enable: "true"
# Set false to have Gremlin ignore this file entirely on push.
# Default: true

gremlin.checks: "all"
# Comma-separated checker IDs to run, or "all".
# Example: "swarm-syntax,identity,caddy"
# Default: all

gremlin.checks.skip: ""
# Comma-separated checker IDs to skip.
# Example: "homepage,monitor"
# Default: (none)
```

### Auto-fix Control

```yaml
gremlin.autofix: "true"
# Set false to disable all auto-fixing.
# Default: true

gremlin.autofix.skip: "false"
# Set true to notify but never attempt to fix.
# Default: false

gremlin.autofix.skip_fields: ""
# Comma-separated fields to skip during fix.
# Example: "hostname,uid"
# Default: (none)
```

### Deploy Control

```yaml
gremlin.deploy: "true"
# Set false to run checks and fixes but never deploy.
# Use for test stacks or stacks managed manually.
# Default: true

gremlin.deploy.strategy: "stack"
# Deployment method. Values: stack | helm | kubectl
# Default: stack
```

### Identity Exemptions

```yaml
gremlin.uid.exempt: "false"
# Set true to skip PUID/PGID/user checks.
# Use for images that manage their own users.
# Default: false

gremlin.uid.reason: ""
# Documents why uid.exempt is set.
# Required when uid.exempt is true.
```

### Placement Control

```yaml
gremlin.arm.allow: "false"
# Set true to allow ARM/Pi deployment.
# Removes the ARM exclusion constraints.
# Default: false
```

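The `placement` checker's ARM rule and this override can be sketched as follows, assuming a service already parsed into a dict. `missing_arm_exclusions` is a hypothetical name, and the sketch covers only the check side: with `gremlin.arm.allow` set, no exclusions are required. It does not model how Gremlin removes exclusions that are already present.

```python
ARM_EXCLUSIONS = ["node.platform.arch != aarch64", "node.platform.arch != arm"]


def _normalize(constraint: str) -> str:
    """Compare constraints without caring about spacing around operators."""
    return constraint.replace(" ", "")


def missing_arm_exclusions(service: dict) -> list:
    """List the ARM exclusion constraints a service still needs."""
    deploy = service.get("deploy", {})
    if str(deploy.get("labels", {}).get("gremlin.arm.allow", "false")).lower() == "true":
        return []   # ARM/Pi deployment explicitly allowed
    present = {_normalize(c) for c in deploy.get("placement", {}).get("constraints", [])}
    return [c for c in ARM_EXCLUSIONS if _normalize(c) not in present]
```
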
### Service-level Skip Labels

```yaml
gremlin.caddy.skip: "false"      # skip Caddy label validation
gremlin.homepage.skip: "false"   # skip Homepage label validation
gremlin.monitor.skip: "false"    # skip monitor label validation
gremlin.diun.skip: "false"       # skip Diun label validation
gremlin.network.skip: "false"    # skip network validation (whole stack)
```

### Ollama Context

```yaml
gremlin.context: ""
# Free text passed to the Ollama audit as ground truth.
# Ollama will not flag anything the context explains.
# Example: "OIDC_CLIENT_SECRET in plain text is intentional; no secrets manager in use"
```

### Notification Control

```yaml
gremlin.notify: "true"          # false = suppress all ntfy for this stack
gremlin.notify.level: "all"     # all | failures | none
```

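The two notification labels combine as a master switch plus a filter. A minimal sketch with a hypothetical `should_notify` function; which pipeline events count as failures is an assumption left to the caller.

```python
def should_notify(labels: dict, is_failure: bool) -> bool:
    """Decide whether one pipeline event should produce an ntfy message."""
    if str(labels.get("gremlin.notify", "true")).lower() == "false":
        return False                        # master switch: suppress everything
    level = str(labels.get("gremlin.notify.level", "all")).lower()
    if level == "none":
        return False
    if level == "failures":
        return is_failure                   # only events the caller classes as failures
    return True                             # "all": every event
```
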
---

## Checker IDs

Use these IDs with `gremlin.checks` and `gremlin.checks.skip`:

| ID | What it checks |
|---|---|
| `swarm-syntax` | Forbidden fields: `version`, `container_name`, `hostname`, `restart`, `depends_on`, `endpoint_mode: dnsrr` |
| `identity` | `PUID`/`PGID` 1964 or `user: "1964:1964"` |
| `network` | `netgrimoire` overlay network |
| `placement` | ARM exclusions, `/DockerVol/` hostname rules, `restart_policy` |
| `caddy` | `caddy:` label, `reverse_proxy`, `import_1`/`import_2` |
| `homepage` | group, name, icon, href, description |
| `monitor` | `monitor.name`, `monitor.url`, optional type/interval |
| `legacy-labels` | Flags `kuma.*` labels for removal |
| `diun` | `diun.enable: "true"` |

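How `gremlin.enable`, `gremlin.checks`, and `gremlin.checks.skip` might resolve into the set of checkers that actually run is sketched below. This is an assumption-laden illustration, not Gremlin's code: `select_checkers` is hypothetical, the skip list is assumed to apply after `gremlin.checks`, unknown IDs are silently ignored, and the result follows the table's order.

```python
ALL_CHECKERS = ["swarm-syntax", "identity", "network", "placement", "caddy",
                "homepage", "monitor", "legacy-labels", "diun"]


def _csv(value: str) -> list:
    """Split a comma-separated label value, ignoring blanks and stray spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def select_checkers(labels: dict) -> list:
    """Resolve gremlin.enable, gremlin.checks and gremlin.checks.skip into checker IDs."""
    if str(labels.get("gremlin.enable", "true")).lower() == "false":
        return []                                   # file ignored entirely
    wanted = str(labels.get("gremlin.checks", "all")).strip()
    chosen = ALL_CHECKERS if wanted == "all" else [c for c in ALL_CHECKERS if c in _csv(wanted)]
    skipped = set(_csv(str(labels.get("gremlin.checks.skip", ""))))
    return [c for c in chosen if c not in skipped]
```
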
---

## Common Patterns

### Internal sidecar (database, cache)

```yaml
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: secret
    volumes:
      - /DockerVol/myapp/postgres:/var/lib/postgresql/data
    networks:
      - netgrimoire
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.platform.arch != aarch64
          - node.platform.arch != arm
          - node.hostname == docker4
      labels:
        gremlin.caddy.skip: "true"
        gremlin.homepage.skip: "true"
        gremlin.monitor.skip: "true"
        diun.enable: "true"
```

### Test stack (never deployed)

```yaml
labels:
  gremlin.deploy: "false"
  # ... other labels
```

### ARM/Pi service

```yaml
labels:
  gremlin.arm.allow: "true"
  # ... other labels
placement:
  constraints:
    - node.hostname == dockerpi1
```

### Image requiring root

```yaml
labels:
  gremlin.uid.exempt: "true"
  gremlin.uid.reason: "Image requires root; does not support PUID/PGID"
  # ... other labels
```

---

## Forbidden Fields

These fields are automatically removed by Gremlin:

| Field | Reason |
|---|---|
| `version:` (top-level) | Obsolete in Compose v3 |
| `container_name:` | Conflicts with Swarm service naming |
| `hostname:` (service-level) | Conflicts with Swarm DNS |
| `restart:` (service-level) | Use `deploy.restart_policy` instead |
| `depends_on:` | Not supported in Swarm mode |
| `links:` | Not supported in Swarm mode |

These fields cause an **unfixable** block:

| Field | Reason |
|---|---|
| `endpoint_mode: dnsrr` | Breaks internal DNS resolution |
| Missing `deploy:` block | File treated as plain Compose, not Swarm |

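The two tables can be combined into one pass over the parsed stack: strip what is fixable, collect what is not. A minimal sketch with a hypothetical `strip_forbidden` function, assuming the compose file is already loaded into a dict; it works on a deep copy so the caller's original stack is left untouched.

```python
import copy

# Service-level fields the removal table says Gremlin strips automatically.
REMOVABLE_SERVICE_FIELDS = ("container_name", "hostname", "restart", "depends_on", "links")


def strip_forbidden(stack: dict) -> tuple:
    """Return (fixed copy of the stack, list of unfixable problems)."""
    fixed = copy.deepcopy(stack)          # never mutate the caller's parsed stack
    fixed.pop("version", None)            # top-level only
    unfixable = []
    for name, service in fixed.get("services", {}).items():
        for field in REMOVABLE_SERVICE_FIELDS:
            service.pop(field, None)
        deploy = service.get("deploy")
        if deploy is None:
            unfixable.append(f"{name}: missing deploy: block")
        elif deploy.get("endpoint_mode") == "dnsrr":
            unfixable.append(f"{name}: endpoint_mode: dnsrr")
    return fixed, unfixable
```
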
---

## Troubleshooting

**"Missing deploy: block" (file skipped as non-Swarm)**

Your compose file has no `deploy:` section. Add a `deploy:` block to each service for Swarm compatibility.

**"uses /DockerVol/ but has no node.hostname constraint" (unfixable)**

Add a `node.hostname` constraint to your `deploy.placement.constraints`. Gremlin cannot guess which node to pin the service to.

**Ollama keeps blocking on legitimate config**

Add `gremlin.context` to explain the situation. Ollama treats the context as ground truth and will not flag what it explains.

**Auto-fix loop: fixes are applied but the same issues keep appearing**

The fixer is inserting the labels, but the checker does not recognize them after insertion. Check the label indentation: labels inside `deploy.labels` must be indented 8 spaces (two spaces per level below `services:`).

**Deploy skipped every time**

Check `gremlin.deploy`. If it is set to `"false"`, the pipeline validates and fixes but never deploys.

---

## Related

- [Gremlin CI/CD Pipeline](gremlin-cicd-wiki.md)
- [NetGrimoire Stack Standards](stack-standards.md)
- [Gatus](gatus.md)