This commit is contained in:
traveler 2026-04-12 15:51:37 -05:00
parent b1a2672c76
commit e55070398b
65 changed files with 0 additions and 0 deletions


@ -0,0 +1,453 @@
---
title: Stashapp Workflow
description:
published: true
date: 2026-02-20T04:25:56.467Z
tags:
editor: markdown
dateCreated: 2026-02-18T13:08:53.604Z
---
# StashApp: Automated Library Management with Community Scrapers
> **Goal:** Automatically identify, tag, rename, and organize your media library with minimal manual intervention using StashDB, ThePornDB, and the CommunityScrapers repository.
---
## Table of Contents
1. [Prerequisites](#1-prerequisites)
2. [Installing CommunityScrapers](#2-installing-community-scrapers)
3. [Configuring Metadata Providers](#3-configuring-metadata-providers)
- [StashDB](#31-stashdb)
- [ThePornDB (TPDB)](#32-theporndbtpdb)
4. [Configuring Your Library](#4-configuring-your-library)
5. [Automated File Naming & Moving](#5-automated-file-naming--moving)
6. [The Core Workflow](#6-the-core-workflow)
7. [Handling ABMEA & Amateur Content](#7-handling-abmea--amateur-content)
8. [Automation with Scheduled Tasks](#8-automation-with-scheduled-tasks)
9. [Tips & Troubleshooting](#9-tips--troubleshooting)
---
## 1. Prerequisites
Before starting, make sure you have:
- **StashApp installed and running** — see the [official install docs](https://github.com/stashapp/stash/wiki/Installation)
- **Git installed** on your system (needed to clone the scrapers repo)
- **A ThePornDB account** — free tier available at [metadataapi.net](https://metadataapi.net)
- **A StashDB account** — requires a community invite; request one on [the Discord](https://discord.gg/2TsNFKt)
- Your Stash config directory noted — default locations:
| OS | Default Path |
|----|-------------|
| Windows | `%APPDATA%\stash` |
| macOS | `~/.stash` |
| Linux | `~/.stash` |
| Docker | `/root/.stash` |
---
## 2. Installing CommunityScrapers
The [CommunityScrapers](https://github.com/stashapp/CommunityScrapers) repository contains scrapers for hundreds of sites maintained by the Stash community. This is the primary source for site-specific scrapers including ABMEA.
### Step 1 — Navigate to your Stash config directory
```bash
cd ~/.stash
```
### Step 2 — Create a scrapers directory if it doesn't exist
```bash
mkdir -p scrapers
cd scrapers
```
### Step 3 — Clone the CommunityScrapers repository
```bash
git clone https://github.com/stashapp/CommunityScrapers.git
```
This creates `~/.stash/scrapers/CommunityScrapers/` containing all available scrapers.
### Step 4 — Verify Stash detects the scrapers
1. Open Stash in your browser (default: `http://localhost:9999`)
2. Go to **Settings → Metadata Providers → Scrapers**
3. Click **Reload Scrapers**
4. You should now see a long list of scrapers including entries for ABMEA, ManyVids, Clips4Sale, etc.
### Step 5 — Keep scrapers updated
Since community scrapers are actively maintained, set up a periodic update:
```bash
cd ~/.stash/scrapers/CommunityScrapers
git pull
```
> 💡 **Tip:** You can automate this with a cron job or scheduled task. See [Section 8](#8-automation-with-scheduled-tasks).
### Installing Python Dependencies (if prompted)
Some scrapers require Python packages. If you see scraper errors mentioning missing modules:
```bash
pip install requests cloudscraper py-cord lxml
```
---
## 3. Configuring Metadata Providers
Stash uses **metadata providers** to automatically match scenes by fingerprint (phash/oshash). This is what enables true automation — no filename matching required.
### 3.1 StashDB
StashDB is the official community-run fingerprint and metadata database. It is the most reliable source for mainstream and studio content.
1. Go to **Settings → Metadata Providers**
2. Under **Stash-Box Endpoints**, click **Add**
3. Fill in:
- **Name:** `StashDB`
- **Endpoint:** `https://stashdb.org/graphql`
- **API Key:** *(generate this from your StashDB account → API Keys)*
4. Click **Confirm**
### 3.2 ThePornDB (TPDB)
TPDB aggregates metadata from a large number of sites and is especially useful for amateur, clip site, and ABMEA content that may not be on StashDB.
1. Log in at [metadataapi.net](https://metadataapi.net) and go to your **API Settings** to get your key
2. In Stash, go to **Settings → Metadata Providers**
3. Under **Stash-Box Endpoints**, click **Add**
4. Fill in:
- **Name:** `ThePornDB`
- **Endpoint:** `https://theporndb.net/graphql`
- **API Key:** *(your TPDB API key)*
5. Click **Confirm**
### Provider Priority Order
Set your identify task to query providers in this order for best results:
1. **StashDB** — highest quality, community-verified
2. **ThePornDB** — broad coverage including amateur/clip sites
3. **CommunityScrapers** (site-specific) — for anything not matched above
---
## 4. Configuring Your Library
### Adding Library Paths
1. Go to **Settings → Library**
2. Under **Directories**, click **Add** and point to your media folders
3. You can add multiple directories (e.g., separate drives or folders)
> ⚠️ **Do not** set your organized output folder as a source directory. Keep source and destination separate until you are confident in your setup.
### Recommended Directory Structure
```
/media/
├── stash-incoming/ ← Source: where new files land
└── stash-library/ ← Destination: where Stash moves organized files
├── Studios/
│ └── ABMEA/
└── Amateur/
```
---
## 5. Automated File Naming & Moving
This is the section that does the heavy lifting. Stash will rename and move files **only when a scene is marked as Organized**, which gives you a review gate before anything is touched.
### Enable File Moving
1. Go to **Settings → Library**
2. Enable **"Move files to organized folder on organize"**
3. Set your **Organized folder path** (e.g., `/media/stash-library`)
### Configure the File Naming Template
Still in **Settings → Library**, set your **Filename template**. These use Go template syntax with Stash variables.
**Recommended template for mixed studio/amateur libraries:**
```
{studio}/{date} {title}
```
**For performer-centric amateur libraries:**
```
{performers}/{studio}/{date} {title}
```
**Full example with fallbacks:**
```
{{if .Studio}}{{.Studio.Name}}{{else}}Unknown{{end}}/{{if .Date}}{{.Date}}{{else}}0000-00-00{{end}} {{.Title}}
```
### Available Template Variables
| Variable | Example Output |
|----------|---------------|
| `{title}` | `Scene Title Here` |
| `{date}` | `2024-03-15` |
| `{studio}` | `ABMEA` |
| `{performers}` | `Jane Doe` |
| `{resolution}` | `1080p` |
| `{duration}` | `00-32-15` |
| `{rating}` | `5` |
> 💡 If a field is empty (e.g., no studio), Stash skips that path segment. Test with a few scenes before running on your whole library.
---
## 6. The Core Workflow
Follow these steps **in order** every time you add new content. This is the automated pipeline.
```
New Files → Scan → Generate Fingerprints → Identify → Review → Organize (Move + Rename)
```
### Step 1 — Scan
**Tasks → Scan**
- Discovers new files and adds them to the database
- Does not move or rename anything yet
- Options to enable: **Generate covers on scan**
### Step 2 — Generate Fingerprints
**Tasks → Generate**
Select these options:
| Option | Purpose |
|--------|---------|
| ✅ **Phashes** | Used for fingerprint matching against StashDB/TPDB |
| ✅ **Checksums (MD5/SHA256)** | Used for duplicate detection |
| ✅ **Previews** | Thumbnail previews in the UI |
| ✅ **Sprites** | Timeline scrubber images |
> ⏳ This step is CPU/GPU intensive. Let it complete before proceeding. On a large library, this may take hours.
### Step 3 — Identify (Auto-Scrape by Fingerprint)
**Tasks → Identify**
This is the magic step. Stash sends your file fingerprints to StashDB and TPDB and pulls back metadata automatically.
Configure the task:
1. Click **Add Source** and add **StashDB** first
2. Click **Add Source** again and add **ThePornDB**
3. Under **Options**, enable:
- ✅ Set cover image
- ✅ Set performers
- ✅ Set studio
- ✅ Set tags
- ✅ Set date
4. Click **Identify**
Stash will now automatically match and populate metadata for any scene it recognizes by fingerprint.
### Step 4 — Auto Tag (Filename-Based Fallback)
For scenes that didn't match by fingerprint (common with amateur content), use Auto Tag to extract metadata from filenames.
**Tasks → Auto Tag**
- Matches **Performers**, **Studios**, and **Tags** from filenames against your existing database entries
- Works best when filenames contain names (e.g., `JaneDoe_SceneTitle_1080p.mp4`)
### Step 5 — Review Unmatched Scenes
Filter to find scenes that still need attention:
1. Go to **Scenes**
2. Filter by: **Organized = false** and **Studio = none** (or **Performers = none**)
3. Use the **Tagger view** (icon in top right of Scenes) for rapid URL-based scraping
In Tagger view:
- Paste the original source URL into the scrape field
- Click **Scrape** — Stash fills in all metadata from that URL
- Review and click **Save**
### Step 6 — Organize (Move & Rename)
Once you're satisfied with a scene's metadata:
1. Open the scene
2. Click the **Organize** button (checkmark icon), OR
3. Use **bulk organize**: select multiple scenes → Edit → Mark as Organized
When a scene is marked Organized, Stash will:
- ✅ Rename the file according to your template
- ✅ Move it to your organized folder
- ✅ Update the database path
> ⚠️ **This action cannot be easily undone at scale.** Always verify metadata on a small batch first.
---
## 7. Handling ABMEA & Amateur Content
ABMEA and amateur clips often lack fingerprint matches. Use these additional strategies:
### ABMEA-Specific Scraper
The CommunityScrapers repo includes an ABMEA scraper. To use it manually:
1. Open a scene in Stash
2. Click **Edit → Scrape with → ABMEA**
3. If the scene URL is known, enter it; otherwise the scraper will search by title
### Batch URL Scraping Workflow for ABMEA
If you have many files sourced from ABMEA:
1. Before ingesting files, **rename them to include the ABMEA scene ID** in the filename if possible (e.g., `ABMEA-0123_title.mp4`)
2. After scanning, go to **Tagger View**
3. Filter to unmatched scenes and paste ABMEA URLs one by one
### Amateur Content Without a Source Site
For truly anonymous amateur clips:
1. Create a **Studio** entry called `Amateur` (or more specific names like `Amateur - Reddit`)
2. Create **Performer** entries for recurring people you can identify
3. Use **Auto Tag** to match these once entries exist
4. Use tags liberally to compensate for missing structured metadata: `amateur`, `homemade`, `POV`, etc.
### Tag Hierarchy Recommendation
Set up tag parents in **Settings → Tags** to create a browsable hierarchy:
```
Content Type
├── Amateur
├── Professional
└── Compilation
Source
├── ABMEA
├── Clip Site
└── Unknown
Quality
├── 4K
├── 1080p
└── SD
```
---
## 8. Automation with Scheduled Tasks
Minimize manual steps by scheduling recurring tasks.
### Setting Up Scheduled Tasks in Stash
Go to **Settings → Tasks → Scheduled Tasks** and create:
| Task | Schedule | Purpose |
|------|----------|---------|
| Scan | Every 6 hours | Pick up new files automatically |
| Generate (Phashes only) | Every 6 hours | Fingerprint new files |
| Identify | Daily at 2am | Match new fingerprinted files |
| Auto Tag | Daily at 3am | Filename-based fallback tagging |
| Clean | Weekly | Remove missing files from database |
### Auto-Update CommunityScrapers (Linux/macOS)
Add to your crontab (`crontab -e`):
```bash
# Update CommunityScrapers every Sunday at midnight
0 0 * * 0 cd ~/.stash/scrapers/CommunityScrapers && git pull
```
### Auto-Update CommunityScrapers (Windows)
Create a scheduled task in Task Scheduler running:
```powershell
cd C:\Users\YourUser\.stash\scrapers\CommunityScrapers; git pull
```
---
## 9. Tips & Troubleshooting
### Scraper not appearing in Stash
- Go to **Settings → Metadata Providers → Scrapers** and click **Reload Scrapers**
- Check that the `.yml` scraper file is in a subdirectory of your scrapers folder
- Check Stash logs (**Settings → Logs**) for scraper loading errors
### Identify finds no matches
- Confirm phashes were generated (check scene details — phash should be populated)
- Confirm your StashDB/TPDB API keys are correctly entered and not expired
- The file may simply not be in either database — proceed to manual URL scraping
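To rule out a key problem, you can query the stash-box endpoint directly. This is a sketch that assumes the stash-box `me` query and the `ApiKey` header Stash itself sends; a valid key should return your account name:

```bash
# hypothetical key check against StashDB's GraphQL endpoint (swap in your own key)
curl -s https://stashdb.org/graphql \
  -H "ApiKey: YOUR_STASHDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ me { name } }"}'
```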
### Files not moving after marking as Organized
- Confirm **"Move files to organized folder"** is enabled in Settings → Library
- Confirm the organized folder path is set and the folder exists
- Check that Stash has write permissions to both source and destination
### Duplicate files
Run **Tasks → Clean → Find Duplicates** before organizing to avoid moving duplicates into your library. Stash uses phash to find visual duplicates even if filenames differ.
### Metadata keeps getting overwritten
In **Settings → Scraping**, set the **Scrape behavior** to `If not set` instead of `Always` to prevent already-populated fields from being overwritten during re-scrapes.
### Useful Stash Plugins
Install via **Settings → Plugins → Browse Available Plugins**:
| Plugin | Purpose |
|--------|---------|
| **Performer Image Cleanup** | Remove duplicate performer images |
| **Tag Graph** | Visualize tag relationships |
| **Duplicate Finder** | Advanced duplicate management |
| **Stats** | Library analytics dashboard |
---
## Quick Reference Checklist
Use this checklist every time you add new content:
```
[ ] Drop files into stash-incoming directory
[ ] Tasks → Scan
[ ] Tasks → Generate → Phashes + Checksums
[ ] Tasks → Identify (StashDB → TPDB)
[ ] Tasks → Auto Tag
[ ] Review unmatched scenes in Tagger View
[ ] Manually scrape remaining unmatched scenes by URL
[ ] Spot-check metadata on a sample of scenes
[ ] Bulk select reviewed scenes → Mark as Organized
[ ] Verify a few files moved and renamed correctly
[ ] Done ✓
```
---
*Last updated: February 2026 | Stash version compatibility: 0.25+*
*Community resources: [Stash Discord](https://discord.gg/2TsNFKt) | [GitHub](https://github.com/stashapp/stash) | [Wiki](https://github.com/stashapp/stash/wiki)*


@ -0,0 +1,58 @@
---
title: Green Grimoire
description: Adult media stack — the satyr's private library
published: true
date: 2026-04-12T00:00:00.000Z
tags: green, adult, stash
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Green Grimoire
![green-badge](/images/green-badge.png)
The Green Grimoire is the self-hosted adult media stack. Separate host and domain from Netgrimoire. All services sit behind `*.wasted-bandwidth.net` and Authelia. Homepage tab: **Nucking-Futz**.
Data lives at `/data/nfs/Baxter/Green/` with two libraries: Clips and Movies.
---
## Services
| Service | URL | Port | Purpose | Host |
|---------|-----|------|---------|------|
| Stash (main) | `stash.wasted-bandwidth.net` | 9999 | Primary adult content library | znas / Compose |
| GreenFin (Jellyfinx) | Internal | 7096 | Green Door media server | docker5 / Compose |
| Namer | `namer.wasted-bandwidth.net` | 6980 | Scene file namer | znas / Compose |
| Whisparr | — | — | Adult content acquisition | znas / Swarm |
| NZBGet | — | — | Downloader | znas / Swarm |
| PocketStash | Internal | 9998 | Stash instance for Pocket Grimoire sync | znas / Compose |
---
## Data Structure
```
/data/nfs/Baxter/Green/
├── Clips/ ← Clips library
├── Movies/ ← Movies library
└── Pocket/ ← Synced to Pocket Grimoire pre-travel
```
---
## Pocket Integration
PocketStash (port 9998) is a separate Stash instance that maintains a curated subset for travel. Before a trip, `syncoid` pushes `vault/Green/Pocket` to the Pocket Grimoire laptop. The Pocket instance runs in read-only travel mode — no writes while traveling.
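A minimal sketch of the pre-travel push, assuming syncoid can reach the laptop over SSH; the laptop hostname and target dataset name below are placeholders:

```bash
# run on znas before a trip — push the curated Pocket dataset to the travel laptop
syncoid vault/Green/Pocket root@pocket-grimoire:vault/Green/Pocket
```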
See [Stash Integration](/Pocket-Grimoire/Software/Stash-Integration) in Pocket Grimoire docs.
---
## Sections
| | |
|---|---|
| [Stash Management](/Green-Grimoire/Library/Stash-Management) | Library config, scrapers, metadata workflow |
| [VHS Restoration](/Green-Grimoire/Scripts/VHS-Restoration) | Encoding, deinterlace, restoration scripts |


@ -0,0 +1,531 @@
---
title: Video Restoration Script
description: Restore VHS Video Captures
published: true
date: 2026-03-06T03:48:12.713Z
tags:
editor: markdown
dateCreated: 2026-03-06T03:48:05.841Z
---
# VHS Video Restoration — User Guide
A pipeline script for cleaning up and upscaling old VHS captures on Ubuntu 24.04.
Runs in two modes: a fast FFmpeg-only cleanup pass, and a full AI upscale using Real-ESRGAN.
---
## Requirements
- **Ubuntu 24.04**
- **FFmpeg** (`sudo apt install ffmpeg`)
- **bc** (`sudo apt install bc`)
- **Real-ESRGAN** (optional, for AI upscaling — see setup below)
---
## File Setup
Place everything in a working folder with this structure:
```
~/your-folder/
├── vhs_restore.sh
├── realesrgan-ncnn-vulkan ← AI upscaler binary (optional)
├── models/ ← Real-ESRGAN model files
├── input/ ← Put your source videos here
├── output/ ← Restored videos appear here
└── work/ ← Temporary scratch files (auto-created)
```
Supported input formats: `.mpg`, `.mpeg`, `.mp4`, `.avi`, `.mov`, `.mkv`, `.wmv`, `.m4v`, `.ts`
---
## First-Time Setup
```bash
# Make the script executable
chmod +x vhs_restore.sh
# Create the input folder and add your videos
mkdir input
cp /path/to/your/videos/*.mpg input/
```
### Installing Real-ESRGAN (one-time, for AI upscaling)
1. Download the latest Ubuntu release from:
https://github.com/xinntao/Real-ESRGAN/releases
→ look for `realesrgan-ncnn-vulkan-*-ubuntu.zip`
2. Unzip into your working folder
3. `chmod +x realesrgan-ncnn-vulkan`
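The same steps from the command line. The asset filename changes per release, so treat the URL below as a placeholder and copy the real one from the releases page:

```bash
cd ~/your-folder
# substitute the actual *-ubuntu.zip asset URL from the releases page
wget https://github.com/xinntao/Real-ESRGAN/releases/download/<release>/realesrgan-ncnn-vulkan-<build>-ubuntu.zip
unzip realesrgan-ncnn-vulkan-*-ubuntu.zip
chmod +x realesrgan-ncnn-vulkan
```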
---
## Running the Script
### Quick cleanup only (recommended first pass)
Fast — processes in a few minutes per file. No AI upscaling.
```bash
./vhs_restore.sh --no-ai
```
### Full pipeline with AI upscaling
Slow on CPU (plan for several hours per hour of footage). Produces the best results.
```bash
./vhs_restore.sh
```
### All options
| Flag | Description | Default |
|------|-------------|---------|
| `-i DIR` | Input directory | `./input` |
| `-o DIR` | Output directory | `./output` |
| `-w DIR` | Scratch/work directory | `./work` |
| `-b PATH` | Path to Real-ESRGAN binary | `./realesrgan-ncnn-vulkan` |
| `-s 2` or `-s 4` | Upscale factor | `2` |
| `-q 16` | Output quality (0-51, lower = better) | `16` |
| `--no-ai` | Skip AI upscaling, FFmpeg only | off |
| `--keep` | Keep extracted PNG frames after processing | off |
| `-h` | Show help | |
**Examples:**
```bash
# Process files from a custom folder
./vhs_restore.sh -i ~/Videos/VHS -o ~/Videos/Restored
# 4x upscale with slightly smaller output file
./vhs_restore.sh -s 4 -q 18
# FFmpeg cleanup only, custom folders
./vhs_restore.sh -i ~/Videos/VHS -o ~/Videos/Restored --no-ai
```
---
## What the Script Does
**Stage 1 — FFmpeg cleanup** (always runs):
- Deinterlaces the video (`yadif`) — removes the horizontal combing artifacts common in VHS captures
- Denoises (`hqdn3d=2:1:2:2`) — gentle noise reduction that avoids motion blocking
- Sharpens edges (`unsharp`) — recovers detail softened by the denoise step
- Colour corrects — boosts washed-out VHS colour, adjusts contrast and gamma, corrects the green/yellow cast common in aged tape
**Stage 2 — Frame extraction** (AI mode only):
- Extracts every frame as a PNG into a temporary folder
**Stage 3 — Real-ESRGAN upscaling** (AI mode only):
- Runs the `realesr-animevideov3` model on each frame
- Default: 2× upscale (e.g. 640×480 → 1280×960)
**Reassembly:**
- Rebuilds the video from upscaled frames with the original audio
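To preview just the Stage 1 cleanup on a single file without running the whole script, the same filter chain (taken verbatim from the script further down) can be run as a standalone command; the filenames here are placeholders:

```bash
# Stage 1 only — deinterlace, denoise, sharpen, colour-correct (filenames are placeholders)
ffmpeg -i input/tape01.mpg \
  -vf "yadif=mode=1,hqdn3d=2:1:2:2,unsharp=3:3:0.5:3:3:0.3,eq=contrast=1.2:brightness=0.05:saturation=1.8:gamma=1.1,colorbalance=rs=0.1:gs=0.0:bs=-0.1,format=yuv420p" \
  -c:v libx264 -crf 18 -preset medium -c:a aac -b:a 192k -ac 2 \
  output/tape01_cleaned.mp4
```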
---
## Live Progress
The script shows live FFmpeg output. Watch for:
- `speed=3.5x` — processing at 3.5× realtime (good)
- `speed=0.5x` — slow, likely a very heavy filter load
- `corrupt decoded frame` — normal for damaged VHS files, FFmpeg will push through
---
## Troubleshooting
**Script hangs with no output**
Run with `--no-ai` first to confirm FFmpeg is working, then check that your Real-ESRGAN binary is executable (`chmod +x realesrgan-ncnn-vulkan`).
**Output looks blocky during motion**
The denoise values may still be too high for your footage. Edit the script and reduce `hqdn3d=2:1:2:2` to `hqdn3d=1:1:1:1`, or remove `hqdn3d` entirely — Real-ESRGAN handles noise well on its own.
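If you'd rather not open an editor, the same change as a one-liner (the filter string matches the script exactly):

```bash
sed -i 's/hqdn3d=2:1:2:2/hqdn3d=1:1:1:1/' vhs_restore.sh
```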
**Colour looks over-saturated**
Reduce `saturation=1.8` in the filter chain to `saturation=1.4` or `1.2`.
**Real-ESRGAN not found**
Ensure the binary is in the same folder as the script and is executable. Or pass the path explicitly: `./vhs_restore.sh -b /path/to/realesrgan-ncnn-vulkan`
**Error logs**
All FFmpeg and Real-ESRGAN logs are saved to `/tmp/` for diagnosis:
- `/tmp/ffmpeg_stage1.log`
- `/tmp/ffmpeg_extract.log`
- `/tmp/realesrgan.log`
- `/tmp/ffmpeg_reassemble.log`
---
## Workflow Recommendation
1. Run `--no-ai` first on one file to check the cleanup result
2. If it looks good, run the full pipeline on all files overnight
3. For heavily damaged footage, consider also running **CodeFormer** (face restoration) on top of the output — particularly effective if the video contains people
---
## Output
Restored files are saved to `./output/` as `<original_name>_restored.mp4` encoded as H.264 with AAC audio.
## vhs_restore.sh Script
```bash
#!/usr/bin/env bash
# =============================================================================
# vhs_restore.sh — Automated VHS Video Restoration Pipeline
# Stages: Deinterlace → Denoise → Colour correct → AI Upscale → Reassemble
#
# Changes from v1:
# - Gentle hqdn3d (2:1:2:2) to prevent motion blocking/pixelation
# - Aggressive colour correction for washed-out VHS footage
# - Live FFmpeg progress shown in terminal (no silent hanging)
# - Logs still saved to /tmp/ for error diagnosis
# =============================================================================
set -euo pipefail
# ── Colour output helpers ────────────────────────────────────────────────────
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
CYAN='\033[0;36m'; BOLD='\033[1m'; NC='\033[0m'
info() { echo -e "${CYAN}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
header() { echo -e "\n${BOLD}${CYAN}══ $* ══${NC}"; }
# ── Default configuration ────────────────────────────────────────────────────
INPUT_DIR="./input" # Folder containing your source VHS videos
OUTPUT_DIR="./output" # Final restored videos land here
WORK_DIR="./work" # Scratch space (frames, temp files)
REALESRGAN_BIN="./realesrgan-ncnn-vulkan" # Path to Real-ESRGAN binary
REALESRGAN_MODEL="realesr-animevideov3" # Best model for home video
UPSCALE_FACTOR=2 # 2x or 4x (4x is very slow on CPU)
OUTPUT_WIDTH=1920 # Target width used in --no-ai mode
OUTPUT_HEIGHT=1080 # Target height used in --no-ai mode
CRF=16 # Output quality 0-51, lower = better
PRESET="slow" # FFmpeg encode preset
SKIP_UPSCALE=false # --no-ai flag sets this true
KEEP_FRAMES=false # --keep flag sets this true
# ── Parse CLI flags ──────────────────────────────────────────────────────────
usage() {
cat <<EOF
Usage: $(basename "$0") [options]
Options:
-i DIR Input directory (default: ./input)
-o DIR Output directory (default: ./output)
-w DIR Work/scratch dir (default: ./work)
-b PATH Path to realesrgan-ncnn-vulkan binary
-s FACTOR Upscale factor: 2 or 4 (default: 2)
-q CRF Output quality 0-51, lower=better (default: 16)
--no-ai Skip Real-ESRGAN; FFmpeg cleanup only (fast)
--keep Keep extracted frames after processing
-h Show this help
Examples:
$(basename "$0") -i ~/Videos/VHS -o ~/Videos/Restored
$(basename "$0") -i ~/Videos/VHS --no-ai # Quick cleanup only
$(basename "$0") -i ~/Videos/VHS -s 4 -q 18 # 4x upscale
EOF
exit 0
}
while [[ $# -gt 0 ]]; do
case "$1" in
-i) INPUT_DIR="$2"; shift 2 ;;
-o) OUTPUT_DIR="$2"; shift 2 ;;
-w) WORK_DIR="$2"; shift 2 ;;
-b) REALESRGAN_BIN="$2"; shift 2 ;;
-s) UPSCALE_FACTOR="$2"; shift 2 ;;
-q) CRF="$2"; shift 2 ;;
--no-ai) SKIP_UPSCALE=true; shift ;;
--keep) KEEP_FRAMES=true; shift ;;
-h|--help) usage ;;
*) error "Unknown option: $1"; usage ;;
esac
done
# ── Dependency checks ────────────────────────────────────────────────────────
header "Checking dependencies"
check_cmd() {
if command -v "$1" &>/dev/null; then
success "$1 found"
else
error "$1 not found. Install with: $2"
exit 1
fi
}
check_cmd ffmpeg "sudo apt install ffmpeg"
check_cmd ffprobe "sudo apt install ffmpeg"
check_cmd bc "sudo apt install bc"
if [[ "$SKIP_UPSCALE" == false ]]; then
if [[ ! -x "$REALESRGAN_BIN" ]]; then
warn "Real-ESRGAN binary not found at: $REALESRGAN_BIN"
echo
echo -e "${YELLOW}To install Real-ESRGAN:${NC}"
echo " 1. Download: https://github.com/xinntao/Real-ESRGAN/releases"
echo " -> realesrgan-ncnn-vulkan-*-ubuntu.zip"
echo " 2. Unzip into this directory"
echo " 3. chmod +x realesrgan-ncnn-vulkan"
echo " 4. Re-run this script"
echo
echo "Or run with --no-ai for FFmpeg-only cleanup (no upscaling)."
exit 1
fi
success "Real-ESRGAN found"
fi
# ── Locate input files ───────────────────────────────────────────────────────
header "Scanning input directory: $INPUT_DIR"
if [[ ! -d "$INPUT_DIR" ]]; then
error "Input directory not found: $INPUT_DIR"
exit 1
fi
mapfile -t VIDEO_FILES < <(find "$INPUT_DIR" -maxdepth 1 \
-type f \( -iname "*.mp4" -o -iname "*.avi" -o -iname "*.mov" \
-o -iname "*.mkv" -o -iname "*.mpg" -o -iname "*.mpeg" \
-o -iname "*.wmv" -o -iname "*.m4v" -o -iname "*.ts" \) \
| sort)
if [[ ${#VIDEO_FILES[@]} -eq 0 ]]; then
error "No video files found in $INPUT_DIR"
exit 1
fi
info "Found ${#VIDEO_FILES[@]} video file(s):"
for f in "${VIDEO_FILES[@]}"; do echo " * $(basename "$f")"; done
# ── Helpers ──────────────────────────────────────────────────────────────────
probe() {
ffprobe -v error -select_streams v:0 \
-show_entries "stream=$2" -of csv=p=0 "$1" 2>/dev/null | head -1
}
human_time() {
local s="${1%.*}"
printf '%dh %dm %ds' $((s/3600)) $(( (s%3600)/60 )) $((s%60))
}
# ── Create directories ───────────────────────────────────────────────────────
mkdir -p "$OUTPUT_DIR" "$WORK_DIR"
# ── Overall stats ────────────────────────────────────────────────────────────
TOTAL_FILES=${#VIDEO_FILES[@]}
PROCESSED=0
FAILED=0
PIPELINE_START=$(date +%s)
# ════════════════════════════════════════════════════════════════════════════
# MAIN LOOP
# ════════════════════════════════════════════════════════════════════════════
for INPUT_FILE in "${VIDEO_FILES[@]}"; do
BASENAME=$(basename "$INPUT_FILE")
STEM="${BASENAME%.*}"
CLEANED="$WORK_DIR/${STEM}_cleaned.mp4"
FRAMES_IN="$WORK_DIR/${STEM}_frames_in"
FRAMES_OUT="$WORK_DIR/${STEM}_frames_out"
FINAL_OUTPUT="$OUTPUT_DIR/${STEM}_restored.mp4"
header "Processing: $BASENAME ($((PROCESSED+1))/$TOTAL_FILES)"
FILE_START=$(date +%s)
# ── Probe source ──────────────────────────────────────────────────────────
FPS=$(probe "$INPUT_FILE" "r_frame_rate")
FPS_DEC=$(echo "scale=3; $FPS" | bc 2>/dev/null || echo "25")
WIDTH=$(probe "$INPUT_FILE" "width")
HEIGHT=$(probe "$INPUT_FILE" "height")
FIELD_ORDER=$(probe "$INPUT_FILE" "field_order")
DURATION=$(ffprobe -v error -show_entries format=duration \
-of csv=p=0 "$INPUT_FILE" 2>/dev/null | head -1)
info "Source: ${WIDTH}x${HEIGHT} ${FPS_DEC}fps $(human_time "${DURATION%.*}") field_order=${FIELD_ORDER:-unknown}"
# Always deinterlace for VHS -- safe even if not flagged as interlaced
if [[ "$FIELD_ORDER" =~ ^(tt|tb|bt|bb)$ ]]; then
DEINTERLACE_FILTER="yadif=mode=1,"
info "Interlacing detected — applying yadif deinterlacer"
else
DEINTERLACE_FILTER="yadif=mode=1,"
warn "Interlacing not confirmed by probe — applying yadif anyway (safe for VHS)"
fi
# ── Stage 1: FFmpeg cleanup ───────────────────────────────────────────────
header "Stage 1/3 — FFmpeg cleanup & colour correction"
info "Watch fps= and speed= for live progress."
info "Corrupt frame warnings are normal for old VHS captures."
echo
if [[ "$SKIP_UPSCALE" == true ]]; then
SCALE_FILTER="scale=${OUTPUT_WIDTH}:${OUTPUT_HEIGHT}:flags=lanczos,"
else
SCALE_FILTER=""
fi
# Filter chain notes:
# hqdn3d=2:1:2:2 -- gentle denoise; low temporal values (3rd/4th)
# prevent the motion blocking seen with higher values
# unsharp -- moderate sharpening to recover edge detail
# eq -- aggressive colour boost for washed-out VHS
# colorbalance -- corrects the green/yellow cast common in aged VHS
VFILTER="${DEINTERLACE_FILTER}\
hqdn3d=2:1:2:2,\
unsharp=3:3:0.5:3:3:0.3,\
eq=contrast=1.2:brightness=0.05:saturation=1.8:gamma=1.1,\
colorbalance=rs=0.1:gs=0.0:bs=-0.1,\
${SCALE_FILTER}\
format=yuv420p"
if ! ffmpeg -y -i "$INPUT_FILE" \
-vf "$VFILTER" \
-c:v libx264 -crf 18 -preset medium \
-c:a aac -b:a 192k -ac 2 \
-stats \
"$CLEANED" 2>&1 | tee /tmp/ffmpeg_stage1.log | \
grep --line-buffered -E "(frame=|speed=|error|Error|Invalid)"; then
error "FFmpeg stage 1 failed. Full log: /tmp/ffmpeg_stage1.log"
FAILED=$((FAILED+1))
continue
fi
echo
success "Stage 1 complete -> $(du -sh "$CLEANED" | cut -f1)"
if [[ "$SKIP_UPSCALE" == true ]]; then
cp "$CLEANED" "$FINAL_OUTPUT"
success "Output (no AI): $FINAL_OUTPUT"
PROCESSED=$((PROCESSED+1))
[[ "$KEEP_FRAMES" == false ]] && rm -f "$CLEANED"
continue
fi
# ── Stage 2: Extract frames ───────────────────────────────────────────────
header "Stage 2/3 — Extracting frames for AI upscaling"
mkdir -p "$FRAMES_IN" "$FRAMES_OUT"
FRAME_COUNT=$(ffprobe -v error -count_packets \
-select_streams v:0 -show_entries stream=nb_read_packets \
-of csv=p=0 "$CLEANED" 2>/dev/null | head -1)
FRAME_COUNT=${FRAME_COUNT:-0}
info "Extracting ~${FRAME_COUNT} frames..."
if ! ffmpeg -y -i "$CLEANED" \
-vsync 0 -stats \
"$FRAMES_IN/frame%08d.png" 2>&1 | tee /tmp/ffmpeg_extract.log | \
grep --line-buffered -E "(frame=|speed=|error|Error)"; then
error "Frame extraction failed. Full log: /tmp/ffmpeg_extract.log"
FAILED=$((FAILED+1))
continue
fi
ACTUAL_FRAMES=$(find "$FRAMES_IN" -name "*.png" | wc -l)
echo
success "Extracted $ACTUAL_FRAMES frames"
# ── Stage 3: Real-ESRGAN ──────────────────────────────────────────────────
header "Stage 3/3 — Real-ESRGAN AI upscaling (${UPSCALE_FACTOR}x)"
warn "Slow on CPU — est. $(echo "scale=0; $ACTUAL_FRAMES * 10 / 60" | bc)-$(echo "scale=0; $ACTUAL_FRAMES * 30 / 60" | bc) minutes"
info "Upscaled frames will appear in: $FRAMES_OUT"
echo
UPSCALE_START=$(date +%s)
if ! "$REALESRGAN_BIN" \
-i "$FRAMES_IN" \
-o "$FRAMES_OUT" \
-n "$REALESRGAN_MODEL" \
-s "$UPSCALE_FACTOR" \
-f png 2>&1 | tee /tmp/realesrgan.log; then
error "Real-ESRGAN failed. Full log: /tmp/realesrgan.log"
FAILED=$((FAILED+1))
continue
fi
UPSCALE_END=$(date +%s)
UPSCALE_ELAPSED=$((UPSCALE_END - UPSCALE_START))
success "AI upscaling complete in $(human_time $UPSCALE_ELAPSED)"
# ── Reassemble ────────────────────────────────────────────────────────────
REASSEMBLE_FPS=$(ffprobe -v error -select_streams v:0 \
-show_entries stream=r_frame_rate \
-of csv=p=0 "$CLEANED" 2>/dev/null | head -1)
info "Reassembling video from upscaled frames..."
echo
if ! ffmpeg -y \
-framerate "$REASSEMBLE_FPS" \
-i "$FRAMES_OUT/frame%08d.png" \
-i "$CLEANED" \
-map 0:v -map 1:a \
-c:v libx264 -crf "$CRF" -preset "$PRESET" \
-c:a copy \
-movflags +faststart \
-stats \
"$FINAL_OUTPUT" 2>&1 | tee /tmp/ffmpeg_reassemble.log | \
grep --line-buffered -E "(frame=|speed=|error|Error)"; then
error "Reassembly failed. Full log: /tmp/ffmpeg_reassemble.log"
FAILED=$((FAILED+1))
continue
fi
# ── Cleanup ───────────────────────────────────────────────────────────────
if [[ "$KEEP_FRAMES" == false ]]; then
rm -rf "$FRAMES_IN" "$FRAMES_OUT" "$CLEANED"
info "Scratch files cleaned up"
else
info "Frames kept in: $FRAMES_IN / $FRAMES_OUT"
fi
FILE_END=$(date +%s)
FILE_ELAPSED=$((FILE_END - FILE_START))
PROCESSED=$((PROCESSED+1))
OUT_SIZE=$(du -sh "$FINAL_OUTPUT" | cut -f1)
echo
success "Done: $FINAL_OUTPUT"
info " File size : $OUT_SIZE"
info " Time taken: $(human_time $FILE_ELAPSED)"
done
# ════════════════════════════════════════════════════════════════════════════
# Final summary
# ════════════════════════════════════════════════════════════════════════════
PIPELINE_END=$(date +%s)
PIPELINE_ELAPSED=$((PIPELINE_END - PIPELINE_START))
header "Pipeline Complete"
echo -e " ${GREEN}Processed : $PROCESSED / $TOTAL_FILES${NC}"
[[ $FAILED -gt 0 ]] && echo -e " ${RED}Failed : $FAILED${NC}"
echo -e " Total time: $(human_time $PIPELINE_ELAPSED)"
echo -e " Output dir: $OUTPUT_DIR"
echo
if [[ $PROCESSED -gt 0 ]]; then
echo "Restored files:"
find "$OUTPUT_DIR" -name "*_restored.mp4" | while read -r f; do
SIZE=$(du -sh "$f" | cut -f1)
echo " * $(basename "$f") ($SIZE)"
done
fi
```


@ -0,0 +1,72 @@
---
title: Gremlin Grimoire
description: Netgrimoire's local AI — the gremlin that runs the machine
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, ai, ollama, n8n
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Gremlin Grimoire
![gremlin-badge](/images/gremlin-badge.png)
Gremlin is the local AI layer of Netgrimoire. It's not just a chat interface — it's an autonomous agent that watches the infrastructure, audits the codebase, triages alerts, and answers questions about the lab. The gremlin lives inside the machine and knows every dark corner of it.
---
## What Gremlin Is
Gremlin is a stack of four services running together on `docker4`, all pinned to the same Swarm node:
| Service | Role | URL |
|---------|------|-----|
| **Ollama** | Local LLM inference (CPU-only, Ryzen) | `http://ollama:11434` · `ollama.netgrimoire.com:11434` |
| **Open WebUI** | Chat interface + RAG frontend | `https://ai.netgrimoire.com` |
| **Qdrant** | Vector database for RAG knowledge base | `http://qdrant:6333` · dashboard `:6333/dashboard` |
| **n8n** | Automation brain — autonomous workflows | `https://n8n.netgrimoire.com` |
---
## What Gremlin Does Today
| Capability | Status | Workflow |
|-----------|--------|---------|
| Weekly YAML audit of all compose files | ✅ Live | Forgejo Audit — Monday 06:00 |
| Uptime Kuma alert triage | ✅ Live | Kuma Triage — webhook-triggered |
| Interactive chat with lab context | ✅ Live | Open WebUI + Ollama |
| RAG over wiki/docs | 🔧 Wired, not populated | Qdrant connected, knowledge base empty |
| Doc generation from compose files | 🟡 Parked | CPU quality insufficient — awaiting GPU |
| Email triage | 📋 Planned | Phase 3 — not built |
---
## Models
| Model | Size | Used For |
|-------|------|---------|
| `qwen2.5-coder:7b` | ~5 GB | Code review, YAML audits, compose analysis |
| `llama3.2:3b` | ~2 GB | Alert triage, Q&A, summarization |
Models must be pulled before workflows run. See [Ollama Model Management](/Gremlin-Grimoire/Runbooks/Model-Management).
---
## Sections
| | |
|---|---|
| [Stack](/Gremlin-Grimoire/Stack/Build-Config) | Full build config, volumes, env vars, compose YAML |
| [Workflows](/Gremlin-Grimoire/Workflows/Forgejo-Audit) | All n8n workflows — architecture, patterns, gotchas |
| [Runbooks](/Gremlin-Grimoire/Runbooks/Deploy) | Deploy, model management, troubleshooting |
---
## Planned Evolution
- **Homelable MCP backend** — next up. Provides tool-use for infra Q&A (topology, running services, resource usage). Blocked until Homelable stack is deployed.
- **GPU support** — unlocks doc generation and larger models. Compose GPU block is commented out, ready to enable.
- **Gremlin role variants** — specialized personas per domain (Proxy Gremlin, Storage Gremlin, Security Gremlin, etc.) with mood states and dynamic badge serving via Caddy.
- **RAG knowledge base population** — index all Wiki.js pages and the compose template standard into Qdrant.
- **Gremlin Router** — dedicated Flask container for webhook routing (currently handled directly by n8n).


@ -0,0 +1,73 @@
---
title: Deploy Gremlin Stack
description: How to deploy and redeploy the Gremlin AI stack
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, deploy, runbook
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Deploy Gremlin Stack
All Gremlin services run on `docker4` (hermes), pinned via `node.hostname == docker4`.
---
## Prerequisites
```bash
# On docker4 — create volume directories
mkdir -p /DockerVol/ollama
mkdir -p /DockerVol/open-webui
mkdir -p /DockerVol/qdrant
# n8n requires specific ownership
mkdir -p /DockerVol/n8n
chown -R 1000:1000 /DockerVol/n8n
```
---
## Deploy
```bash
cd ~/services && git pull
cd swarm/stack/Gremlin
set -a && source .env && set +a
docker stack config --compose-file gremlin-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml gremlin
rm resolved.yml
docker stack services gremlin
```
---
## Pull Models After Deploy
Models must be pulled before n8n workflows run. If a workflow fires first, Ollama returns a model-not-found error that n8n may not surface, so the workflow fails silently.
```bash
docker exec $(docker ps -qf name=gremlin_ollama) ollama pull llama3.2:3b
docker exec $(docker ps -qf name=gremlin_ollama) ollama pull qwen2.5-coder:7b
# Verify
docker exec $(docker ps -qf name=gremlin_ollama) ollama list
```
---
## Verify Open WebUI Secret Key
Check that `WEBUI_SECRET_KEY` in `.env` on docker4 is set to a real secret, not the placeholder `change-this-secret-key`.
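A quick way to check; the `.env` path below assumes the repo layout used in the deploy steps above:

```bash
# on docker4
grep WEBUI_SECRET_KEY ~/services/swarm/stack/Gremlin/.env
# if it still shows the placeholder, generate a real secret and update .env before redeploying
openssl rand -base64 32
```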
---
## Service URLs After Deploy
| Service | Internal | External |
|---------|----------|---------|
| Ollama | `http://ollama:11434` | `http://ollama.netgrimoire.com:11434` |
| Open WebUI | `http://open-webui:8080` | `https://ai.netgrimoire.com` |
| Qdrant | `http://qdrant:6333` | `http://qdrant.netgrimoire.com:6333/dashboard` |
| n8n | `http://n8n:5678` | `https://n8n.netgrimoire.com` |


@ -0,0 +1,41 @@
---
title: Ollama Model Management
description: Pulling, verifying, and managing models on the Gremlin stack
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, ollama, models, runbook
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Ollama Model Management
## Pull Required Models
Run on docker4 after any fresh deploy or after the Ollama container is recreated:
```bash
docker exec $(docker ps -qf name=gremlin_ollama) ollama pull llama3.2:3b
docker exec $(docker ps -qf name=gremlin_ollama) ollama pull qwen2.5-coder:7b
```
## Verify Models Loaded
```bash
docker exec $(docker ps -qf name=gremlin_ollama) ollama list
```
## Model Reference
| Model | Size | Pull Time (CPU) | Used By |
|-------|------|----------------|---------|
| `llama3.2:3b` | ~2 GB | ~5 min | Kuma triage, Open WebUI |
| `qwen2.5-coder:7b` | ~5 GB | ~15 min | Forgejo audit, Open WebUI |
## Models Storage Path
`/DockerVol/ollama` — survives container restarts and redeployments.
## ⚠ Pull Before Workflows Run
n8n workflows fail silently if models aren't present. Ollama returns a model-not-found response but n8n may not surface this as an obvious error. Always pull models immediately after deploy before enabling workflows.
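An external check that doesn't require shelling into the container: the Ollama API lists installed models (URL from the Gremlin service table):

```bash
# should return a JSON "models" array containing llama3.2:3b and qwen2.5-coder:7b
curl -s http://ollama.netgrimoire.com:11434/api/tags
```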


@ -0,0 +1,64 @@
---
title: Gremlin Troubleshooting
description: Common Gremlin stack problems and fixes
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, troubleshooting, runbook
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Gremlin Troubleshooting
## n8n Won't Start / Permission Error
```bash
# On docker4
chown -R 1000:1000 /DockerVol/n8n
docker service update --force gremlin_n8n
```
## Workflow Fails Silently on Ollama Call
Model not pulled. Ollama returns model-not-found but n8n may not surface it clearly.
```bash
docker exec $(docker ps -qf name=gremlin_ollama) ollama list
# If model missing:
docker exec $(docker ps -qf name=gremlin_ollama) ollama pull llama3.2:3b
docker exec $(docker ps -qf name=gremlin_ollama) ollama pull qwen2.5-coder:7b
```
## Forgejo Webhook Not Reaching n8n
Add to Forgejo `app.ini`:
```ini
[webhook]
ALLOWED_HOST_LIST = *
```
Restart Forgejo. Required when `OFFLINE_MODE = true`.
## Caddy Routes to Wrong Container IP
Ensure all Gremlin services include in labels:
```yaml
caddy_ingress_network: netgrimoire
```
Never use `{{upstreams PORT}}` — breaks during `docker stack config` preprocessing. Use `caddy.reverse_proxy: servicename:PORT`.
## Audit Workflow Times Out
Check `N8N_RUNNERS_TASK_TIMEOUT` is set to `3600` in n8n environment. Default timeout is too short for 67-file audit runs.
## n8n Code Node Can't Access Env Vars
Set `N8N_BLOCK_ENV_ACCESS_IN_NODE=false` in n8n environment.
## Open WebUI Can't Connect to Qdrant
Verify both services are on the `netgrimoire` overlay and pinned to `docker4`. Qdrant gRPC port is 6334, REST is 6333.
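A quick reachability check against the REST port (external URL from the Gremlin service table); a healthy instance returns the collection list as JSON:

```bash
curl -s http://qdrant.netgrimoire.com:6333/collections
```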
## Audit Reports Not Committing to Forgejo
Check write token is set in n8n credentials. The read and write tokens are separate — confirm the workflow is using the write token for commit operations (POST new files, PUT+SHA for updates).


@ -0,0 +1,503 @@
---
title: Ollama with agent
description: The smart home reference
published: true
date: 2026-04-02T21:11:09.564Z
tags:
editor: markdown
dateCreated: 2026-02-18T22:14:41.533Z
---
# AI Automation Stack - Ollama + n8n + Open WebUI
## Overview
This stack provides a complete self-hosted AI automation solution for homelab infrastructure management, documentation generation, and intelligent monitoring. The system consists of four core components that work together to provide AI-powered workflows and knowledge management.
## Architecture
```
┌─────────────────────────────────────────────────┐
│ AI Automation Stack │
│ │
│ Open WebUI ────────┐ │
│ (Chat Interface) │ │
│ │ │ │
│ ▼ ▼ │
│ Ollama ◄──── Qdrant │
│ (LLM Runtime) (Vector DB) │
│ ▲ │
│ │ │
│ n8n │
│ (Workflow Engine) │
│ │ │
│ ▼ │
│ Forgejo │ Wiki.js │ Monitoring │
└─────────────────────────────────────────────────┘
```
## Components
### Ollama
- **Purpose**: Local LLM runtime engine
- **Port**: 11434
- **Resource Usage**: 4-6GB RAM (depending on model)
- **Recommended Models**:
- `qwen2.5-coder:7b` - Code analysis and documentation
- `llama3.2:3b` - General queries and chat
- `phi3:mini` - Lightweight alternative
### Open WebUI
- **Purpose**: User-friendly chat interface with built-in RAG (Retrieval Augmented Generation)
- **Port**: 3000
- **Features**:
- Document ingestion from Wiki.js
- Conversational interface for querying documentation
- RAG pipeline for context-aware responses
- Multi-model support
- **Access**: `http://your-server-ip:3000`
### Qdrant
- **Purpose**: Vector database for semantic search and RAG
- **Ports**: 6333 (HTTP), 6334 (gRPC)
- **Resource Usage**: ~1GB RAM
- **Function**: Stores embeddings of your documentation, code, and markdown files
### n8n
- **Purpose**: Workflow automation and orchestration
- **Port**: 5678
- **Default Credentials**:
- Username: `admin`
- Password: `change-this-password` (⚠️ **Change this immediately**)
- **Access**: `http://your-server-ip:5678`
## Installation
### Prerequisites
- Docker and Docker Compose installed
- 16GB RAM minimum (8GB available for the stack)
- 50GB disk space for models and data
### Deployment Steps
1. **Create directory structure**:
```bash
mkdir -p ~/ai-stack/n8n/workflows
cd ~/ai-stack
```
2. **Download the compose file**:
```bash
# Place the ai-stack-compose.yml in this directory
wget [your-internal-url]/ai-stack-compose.yml
```
3. **Configure environment variables**:
```bash
# Edit the compose file and change:
# - WEBUI_SECRET_KEY
# - N8N_BASIC_AUTH_PASSWORD
# - WEBHOOK_URL (use your server's IP)
# - GENERIC_TIMEZONE
nano ai-stack-compose.yml
```
4. **Start the stack**:
```bash
docker-compose -f ai-stack-compose.yml up -d
```
5. **Pull Ollama models**:
```bash
docker exec -it ollama ollama pull qwen2.5-coder:7b
docker exec -it ollama ollama pull llama3.2:3b
```
6. **Verify services**:
```bash
docker-compose -f ai-stack-compose.yml ps
```
## Configuration
### Open WebUI Setup
1. Navigate to `http://your-server-ip:3000`
2. Create your admin account (first user becomes admin)
3. Go to **Settings → Connections** and verify Ollama connection
4. Configure Qdrant:
- Host: `qdrant`
- Port: `6333`
### Setting Up RAG for Wiki.js
1. In Open WebUI, go to **Workspace → Knowledge**
2. Create a new collection: "Homelab Documentation"
3. Add sources:
- **URL Crawl**: Enter your Wiki.js base URL
- **File Upload**: Upload markdown files from repositories
4. Process and index the documents
### n8n Initial Configuration
1. Navigate to `http://your-server-ip:5678`
2. Log in with credentials from docker-compose file
3. Import starter workflows from `/n8n/workflows/` directory
## Use Cases
### 1. Automated Documentation Generation
**Workflow**: Forgejo webhook → n8n → Ollama → Wiki.js
When code is pushed to Forgejo:
1. n8n receives webhook from Forgejo
2. Extracts changed files and repo context
3. Sends to Ollama with prompt: "Generate documentation for this code"
4. Posts generated docs to Wiki.js via API
**Example n8n Workflow**:
```
Webhook Trigger
→ HTTP Request (Forgejo API - get file contents)
→ Ollama LLM Node (generate docs)
→ HTTP Request (Wiki.js API - create/update page)
→ Send notification (completion)
```
### 2. Docker-Compose Standardization
**Workflow**: Repository scan → compliance check → issue creation
1. n8n runs on schedule (daily/weekly)
2. Queries Forgejo API for all repositories
3. Scans for `docker-compose.yml` files
4. Compares against template standards stored in Qdrant
5. Generates compliance report with Ollama
6. Creates Forgejo issues for non-compliant repos
### 3. Intelligent Alert Processing
**Workflow**: Monitoring alert → AI analysis → smart routing
1. Beszel/Uptime Kuma sends webhook to n8n
2. n8n queries historical data and context
3. Ollama analyzes:
- Is this expected? (scheduled backup, known maintenance)
- Severity level
- Recommended action
4. Routes appropriately:
- Critical: Immediate notification (Telegram/email)
- Warning: Log and monitor
- Info: Suppress (expected behavior)
### 4. Email Monitoring & Triage
**Workflow**: IMAP polling → AI classification → action routing
1. n8n polls email inbox every 5 minutes
2. Filters for keywords: "alert", "critical", "down", "failed"
3. Ollama classifies urgency and determines if actionable
4. Routes based on classification:
- Urgent: Forward to you immediately
- Informational: Daily digest
- Spam: Archive
## Common Workflows
### Example: Repository Documentation Generator
```javascript
// n8n workflow nodes:
1. Schedule Trigger (daily at 2 AM)
2. HTTP Request - Forgejo API
URL: http://forgejo:3000/api/v1/repos/search
Method: GET
3. Loop Over Items (each repo)
4. HTTP Request - Get repo files
URL: {{$node["Forgejo API"].json["clone_url"]}}/contents
5. Filter - Find docker-compose.yml and README.md
6. Ollama Node
Model: qwen2.5-coder:7b
Prompt: "Analyze this docker-compose file and generate comprehensive
documentation including: purpose, services, ports, volumes,
environment variables, and setup instructions."
7. HTTP Request - Wiki.js API
URL: http://wikijs:3000/graphql
Method: POST
Body: {mutation: createPage(...)}
8. Send Notification
Service: Telegram/Email
Message: "Documentation updated for {{repo_name}}"
```
### Example: Alert Intelligence Workflow
```javascript
// n8n workflow nodes:
1. Webhook Trigger
Path: /webhook/monitoring-alert
2. Function Node - Parse Alert Data
JavaScript: Extract service, metric, value, timestamp
3. HTTP Request - Query Historical Data
URL: http://beszel:8090/api/metrics/history
4. Ollama Node
Model: llama3.2:3b
Context: Your knowledge base in Qdrant
Prompt: "Alert: {{alert_message}}
Historical context: {{historical_data}}
Is this expected behavior?
What's the severity?
What action should be taken?"
5. Switch Node - Route by Severity
Conditions:
- Critical: Route to immediate notification
- Warning: Route to monitoring channel
- Info: Route to log only
6a. Send Telegram (Critical path)
6b. Post to Slack (Warning path)
6c. Write to Log (Info path)
```
## Maintenance
### Model Management
```bash
# List installed models
docker exec -it ollama ollama list
# Update a model
docker exec -it ollama ollama pull qwen2.5-coder:7b
# Remove unused models
docker exec -it ollama ollama rm old-model:tag
```
### Backup Important Data
```bash
# Backup Qdrant vector database
docker-compose -f ai-stack-compose.yml stop qdrant
tar -czf qdrant-backup-$(date +%Y%m%d).tar.gz ./qdrant_data/
docker-compose -f ai-stack-compose.yml start qdrant
# Backup n8n workflows (automatic to ./n8n/workflows)
tar -czf n8n-backup-$(date +%Y%m%d).tar.gz ./n8n_data/
# Backup Open WebUI data
tar -czf openwebui-backup-$(date +%Y%m%d).tar.gz ./open_webui_data/
```
### Log Monitoring
```bash
# View all stack logs
docker-compose -f ai-stack-compose.yml logs -f
# View specific service
docker logs -f ollama
docker logs -f n8n
docker logs -f open-webui
```
### Resource Monitoring
```bash
# Check resource usage
docker stats
# Expected usage:
# - ollama: 4-6GB RAM (with model loaded)
# - open-webui: ~500MB RAM
# - qdrant: ~1GB RAM
# - n8n: ~200MB RAM
```
## Troubleshooting
### Ollama Not Responding
```bash
# Check if Ollama is running
docker logs ollama
# Restart Ollama
docker restart ollama
# Test Ollama API
curl http://localhost:11434/api/tags
```
### Open WebUI Can't Connect to Ollama
1. Check network connectivity:
```bash
docker exec -it open-webui ping ollama
```
2. Verify Ollama URL in Open WebUI settings
3. Restart both containers:
```bash
docker restart ollama open-webui
```
### n8n Workflows Failing
1. Check n8n logs:
```bash
docker logs n8n
```
2. Verify webhook URLs are accessible
3. Test Ollama connection from n8n:
- Create test workflow
- Add Ollama node
- Run execution
### Qdrant Connection Issues
```bash
# Check Qdrant health
curl http://localhost:6333/health
# View Qdrant logs
docker logs qdrant
# Restart if needed
docker restart qdrant
```
## Performance Optimization
### Model Selection by Use Case
- **Quick queries, chat**: `llama3.2:3b` or `phi3:mini` (fastest)
- **Code analysis**: `qwen2.5-coder:7b` or `deepseek-coder:6.7b`
- **Complex reasoning**: `mistral:7b` or `llama3.1:8b`
### n8n Workflow Optimization
- Use **Wait** nodes to batch operations
- Enable **Execute Once** for loops to reduce memory
- Store large data in temporary files instead of node output
- Use **Split In Batches** for processing large datasets
### Qdrant Performance
- Default settings are optimized for homelab use
- Increase `collection_shards` if indexing >100,000 documents
- Enable quantization for large collections
## Security Considerations
### Change Default Credentials
```bash
# Generate secure password
openssl rand -base64 32
# Update in docker-compose.yml:
# - WEBUI_SECRET_KEY
# - N8N_BASIC_AUTH_PASSWORD
```
### Network Isolation
Consider using a reverse proxy (Traefik, Nginx Proxy Manager) with authentication:
- Limit external access to Open WebUI only
- Keep n8n, Ollama, Qdrant on internal network
- Use VPN for remote access
### API Security
- Use strong API tokens for Wiki.js and Forgejo integrations
- Rotate credentials periodically
- Audit n8n workflow permissions
## Integration Points
### Connecting to Existing Services
**Uptime Kuma**:
- Configure webhook alerts → n8n webhook URL
- Path: `http://your-server-ip:5678/webhook/uptime-kuma`
**Beszel**:
- Use Shoutrrr webhook format
- URL: `http://your-server-ip:5678/webhook/beszel`
**Forgejo**:
- Repository webhooks for push events
- URL: `http://your-server-ip:5678/webhook/forgejo-push`
- Enable in repo settings → Webhooks
**Wiki.js**:
- GraphQL API endpoint: `http://wikijs:3000/graphql`
- Create API key in Wiki.js admin panel
- Store in n8n credentials
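To confirm an API key works before wiring it into n8n, a minimal sketch against the GraphQL endpoint; the `pages.list` query shape assumes Wiki.js 2.x and may need adjusting for your schema:

```bash
# expects a JSON list of pages back if the Bearer key is valid
curl -s http://wikijs:3000/graphql \
  -H "Authorization: Bearer YOUR_WIKIJS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ pages { list { id path title } } }"}'
```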
## Advanced Features
### Creating Custom n8n Nodes
For frequently used Ollama prompts, create custom nodes:
1. Go to n8n → Settings → Community Nodes
2. Install `n8n-nodes-ollama-advanced` if available
3. Or create Function nodes with reusable code
### Training Custom Models
While Ollama doesn't support fine-tuning directly, you can:
1. Use RAG with your specific documentation
2. Create detailed system prompts in n8n
3. Store organization-specific context in Qdrant
### Multi-Agent Workflows
Chain multiple Ollama calls for complex tasks:
```
Planning Agent → Execution Agent → Review Agent → Output
```
Example: Code refactoring
1. Planning: Analyze code and create refactoring plan
2. Execution: Generate refactored code
3. Review: Check for errors and improvements
4. Output: Create pull request with changes
## Resources
- **Ollama Documentation**: https://ollama.ai/docs
- **Open WebUI Docs**: https://docs.openwebui.com
- **n8n Documentation**: https://docs.n8n.io
- **Qdrant Docs**: https://qdrant.tech/documentation
## Support
For issues or questions:
1. Check container logs first
2. Review this documentation
3. Search n8n community forums
4. Check Ollama Discord/GitHub issues
---
**Last Updated**: {{current_date}}
**Maintained By**: Homelab Admin
**Status**: Production

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@ -0,0 +1,105 @@
---
title: Forgejo Audit Workflow
description: Weekly automated YAML compliance audit via n8n + Ollama
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, n8n, audit, forgejo
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Forgejo Audit Workflow
**Status:** ✅ Live and confirmed working
Runs every Monday at 06:00. Walks all compose YAML files in `services/swarm/` and `services/swarm/stack/*/`, audits each one against the Swarm template standard using `qwen2.5-coder:7b`, and commits full reports to Forgejo + sends a summary to ntfy.
---
## What It Audits
Each file is checked for:
- Homepage labels on all services
- Uptime Kuma labels on all services
- Caddy labels on exposed services
- `node.platform.arch` exclusion constraints (ARM default)
- Volume paths follow `/DockerVol/` or `/data/nfs/znas/Docker/` convention
- No forbidden fields (`version:`, `container_name:`, `restart:`, `depends_on:`)
- `endpoint_mode: dnsrr` not used
- `diun.enable: "true"` present
- Network references `netgrimoire` external overlay
---
## Scope
~67 files total across `swarm/` (flat single-service YAMLs) and `swarm/stack/*/` (grouped stacks).
---
## Outputs
| Output | Where | Content |
|--------|-------|---------|
| ntfy notification | `gremlin-audits` topic | Short FAIL summary per file |
| Forgejo commit | `Netgrimoire/Audits/AUDIT-<name>-<date>.md` | Full audit report (POST new / PUT+SHA update) |
---
## n8n Architecture
```
Schedule Trigger (Mon 06:00)
→ Forgejo API: list all files in swarm/ and swarm/stack/*/
→ Loop Over Items (splitInBatches, batch=1)
→ Code node: fetch file content via Forgejo API
→ Code node: build Ollama prompt
→ Code node: POST to Ollama (qwen2.5-coder:7b)
→ Code node: parse result, build report markdown
→ Code node: commit report to Forgejo (POST or PUT+SHA)
→ Code node: send ntfy summary if FAIL
→ Loop feedback connection drives iteration
```
---
## Critical Patterns
All Forgejo and Ollama API calls use `this.helpers.httpRequest()` in Code nodes — **not** HTTP Request nodes. HTTP Request nodes hit body expression limits on large prompts.
Code nodes in "Run Once for Each Item" mode must return `{ json: ... }` not `[{ json: ... }]`.
Loop Over Items (splitInBatches, batch=1) + feedback connection from last node back to loop drives iteration over multiple files.
---
## Critical Environment Variables
| Variable | Value | Why |
|----------|-------|-----|
| `N8N_BLOCK_ENV_ACCESS_IN_NODE` | `false` | Allows env var access inside Code nodes |
| `N8N_RUNNERS_TASK_TIMEOUT` | `3600` | Prevents timeout on 67-file audit runs |
---
## Forgejo API Tokens
| Token | Scope |
|-------|-------|
| Read token | Fetch file content from `traveler/services` |
| Write token | Commit audit reports to `traveler/Netgrimoire` |
Tokens stored in n8n credentials, not in compose env vars.
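For reference, the commit pattern the workflow performs (POST for a new report, PUT with the existing file's SHA for an update) maps onto the standard Forgejo contents API. This is a sketch with a placeholder base URL, an illustrative filename, and the write token exposed as a shell variable; the real calls happen inside the n8n Code nodes:

```bash
FORGEJO=http://forgejo:3000                      # placeholder; use your Forgejo base URL
FILE=Audits/AUDIT-example-2026-01-01.md          # illustrative name following AUDIT-<name>-<date>.md
BODY=$(base64 -w0 report.md)                     # contents must be base64-encoded

# new report -> POST
curl -s -X POST "$FORGEJO/api/v1/repos/traveler/Netgrimoire/contents/$FILE" \
  -H "Authorization: token $WRITE_TOKEN" -H "Content-Type: application/json" \
  -d "{\"message\":\"audit report\",\"content\":\"$BODY\"}"

# existing report -> look up its SHA, then PUT with that SHA (requires jq)
SHA=$(curl -s -H "Authorization: token $WRITE_TOKEN" \
  "$FORGEJO/api/v1/repos/traveler/Netgrimoire/contents/$FILE" | jq -r .sha)
curl -s -X PUT "$FORGEJO/api/v1/repos/traveler/Netgrimoire/contents/$FILE" \
  -H "Authorization: token $WRITE_TOKEN" -H "Content-Type: application/json" \
  -d "{\"message\":\"update audit report\",\"content\":\"$BODY\",\"sha\":\"$SHA\"}"
```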
---
## Forgejo Webhook Gotcha
If Forgejo webhooks fail to reach n8n, add to Forgejo `app.ini`:
```ini
[webhook]
ALLOWED_HOST_LIST = *
```
Required when `OFFLINE_MODE = true`. Restart Forgejo after edit.


@ -0,0 +1,63 @@
---
title: Kuma Alert Triage Workflow
description: Uptime Kuma webhook → Ollama analysis → ntfy alert
published: true
date: 2026-04-12T00:00:00.000Z
tags: gremlin, n8n, kuma, alerts
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Kuma Alert Triage Workflow
**Status:** ✅ Live and confirmed working
Triggered by Uptime Kuma webhook on service DOWN or RECOVERED events. DOWN events are analyzed by `llama3.2:3b` before alerting. RECOVERED events skip AI and send a simple notification.
---
## Webhook URL
```
https://n8n.netgrimoire.com/webhook/gremlin-kuma-alert
```
Configure in Uptime Kuma: Settings → Notifications → Webhook → apply to all monitors.
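To test the hook end to end without taking a service down, you can post a DOWN-shaped payload by hand. The field names below follow Uptime Kuma's webhook format as an assumption; adjust them to whatever the parse node actually expects:

```bash
# hand-fired DOWN event (service name and URL are illustrative)
curl -s -X POST https://n8n.netgrimoire.com/webhook/gremlin-kuma-alert \
  -H "Content-Type: application/json" \
  -d '{"heartbeat":{"status":0,"msg":"connection timeout"},"monitor":{"name":"example-service","url":"https://example.netgrimoire.com"}}'
```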
---
## Flow
```
Kuma Webhook
├── DOWN path:
│ → Parse payload (service name, URL, error)
│ → Ollama (llama3.2:3b): triage prompt
│ → ntfy gremlin-alerts (urgent priority) with AI analysis
└── RECOVERED path:
→ ntfy gremlin-alerts (normal priority, no AI call)
```
---
## Why Two Paths
AI triage is only useful for DOWN events — there's nothing to analyze on a recovery. Skipping Ollama on RECOVERED keeps notification latency near-instant for good news.
---
## ntfy Output Format
DOWN alert includes:
- Service name and URL
- Kuma error message
- Ollama's triage assessment (probable cause, suggested first step)
RECOVERED alert is a simple one-liner.
---
## Parked: Doc Generation Workflows
Two additional doc generation workflows were built but are currently inactive. On CPU-only `llama3.2:3b`, the generated docs barely improve on a plain reformat of the source compose file — not useful enough to commit. Will be revisited when GPU support is added to the Gremlin stack.

View file

@ -0,0 +1,522 @@
---
title: Caddy Reverse Proxy
description: Current and future config
published: true
date: 2026-02-25T01:50:20.558Z
tags:
editor: markdown
dateCreated: 2026-02-23T22:09:16.106Z
---
# Caddy Reverse Proxy
**Host:** znas (Docker Swarm node)
**Internal IP:** 192.168.5.10
**Data Path:** `/export/Docker/caddy/`
**Networks:** `netgrimoire` (service network), `vpn`
**Ports:** 80 (mapped to host 8900), 443
---
## Overview
Caddy serves as the primary reverse proxy for all public and internal web services. It uses the `caddy-docker-proxy` pattern, which allows services to register themselves with Caddy by adding Docker labels to their compose files — no manual Caddyfile edits required per service.
Configuration is **hybrid**: some services are defined entirely via Docker labels, others are defined statically in the Caddyfile, and most use both (labels for routing, Caddyfile for shared snippets). The `caddy-docker-proxy` container merges both sources at runtime.
---
## Current State
### Image
```yaml
image: lucaslorentz/caddy-docker-proxy:ci-alpine
```
This image provides the Docker Proxy module only. It has no CrowdSec, GeoIP, or rate limiting built in.
### Docker Compose (`/export/Docker/caddy/docker-compose.yml`)
```yaml
configs:
caddy-basic-content:
file: ./Caddyfile
labels:
caddy:
services:
caddy:
image: lucaslorentz/caddy-docker-proxy:ci-alpine
ports:
- 8900:80
- 443:443
environment:
- CADDY_INGRESS_NETWORKS=netgrimoire
networks:
- netgrimoire
- vpn
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /export/Docker/caddy/Caddyfile:/etc/caddy/Caddyfile
- /export/Docker/caddy:/data
#- /export/Docker/caddy/logs:/var/log/caddy # Placeholder for CrowdSec log mount
deploy:
placement:
constraints:
- node.hostname == znas
networks:
netgrimoire:
external: true
vpn:
external: true
```
### Caddyfile (`/export/Docker/caddy/Caddyfile`)
The Caddyfile defines shared authentication snippets and static site blocks. These snippets are available to all services — including label-defined ones — via `import`.
```caddyfile
# ─────────────────────────────────────────────────────────────────────────────
# AUTH SNIPPETS
# ─────────────────────────────────────────────────────────────────────────────
(authentik) {
route /outpost.goauthentik.io/* {
reverse_proxy http://authentik:9000
}
forward_auth http://authentik:9000 {
uri /outpost.goauthentik.io/auth/caddy
header_up X-Forwarded-URI {http.request.uri}
copy_headers X-Authentik-Username X-Authentik-Groups X-Authentik-Email \
X-Authentik-Name X-Authentik-Uid X-Authentik-Jwt \
X-Authentik-Meta-Jwks X-Authentik-Meta-Outpost X-Authentik-Meta-Provider \
X-Authentik-Meta-App X-Authentik-Meta-Version
}
}
(authelia) {
forward_auth http://authelia:9091 {
uri /api/verify?rd=https://login.wasted-bandwidth.net/
copy_headers Remote-User Remote-Groups Remote-Email Remote-Name
}
}
# ─────────────────────────────────────────────────────────────────────────────
# MAIL SNIPPETS
# ─────────────────────────────────────────────────────────────────────────────
(email-proxy) {
redir https://mail.netgrimoire.com/sogo 301
}
(mailcow-proxy) {
reverse_proxy nginx-mailcow:80
}
# ─────────────────────────────────────────────────────────────────────────────
# STATIC SITE BLOCKS — NETGRIMOIRE.COM
# ─────────────────────────────────────────────────────────────────────────────
cloud.netgrimoire.com {
reverse_proxy http://nextcloud-aio-apache:11000
}
log.netgrimoire.com {
reverse_proxy http://graylog:9000
}
win.netgrimoire.com {
reverse_proxy http://192.168.5.10:8006
}
docker.netgrimoire.com {
reverse_proxy http://portainer:9000
}
immich.netgrimoire.com {
reverse_proxy http://192.168.5.10:2283
}
npm.netgrimoire.com {
reverse_proxy http://librenms:8000
}
#jellyfin.netgrimoire.com {
# reverse_proxy http://jellyfin:8096
#}
# ─────────────────────────────────────────────────────────────────────────────
# AUTHENTICATED — NETGRIMOIRE.COM
# ─────────────────────────────────────────────────────────────────────────────
dozzle.netgrimoire.com {
import authentik
reverse_proxy http://192.168.4.72:8043
}
dns.netgrimoire.com {
import authentik
reverse_proxy http://192.168.5.7:5380
}
webtop.netgrimoire.com {
import authentik
reverse_proxy http://webtop:3000
}
jackett.netgrimoire.com {
import authentik
reverse_proxy http://gluetun:9117
}
transmission.netgrimoire.com {
import authentik
reverse_proxy http://gluetun:9091
}
scrutiny.netgrimoire.com {
import authentik
reverse_proxy http://192.168.5.10:8081
}
# ─────────────────────────────────────────────────────────────────────────────
# AUTHENTICATED — WASTED-BANDWIDTH.NET (Authelia)
# ─────────────────────────────────────────────────────────────────────────────
stash.wasted-bandwidth.net {
import authelia
reverse_proxy http://192.168.5.10:9999
}
namer.wasted-bandwidth.net {
import authelia
reverse_proxy http://192.168.5.10:6980
}
# ─────────────────────────────────────────────────────────────────────────────
# PUBLIC — PNCHARRIS.COM / WASTED-BANDWIDTH.NET
# ─────────────────────────────────────────────────────────────────────────────
fish.pncharris.com {
reverse_proxy http://web
}
www.wasted-bandwidth.net {
reverse_proxy http://web
}
# ─────────────────────────────────────────────────────────────────────────────
# MAILCOW — MULTI-DOMAIN
# ─────────────────────────────────────────────────────────────────────────────
mail.netgrimoire.com, autodiscover.netgrimoire.com, autoconfig.netgrimoire.com, \
mail.wasted-bandwidth.net, autodiscover.wasted-bandwidth.net, autoconfig.wasted-bandwidth.net, \
mail.gnarlypandaproductions.com, autodiscover.gnarlypandaproductions.com, autoconfig.gnarlypandaproductions.com, \
mail.pncfishandmore.com, autodiscover.pncfishandmore.com, autoconfig.pncfishandmore.com, \
mail.pncharrisenterprises.com, autodiscover.pncharrisenterprises.com, autoconfig.pncharrisenterprises.com, \
mail.pncharris.com, autodiscover.pncharris.com, autoconfig.pncharris.com, \
mail.florosafd.org, autodiscover.florosafd.org, autoconfig.florosafd.org {
import mailcow-proxy
}
```
### Docker Label Pattern (label-defined services)
Services not in the Caddyfile are registered via labels on their own containers. The snippet defined in the Caddyfile is available to them via `caddy.import`:
```yaml
labels:
- caddy=homepage.netgrimoire.com
- caddy.import=authentik
- caddy.reverse_proxy={{upstreams 3000}}
```
For services that need no auth:
```yaml
labels:
- caddy=myservice.netgrimoire.com
- caddy.reverse_proxy={{upstreams 8080}}
```
---
## Authentication Layers
Two identity proxies are in use, each serving different domains/use cases:
| Provider | Domain Pattern | Snippet |
|----------|----------------|---------|
| Authentik | `*.netgrimoire.com` internal tools | `import authentik` |
| Authelia | `*.wasted-bandwidth.net` | `import authelia` |
Services without an auth import are either public (e.g. `fish.pncharris.com`) or carry their own authentication (e.g. Nextcloud, Graylog, Portainer).
---
## Current Security Posture
CrowdSec protection exists only at the **OPNsense firewall level** — IP reputation blocking before traffic reaches Caddy. CrowdSec does not currently inspect HTTP traffic at the application layer. This means:
- Known-bad IPs are blocked at the perimeter
- Application-layer attacks (SQLi in URLs, malicious paths, bad user agents, brute force on specific endpoints) are not blocked at the Caddy level
- Services behind Authentik/Authelia have an additional protection layer; unauthenticated public services do not
---
## Future State: CrowdSec + GeoIP + Rate Limiting
### Target Image
```yaml
image: ghcr.io/serfriz/caddy-crowdsec-geoip-ratelimit-security-dockerproxy:latest
```
This is a drop-in replacement for `lucaslorentz/caddy-docker-proxy`. All existing Docker labels and Caddyfile site blocks continue to work unchanged. The image is automatically rebuilt monthly when Caddy releases updates — no custom image maintenance required.
**Included modules:**
- `caddy-docker-proxy` — same label-based config as current
- `caddy-crowdsec-bouncer` — inline HTTP blocking based on CrowdSec decisions
- `caddy-geoip` — GeoIP filtering at the application layer
- `caddy-ratelimit` — per-endpoint rate limiting
- `caddy-security` — additional auth/security middleware
### Updated Compose
```yaml
configs:
caddy-basic-content:
file: ./Caddyfile
labels:
caddy:
services:
caddy:
image: ghcr.io/serfriz/caddy-crowdsec-geoip-ratelimit-security-dockerproxy:latest
ports:
- 8900:80
- 443:443
environment:
- CADDY_INGRESS_NETWORKS=netgrimoire
- CADDY_DOCKER_EVENT_THROTTLE_INTERVAL=2000 # Prevents non-deterministic reload with CrowdSec module
- CROWDSEC_API_KEY=BYSLg/wKOa7wlHYzChJpBVJA06Ukc7G6fKJCvBwjyZg
networks:
- netgrimoire
- vpn
- crowdsec_net
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /export/Docker/caddy/Caddyfile:/etc/caddy/Caddyfile
- /export/Docker/caddy:/data
- caddy-logs:/var/log/caddy
deploy:
placement:
constraints:
- node.hostname == znas
crowdsec:
image: crowdsecurity/crowdsec
restart: unless-stopped
environment:
COLLECTIONS: "crowdsecurity/caddy crowdsecurity/http-cve crowdsecurity/whitelist-good-actors"
BOUNCER_KEY_CADDY: BYSLg/wKOa7wlHYzChJpBVJA06Ukc7G6fKJCvBwjyZg # Pre-registers the Caddy bouncer automatically
volumes:
- crowdsec-db:/var/lib/crowdsec/data
- ./crowdsec/acquis.yaml:/etc/crowdsec/acquis.yaml
- caddy-logs:/var/log/caddy:ro
networks:
- crowdsec_net
deploy:
placement:
constraints:
- node.hostname == znas
volumes:
caddy-logs:
crowdsec-db:
networks:
netgrimoire:
external: true
vpn:
external: true
crowdsec_net:
driver: overlay # Swarm overlay network
```
### CrowdSec Log Acquisition (`./crowdsec/acquis.yaml`)
```yaml
filenames:
- /var/log/caddy/access.log
labels:
type: caddy
```
### Environment File (`.env`)
```env
CROWDSEC_API_KEY=<generate-with-cscli-or-set-before-first-boot>
```
The `BOUNCER_KEY_CADDY` env var in the CrowdSec container pre-registers the bouncer key at startup. Set the same value in `.env` as `CROWDSEC_API_KEY` and both sides will be in sync on first boot — no need to run `cscli bouncers add` manually.
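Once both containers are up, confirm the bouncer actually registered:
```bash
# The Caddy bouncer should show up as valid / recently seen
docker exec <crowdsec_container> cscli bouncers list
```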
### Updated Caddyfile Additions
Add a global block at the top of the Caddyfile and a new `crowdsec` snippet. All other existing content remains unchanged.
```caddyfile
# ─────────────────────────────────────────────────────────────────────────────
# GLOBAL BLOCK — add this at the very top before any snippets
# ─────────────────────────────────────────────────────────────────────────────
{
crowdsec {
api_url http://crowdsec:8080
api_key {$CROWDSEC_API_KEY}
}
log {
output file /var/log/caddy/access.log {
roll_size 50mb
roll_keep 5
}
format json
}
}
# ─────────────────────────────────────────────────────────────────────────────
# CROWDSEC SNIPPET — add alongside existing auth snippets
# ─────────────────────────────────────────────────────────────────────────────
(crowdsec) {
route {
crowdsec
}
}
```
### Applying CrowdSec to Existing Services
Once the snippet exists, add `import crowdsec` to site blocks and container labels. This is a **gradual rollout** — services without it remain fully functional, just without Caddy-level CrowdSec inspection (they still have OPNsense perimeter protection).
**In the Caddyfile:**
```caddyfile
# Before
cloud.netgrimoire.com {
reverse_proxy http://nextcloud-aio-apache:11000
}
# After
cloud.netgrimoire.com {
import crowdsec
reverse_proxy http://nextcloud-aio-apache:11000
}
# With auth
dozzle.netgrimoire.com {
import crowdsec
import authentik
reverse_proxy http://192.168.4.72:8043
}
```
**In Docker labels:**
```yaml
labels:
- caddy=homepage.netgrimoire.com
- caddy.import=crowdsec
- caddy.import=authentik
- caddy.reverse_proxy={{upstreams 3000}}
```
### CrowdSec Rollout Priority
Roll out `import crowdsec` in this order based on risk exposure:
**High priority — do first (public, no auth):**
- `cloud.netgrimoire.com` (Nextcloud)
- `immich.netgrimoire.com`
- `docker.netgrimoire.com` (Portainer)
- `fish.pncharris.com`
- `www.wasted-bandwidth.net`
**Medium priority — high value behind auth:**
- `log.netgrimoire.com` (Graylog)
- `win.netgrimoire.com` (Proxmox)
- All `dozzle`, `dns`, `webtop`, `jackett`, `transmission`, `scrutiny`
**Lower priority — already protected by Authelia/Authentik:**
- `stash.wasted-bandwidth.net`
- `namer.wasted-bandwidth.net`
- All label-defined services behind auth
**Skip:**
- Mailcow block — handled by nginx-mailcow, different threat model
### Behavior if CrowdSec Container Goes Down
The bouncer is designed to **fail open** by default. If `crowdsec` is unreachable, Caddy continues serving traffic normally — enforcement is temporarily suspended but the site stays up. This is the safe default for a homelab. To change this behavior, set `enable_hard_fails true` in the global crowdsec block (will cause 500 errors if CrowdSec is down — not recommended for homelab).
---
## Bootstrap Steps
When ready to migrate to the new image:
**Step 1 — Add the CrowdSec global block and snippet to the Caddyfile** before changing the image. This ensures the Caddyfile is valid for the new image on startup.
**Step 2 — Create `./crowdsec/acquis.yaml`** with the content above.
**Step 3 — Create `.env`** with a strong random value for `CROWDSEC_API_KEY`:
```bash
openssl rand -hex 32
```
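A one-liner sketch that generates the key and writes the file in one step (remember that `docker stack deploy` does not read `.env` on its own, so source it or substitute the value into the compose entries before deploying):
```bash
echo "CROWDSEC_API_KEY=$(openssl rand -hex 32)" > /export/Docker/caddy/.env
```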
**Step 4 — Update the image and add the CrowdSec service to the compose file**, then redeploy:
```bash
docker stack deploy -c docker-compose.yml caddy
```
**Step 5 — Verify CrowdSec is reading Caddy logs:**
```bash
docker exec <crowdsec_container> cscli metrics
```
Look for the `Acquisition Metrics` table showing hits from `/var/log/caddy/access.log`.
**Step 6 — Test a ban manually:**
```bash
# Ban a test IP (use your own public IP or another client you control so you can see the block)
docker exec <crowdsec_container> cscli decisions add --ip <test_client_ip> --duration 5m
# From that banned client, confirm Caddy now answers with a 403
curl -I https://yoursite.com
# Remove the test ban
docker exec <crowdsec_container> cscli decisions delete --ip <test_client_ip>
```
**Step 7 — Gradually add `import crowdsec`** to site blocks and labels per the priority order above.
---
## File Layout
```
/export/Docker/caddy/
├── Caddyfile # Shared snippets and static site blocks
├── docker-compose.yml # Caddy + CrowdSec services
├── .env # CROWDSEC_API_KEY (future)
├── data/ # Caddy data volume (TLS certs, etc.)
├── logs/ # caddy-logs volume mount point (future)
└── crowdsec/
└── acquis.yaml # Tells CrowdSec where to read Caddy logs (future)
```
---
## Known Issues / Notes
- Port 80 is mapped to host port 8900 — this is intentional for Swarm. OPNsense NAT handles the external 80→8900 translation.
- The `CADDY_DOCKER_EVENT_THROTTLE_INTERVAL=2000` setting is **required** with the CrowdSec module to prevent non-deterministic domain matching behavior during container label reloads (see [issue #61](https://github.com/hslatman/caddy-crowdsec-bouncer/issues/61)).
- Jellyfin is commented out in the Caddyfile — likely served via a different path or disabled temporarily.
- The `web` upstream referenced by `fish.pncharris.com` and `www.wasted-bandwidth.net` resolves to a container named `web` on the `netgrimoire` network.
- Authelia redirect URL is `https://login.wasted-bandwidth.net/` — update if this changes.
- The serfriz image is rebuilt on the **1st of each month** for module updates, and on every new Caddy release. Force a module update by recreating the container: `docker service update --force caddy_caddy`.

View file

@ -0,0 +1,144 @@
---
title: Docker Swarm Template Standard
description: Canonical YAML template and label rules for all Netgrimoire swarm services
published: true
date: 2026-04-12T00:00:00.000Z
tags: keystone, docker, swarm
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Docker Swarm Template Standard
All Swarm YAML files in `services/swarm/` and `services/swarm/stack/` must follow this standard. The Gremlin audit workflow checks compliance weekly.
---
## Canonical Template
```yaml
# Deploy: docker stack deploy -c <service>.yaml <service>
services:
<servicename>:
image: <image>:latest
environment:
TZ: America/Chicago
volumes:
- /DockerVol/<servicename>:/config
# - /data/nfs/znas/Docker/<servicename>:/data
networks:
- netgrimoire
deploy:
restart_policy:
condition: any
delay: 5s
max_attempts: 3
window: 120s
placement:
constraints:
- node.hostname == znas
- node.platform.arch != aarch64
- node.platform.arch != arm
labels:
# Caddy
caddy: <servicename>.netgrimoire.com
caddy.reverse_proxy: <servicename>:<PORT>
caddy.import: crowdsec
caddy.import_1: authentik
# Uptime Kuma
kuma.<servicename>.http.name: <Service Name>
kuma.<servicename>.http.url: https://<servicename>.netgrimoire.com
# Homepage
homepage.group: <Group>
homepage.name: <Service Name>
homepage.icon: <service>.png
homepage.href: https://<servicename>.netgrimoire.com
homepage.description: <Description>
# DIUN
diun.enable: "true"
networks:
netgrimoire:
external: true
```
---
## Forbidden Fields
Never use these at the service level:
| Field | Reason |
|-------|--------|
| `version:` | Deprecated in Compose v2+ |
| `container_name:` | Incompatible with Swarm replicas |
| `restart:` | Use `deploy.restart_policy` instead |
| `depends_on:` | Not supported in Swarm mode |
| `endpoint_mode: dnsrr` | Breaks internal DNS — always use VIP |
---
## Volume Path Rules
| Path | When to Use |
|------|-------------|
| `/DockerVol/<service>` | Config, SQLite DBs, small app state. **Only valid with a `node.hostname` placement constraint.** |
| `/data/nfs/znas/Docker/<service>` | Bulk data, media, or any service without a hostname constraint |
---
## Placement Constraints
**Default (all services):**
```yaml
constraints:
- node.hostname == znas
- node.platform.arch != aarch64
- node.platform.arch != arm
```
ARM exclusion prevents accidental scheduling on Pi vault/worker nodes. Override only if the service is ARM-specific.
For services pinned to docker4 (Gremlin stack):
```yaml
constraints:
- node.hostname == docker4
- node.platform.arch != aarch64
- node.platform.arch != arm
```
---
## Caddy Label Rules
```yaml
caddy: servicename.netgrimoire.com # no https:// prefix
caddy.reverse_proxy: servicename:PORT # container name:port, NOT {{upstreams PORT}}
caddy.import: crowdsec # always both
caddy.import_1: authentik # always both, no exceptions
```
Never use `{{upstreams PORT}}` — it breaks during `docker stack config` preprocessing.
**Wasted-bandwidth services** use `wasted-bandwidth.net` domain and `caddy.import_1: authelia` instead of authentik.
---
## Deploy Workflow
```bash
# From services repo root
git add . && git commit -m "Add/update <service>" && git push
# On znas (or docker4 for Gremlin services)
cd ~/services && git pull
cd swarm/stack/<StackName>
set -a && source .env && set +a
docker stack config --compose-file <service>.yaml > resolved.yml
docker stack deploy --compose-file resolved.yml <service>
rm resolved.yml
docker stack services <service>
```

View file

@ -0,0 +1,59 @@
---
title: Host Inventory
description: All Netgrimoire nodes — roles, IPs, services, hardware
published: true
date: 2026-04-12T00:00:00.000Z
tags: keystone, hosts
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Host Inventory
## Swarm Cluster
| Host | Hostname | IP | Role | Runtime |
|------|----------|----|------|---------|
| znas | znas | 192.168.5.10 | NAS + Primary Swarm manager | Swarm manager + Compose |
| docker2 | — | — | VPN gateway | Compose only |
| docker3 | — | — | LibreNMS | Compose only |
| docker4 | hermes | 192.168.5.16 | Mail server + AI worker | Compose + Swarm worker |
| docker5 | — | 192.168.5.18 | Media host | Compose only |
| Pi nodes | various | various | Swarm workers + vault nodes | Swarm workers |
## Other Infrastructure
| Device | IP | Purpose |
|--------|----|---------|
| OPNsense firewall | 192.168.3.4 | Firewall, dual-WAN, NAT, WireGuard |
| Internal DNS | 192.168.5.7 | Technitium DNS |
| ISPConfig | 192.168.4.11 | Web/DNS hosting control panel |
## WAN
| Interface | IP | Status | Purpose |
|-----------|----|----|---------|
| ATT (`igc1`) | 107.133.34.145/28 | Primary | 5 static IPs allocated |
| Cox | — | Retiring | Legacy WAN |
**ATT Static IP Assignments:**
| IP | Assigned To |
|----|-------------|
| .145 | Admin / default |
| .146 | Web services |
| .147 | Jellyfin |
| .148 | Mail (ATT_Mail — pending) |
| .149 | WireGuard / Spare |
## Pinned Services by Host
**znas** — Caddy, Forgejo, Wiki.js, Homepage, Uptime Kuma, AutoKuma, ntfy, Portainer, Authentik, LLDAP, Kopia, Vault, Nextcloud AIO, Immich, Joplin, n8n (Gremlin), all arr services, all media services
**docker4 (hermes)** — MailCow (Compose), Ollama, Open WebUI, Qdrant (Swarm, pinned docker4), Roundcube
**docker5** — Jellyfin, Jellyfinx (Compose)
**docker2** — Gluetun, Jackett, Transmission (Compose)
**docker3** — LibreNMS (Compose)

View file

@ -0,0 +1,401 @@
---
title: Sample Domain Setup
description: Graymutt@nucking-futz.com
published: true
date: 2026-03-16T00:34:08.387Z
tags:
editor: markdown
dateCreated: 2026-02-25T22:02:27.719Z
---
# Mail Setup — nucking-futz.com
## Part 0 — OPNsense: Configure ATT_Mail Secondary IP
Before configuring DNS or Mailcow, the secondary AT&T static IP must be configured in OPNsense as a virtual IP on the WAN interface, and NAT rules must be set so only raw mail traffic (SMTP/IMAP ports 25, 465, 587, 993, 143) uses this address. Webmail, the Mailcow admin UI, and all other traffic continue to use the primary WAN IP (107.133.34.145).
| Address | Purpose |
|---------|---------|
| 107.133.34.145 | Primary WAN — web, admin, everything else |
| 107.133.34.146 | ATT_Mail — SMTP/IMAP inbound and outbound only |
### Step 0.1 — Add Virtual IP
1. Go to **Interfaces → Virtual IPs → Settings**
2. Click **+ Add**
3. Set the following:
| Field | Value |
|-------|-------|
| Mode | IP Alias |
| Interface | WAN (igc1) |
| Network / Address | `107.133.34.146 / 28` |
| Description | `ATT_Mail` |
4. Click **Save**, then **Apply changes**
> The /28 subnet mask matches the AT&T block (107.133.34.144/28). All 5 static IPs in the block share this mask.
### Step 0.2 — Outbound NAT for SMTP Traffic
This ensures Mailcow's outbound SMTP connections leave through the ATT_Mail IP rather than the primary WAN IP. OPNsense must be in **Hybrid** or **Manual** outbound NAT mode.
1. Go to **Firewall → NAT → Outbound**
2. Confirm mode is set to **Hybrid Outbound NAT** (or Manual — either works)
3. Click **Add** to create a new rule
**Rule for outbound SMTP (port 587 relay to MXRoute):**
| Field | Value |
|-------|-------|
| Interface | WAN |
| TCP/IP Version | IPv4 |
| Protocol | TCP |
| Source | `192.168.5.16 / 32` (Mailcow host) |
| Source Port | any |
| Destination | any |
| Destination Port | 587 |
| Translation / Target | `107.133.34.146` (ATT_Mail) |
| Description | `Mailcow outbound relay via ATT_Mail` |
4. Repeat for port **25** (direct outbound SMTP, if used) and port **465** (SMTPS)
5. Click **Save** and **Apply changes**
### Step 0.3 — Inbound NAT (Port Forwards) for Mail Ports
Route inbound connections on mail ports to Mailcow using the ATT_Mail IP as the external address.
1. Go to **Firewall → NAT → Port Forward**
2. Create rules for each mail port:
| External IP | Port(s) | Forward to | Description |
|-------------|---------|-----------|-------------|
| 107.133.34.146 | 25 | 192.168.5.16:25 | SMTP inbound |
| 107.133.34.146 | 465 | 192.168.5.16:465 | SMTPS inbound |
| 107.133.34.146 | 587 | 192.168.5.16:587 | Submission inbound |
| 107.133.34.146 | 993 | 192.168.5.16:993 | IMAPS |
| 107.133.34.146 | 143 | 192.168.5.16:143 | IMAP (if needed) |
> **Do not** add port forwards for 80, 443, or 3443 (Mailcow admin/webmail ports) on this IP. Those remain on the primary WAN IP via Caddy.
3. Click **Save** and **Apply changes**
### Step 0.4 — Firewall Rules
Ensure the WAN firewall rules permit inbound traffic on the mail ports to the ATT_Mail IP. If you have a default deny-all WAN rule (recommended), add explicit pass rules:
1. Go to **Firewall → Rules → WAN**
2. Add pass rules for each port in the table above with destination `107.133.34.146`
### Step 0.5 — Verify
```bash
# From outside your network, confirm the mail IP is live
telnet 107.133.34.146 25
# Should see: 220 hermes.netgrimoire.com ESMTP
# Confirm primary WAN IP does NOT respond on port 25
telnet 107.133.34.145 25
# Should time out or be refused
# Check that Mailcow outbound connections leave from the ATT_Mail IP
# Send a test to check-auth@verifier.port25.com and inspect the Return-Path
# or check the Received: header — the sending IP should be 107.133.34.146
```
> ⚠ If the verify step shows port 25 still responding on 107.133.34.145, check that no leftover port forward rules exist on the primary WAN IP for mail ports.
---
## Overview
This guide covers complete mail setup for `nucking-futz.com` using MXRoute as the inbound gateway and Mailcow as the mailbox host. MXRoute receives all inbound mail from the internet (solving residential IP filtering issues with banks and financial institutions) and forwards to Mailcow for storage and retrieval. Mailcow handles outbound mail via the MXRoute SMTP relay.
**Architecture:**
```
Inbound: Internet → MXRoute (commercial IP) → Mailcow (192.168.5.16)
Outbound: Mailcow → MXRoute SMTP relay → Internet
```
**Why two domains in Mailcow:**
MXRoute forwarders require a valid destination email address. You cannot forward `graymutt@nucking-futz.com` back to `graymutt@nucking-futz.com` — that loops. The solution is to have Mailcow own a subdomain (`mail.nucking-futz.com`) with its own MX record pointing directly to your server. MXRoute forwards to `graymutt@mail.nucking-futz.com`, Mailcow delivers locally, and an alias domain maps `nucking-futz.com` back so users only ever see and use `graymutt@nucking-futz.com`.
---
## Prerequisites
- MXRoute account active with DirectAdmin access
- Mailcow running at 192.168.5.16
- DNS management access for nucking-futz.com
- Your MXRoute server hostname from your MXRoute welcome email (e.g. `arrow.mxrouting.net`)
---
## Step 1 — DNS Records
Create all DNS records before configuring either service. Keep TTL at 300 during setup — raise to 3600 once confirmed working.
![image.png](/image.png)
![arec.png](/email/arec.png)
![txt.png](/email/txt.png)
### Required DNS Records
| Type | Host | Value | Notes |
|------|------|-------|-------|
| A | `mail` | `YOUR_ATT_MAIL_IP` | Points to Mailcow — MXRoute forwards to this server |
| MX | `@` | `heracles.mxrouting.net` (priority 10) | Check MXRoute welcome email for exact hostname |
| MX | `@` | `heracles-relay.mxrouting.net` (priority 20) | Secondary MXRoute server from welcome email |
| MX | `mail` | `mail.nucking-futz.com` (priority 10) | Mailcow handles this subdomain directly |
| CNAME | `imap` | `mail.nucking-futz.com` | Client autoconfiguration |
| CNAME | `smtp` | `mail.nucking-futz.com` | Client autoconfiguration |
| CNAME | `webmail` | `mail.nucking-futz.com` | Roundcube access |
| CNAME | `autodiscover` | `mail.nucking-futz.com` | Outlook autodiscover |
| CNAME | `autoconfig` | `mail.nucking-futz.com` | Thunderbird autoconfig |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` | SPF — authorizes both Mailcow direct and MXRoute relay |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` | SPF for subdomain — Mailcow sends directly from here |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@netgrimoire.com` | DMARC enforcement |
> DKIM TXT records (two selectors) are added in Steps 2 and 3 after generating keys in Mailcow and MXRoute.
---
## Step 2 — Mailcow Configuration
### 2.1 Add the Subdomain as Primary Domain
Mailcow owns `mail.nucking-futz.com` as its active mail domain. Mailboxes live internally on this subdomain.
1. Log into Mailcow admin UI → **Mail Setup → Domains**
2. Click **Add domain**
3. Set **Domain:** `mail.nucking-futz.com`
4. Leave all other settings as default
5. Click **Add domain**
### 2.2 Add the Alias Domain
This makes Mailcow accept mail addressed to `@nucking-futz.com` and deliver it to the matching `@mail.nucking-futz.com` mailbox. Users send and receive as `@nucking-futz.com` — the subdomain is invisible to them.
1. Go to **Mail Setup → Alias Domains**
2. Click **Add alias domain**
3. Set **Alias Domain:** `nucking-futz.com`
4. Set **Target Domain:** `mail.nucking-futz.com`
5. Click **Add**
### 2.3 Create Mailbox
1. Go to **Mail Setup → Mailboxes**
2. Click **Add mailbox**
3. Set **Username:** `graymutt`
4. Set **Domain:** `mail.nucking-futz.com`
5. Set a strong password
6. Set quota as needed
7. Click **Add**
The mailbox is internally `graymutt@mail.nucking-futz.com`. The alias domain from Step 2.2 means Mailcow also accepts and delivers mail for `graymutt@nucking-futz.com` to this same mailbox.
### 2.4 Generate DKIM Key
1. Go to **Configuration → Configuration & Diagnostics → Configuration**
2. Click **ARC/DKIM Keys** tab
3. Select domain `mail.nucking-futz.com`
4. Set **Selector:** `mailcow`
5. Set **Key length:** 2048
6. Click **Generate**
7. Copy the full TXT record value — needed for DNS
### 2.5 Add Mailcow DKIM DNS Record
| Type | Host | Value |
|------|------|-------|
| TXT | `mailcow._domainkey.mail` | *(full key string from Mailcow — begins with `v=DKIM1;`)* |
### 2.6 Add MXRoute to Trusted Networks
Prevents Mailcow from applying spam scoring to forwarded mail arriving from MXRoute's IPs.
1. Go to **Configuration → Configuration & Diagnostics → Configuration**
2. Click **Extra Postfix configuration** tab
3. Add to `extra.cf`:
```
# Trust MXRoute forwarding IPs
mynetworks = 127.0.0.1/8 [::1]/128 192.168.5.0/24 69.167.160.0/19 198.54.120.0/22
```
> Verify current MXRoute IP ranges in your MXRoute account documentation — these may change.
4. Click **Save**
5. Click **Restart affected containers**
### 2.7 Configure Outbound Relay
Routes outbound mail through MXRoute for best deliverability.
1. Go to **Configuration → Routing → Sender-Dependent Transports**
2. Click **Add transport**
3. Set **Domain:** `nucking-futz.com`
4. Set **Relay host:** `[smtp.mxroute.com]:587` (confirm SMTP hostname from MXRoute welcome email)
5. Set **Username:** your MXRoute relay username
6. Set **Password:** your MXRoute relay password
7. Click **Add**
8. Repeat for domain `mail.nucking-futz.com` using the same relay credentials
---
## Step 3 — MXRoute Configuration
### 3.1 Add Domain in DirectAdmin
1. Log into MXRoute DirectAdmin
2. Go to **Account Manager → Domain Setup**
3. Add domain: `nucking-futz.com`
4. Complete the domain wizard
### 3.2 Create Forwarder
MXRoute does not support domain-level remote MX routing — forwarders must be created per address. The destination must be on a domain whose MX resolves to Mailcow, not back to MXRoute.
1. Go to **Forwarders** in the MXRoute control panel
2. Click **Create New Forwarder**
3. Set **Forwarder Name:** `graymutt` (the `@nucking-futz.com` part is shown automatically)
4. Set **Destination Type:** `Forward to Email(s)`
5. Set **Recipients:** `graymutt@mail.nucking-futz.com`
6. Click **Create Forwarder**
> Every new mailbox requires a matching forwarder entry. The pattern is always `user@nucking-futz.com` → `user@mail.nucking-futz.com`. See the Adding a New Mailbox section below.
### 3.3 Get MXRoute DKIM Key
1. Go to **Email Manager → DKIM Keys** for `nucking-futz.com`
2. Generate or view the DKIM key — note the selector name assigned (often `x`)
3. Copy the full TXT record value
### 3.4 Add MXRoute DKIM DNS Record
| Type | Host | Value |
|------|------|-------|
| TXT | `x._domainkey` *(replace `x` with MXRoute's actual selector)* | *(full key string from MXRoute DirectAdmin)* |
---
## Step 4 — Verify DNS
Once DNS has propagated, verify all records:
```bash
# MX for main domain — should show MXRoute servers
dig MX nucking-futz.com +short
# MX for subdomain — should show mail.nucking-futz.com
dig MX mail.nucking-futz.com +short
# A record — should show your ATT IP
dig A mail.nucking-futz.com +short
# SPF
dig TXT nucking-futz.com +short
dig TXT mail.nucking-futz.com +short
# DMARC
dig TXT _dmarc.nucking-futz.com +short
# DKIM — Mailcow
dig TXT mailcow._domainkey.mail.nucking-futz.com +short
# DKIM — MXRoute (replace x with your selector)
dig TXT x._domainkey.nucking-futz.com +short
```
Run a full check at [https://mxtoolbox.com](https://mxtoolbox.com) → Email Health for `nucking-futz.com`.
---
## Step 5 — Test Mail Flow
### Inbound Test
Send a test email to `graymutt@nucking-futz.com` from an external Gmail or Outlook account. Verify:
- Mail arrives in the Mailcow mailbox
- Headers show the MXRoute → Mailcow forwarding path (two `Received:` hops)
- No spam flagging
In Roundcube open the test message → **More → View Source** and check the `Received:` chain.
### Outbound Test
Send from `graymutt@nucking-futz.com` to an external Gmail address. Run through [https://mail-tester.com](https://mail-tester.com) for a full delivery score.
### DKIM/SPF/DMARC Test
Send a test to `check-auth@verifier.port25.com` — you will receive an automated reply confirming pass/fail for SPF, DKIM, and DMARC.
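The same check can be fired from the command line with `swaks`, assuming it is installed on your workstation (credentials are the mailbox from Step 2.3; swaks prompts for the password):
```bash
swaks --to check-auth@verifier.port25.com \
      --from graymutt@nucking-futz.com \
      --server mail.nucking-futz.com:587 --tls \
      --auth LOGIN --auth-user graymutt@nucking-futz.com
```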
### Bank/Financial Test
Send from a bank address to `graymutt@nucking-futz.com` and confirm delivery. This is the primary goal — banks see MXRoute's commercial IPs in the MX record, not your residential AT&T IP.
---
## Email Client Settings
| Setting | Value |
|---------|-------|
| Email address | `graymutt@nucking-futz.com` |
| IMAP server | `mail.nucking-futz.com` |
| IMAP port | `993` (SSL/TLS) |
| SMTP server | `mail.nucking-futz.com` |
| SMTP port | `465` (SSL/TLS) |
| Username | `graymutt@nucking-futz.com` |
| Password | *(mailbox password set in Step 2.3)* |
> Users log in and send as `graymutt@nucking-futz.com`. Mailcow resolves this to the internal `mail.nucking-futz.com` mailbox transparently via the alias domain.
---
## Adding a New Mailbox
Every new address on `nucking-futz.com` requires entries in both Mailcow and MXRoute.
**In Mailcow:**
1. Mail Setup → Mailboxes → Add mailbox
2. Username: `newuser`, Domain: `mail.nucking-futz.com`
**In MXRoute control panel:**
1. Forwarders → Create New Forwarder
2. Forwarder Name: `newuser`, Destination Type: `Forward to Email(s)`, Recipients: `newuser@mail.nucking-futz.com`
---
## Credentials Reference
| Service | Account | Password |
|---------|---------|----------|
| Mailcow mailbox | `graymutt@mail.nucking-futz.com` | *(set during mailbox creation)* |
| MXRoute relay | *(from MXRoute welcome email)* | *(from MXRoute welcome email)* |
| MXRoute DirectAdmin | *(from MXRoute welcome email)* | *(from MXRoute welcome email)* |
---
## Known Gotchas
**Forwarder destination must not loop.** Never set the MXRoute forwarder destination to an address on the same domain that has MXRoute as its MX. `graymutt@nucking-futz.com` → `graymutt@nucking-futz.com` will loop. Always forward to `@mail.nucking-futz.com`, which has its own MX resolving directly to Mailcow.
**Two DKIM selectors required.** `mailcow._domainkey.mail.nucking-futz.com` covers mail Mailcow sends directly from the subdomain. `x._domainkey.nucking-futz.com` (MXRoute selector) covers outbound mail relayed through MXRoute. Both must exist for DMARC to pass on all paths.
**New mailboxes need matching MXRoute forwarders.** MXRoute has no catch-all forwarding to remote servers. Every address that needs to receive mail must have an explicit forwarder in DirectAdmin. Add the MXRoute forwarder step to your mailbox creation checklist.
**Alias domain vs. alias mailbox.** The alias domain in Step 2.2 maps the entire `nucking-futz.com` domain to `mail.nucking-futz.com`. Do not also create individual alias mailboxes for the same addresses — this creates duplicate delivery and may cause unexpected behavior.
**SPF differs between the two domains.** The main domain SPF includes `include:mxroute.com` because MXRoute relay sends outbound from there. The subdomain SPF (`mail.nucking-futz.com`) only needs your ATT IP — Mailcow sends directly from that domain without going through MXRoute. Two different records for two different send paths.
---
## Related Documentation
- [MailCow Configuration](./mailcow)
- [MXRoute Outbound Relay Setup](./mxroute-outbound-relay)
- [OPNsense Firewall](./opnsense-firewall) — static IP allocation for ATT_Mail

View file

@ -0,0 +1,391 @@
---
title: MailCow Hardening
description: Securing Mailcow
published: true
date: 2026-02-23T21:56:32.211Z
tags:
editor: markdown
dateCreated: 2026-02-23T21:56:22.997Z
---
# MailCow Security Hardening
**Service:** MailCow Dockerized
**Host:** 192.168.5.16 (MailCow_Ngnx alias)
**Relay:** MXRoute (outbound only)
**Last Reviewed:** February 2026
---
## Overview
Running MailCow with MXRoute as an outbound relay creates a specific threat model that's different from either a fully self-hosted or fully managed setup. Your server receives inbound directly (MX points to your IP), stores all mailboxes locally, and hands outbound to MXRoute. This means you carry the risk surface of both — inbound SMTP exposure plus the credential and reputation exposure of a relay relationship.
The security areas that matter most for this setup:
| Area | Risk | Priority |
|---|---|---|
| DNS authentication (SPF/DKIM/DMARC) | Spoofing, deliverability failure, relay abuse | 🔴 Critical |
| MTA-STS + TLS-RPT | SMTP downgrade attacks on inbound | 🔴 Critical |
| MXRoute relay credential security | Relay hijacking, spam abuse on your reputation | 🔴 Critical |
| Mailcow admin hardening | Account takeover, open relay creation | 🔴 Critical |
| Postfix TLS hardening | Weak cipher negotiation | 🟡 High |
| Nginx header hardening | XSS, clickjacking on webmail | 🟡 High |
| Rspamd tuning | Inbound spam, outbound policy enforcement | 🟡 High |
| DMARC reporting | Visibility into spoofing and misdelivery | 🟡 High |
| ClamAV / attachment scanning | Malware distribution via your domain | 🟢 Medium |
| Rate limiting | Compromised account spam runs | 🟢 Medium |
---
## DNS Authentication
This is the foundation. If any of these are misconfigured your mail either doesn't deliver or your domain gets spoofed. With MXRoute in the mix the SPF record requires special attention.
### SPF — Include Both Sources
Your SPF must authorize **both** your own IP (for any direct sends) and MXRoute's sending infrastructure:
```dns
@ IN TXT "v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com ~all"
```
Replace `YOUR_ATT_MAIL_IP` with the static IP you've dedicated to mail (ATT_Mail virtual IP). The `include:mxroute.com` covers MXRoute's sending servers.
> ⚠ Do not use `-all` (hard fail) until you have confirmed all your sending sources are covered. Use `~all` (softfail) initially, then tighten after verifying DMARC reports show no legitimate sources failing.
> ⚠ SPF has a **10 DNS lookup limit**. Each `include:` costs lookups. If you add more includes (e.g. transactional services), check your SPF lookup count at [mxtoolbox.com/spf](https://mxtoolbox.com/spf.aspx).
### DKIM — Two Selectors for Two Signers
Because MXRoute re-signs outbound mail with their own DKIM key, you need a DKIM record for both signers:
| Selector | Signer | Where to get the key |
|---|---|---|
| `mailcow._domainkey` | MailCow (inbound, internal sends) | MailCow UI → Configuration → ARC/DKIM Keys |
| `mxroute._domainkey` (or `x._domainkey`) | MXRoute (outbound relay) | MXRoute control panel |
Add both as TXT records. Having both means DMARC passes regardless of which path the mail took.
> ✓ MailCow lets you choose the DKIM selector name. Use `mailcow` as the selector to avoid confusion with the MXRoute selector.
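Once both TXT records are published, a quick dig against each selector confirms they resolve (replace the domain, and use whichever selector MXRoute actually assigned):
```bash
dig TXT mailcow._domainkey.yourdomain.com +short
dig TXT x._domainkey.yourdomain.com +short
```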
### DMARC — Start Monitoring, Then Enforce
DMARC ties SPF and DKIM together and tells receiving servers what to do with failures. Start in monitoring mode, review reports for 2–4 weeks, then advance to enforcement.
**Phase 1 — Monitor (add immediately):**
```dns
_dmarc IN TXT "v=DMARC1; p=none; rua=mailto:dmarc-reports@yourdomain.com; ruf=mailto:dmarc-failures@yourdomain.com; fo=1"
```
**Phase 2 — Quarantine (after reviewing reports, no legitimate failures):**
```dns
_dmarc IN TXT "v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc-reports@yourdomain.com; fo=1"
```
**Phase 3 — Reject (final enforcement):**
```dns
_dmarc IN TXT "v=DMARC1; p=reject; pct=100; rua=mailto:dmarc-reports@yourdomain.com; fo=1"
```
> ✓ `fo=1` requests forensic reports on any authentication failure — more detail for debugging.
**DMARC Report Processing:** Raw DMARC reports are XML and not human-readable. Use one of these free tools to process them:
- [Postmark DMARC](https://dmarc.postmarkapp.com/) — free, email-based weekly digest
- [dmarcian.com](https://dmarcian.com) — free tier, dashboard view
- Self-hosted: [Parsedmarc](https://github.com/domainaware/parsedmarc) → send to Graylog/Grafana
---
## MTA-STS (MailCow September 2025+)
MTA-STS forces other mail servers to use TLS when delivering to you, preventing downgrade attacks that try to force plaintext SMTP. The September 2025 MailCow update added the `postfix-tlspol-mailcow` container which enforces MTA-STS on **outbound** connections too.
### What You Need
**1. DNS records** — three records for each domain:
```dns
# For your mail server's hostname domain (e.g. netgrimoire.com)
mta-sts IN CNAME mail.netgrimoire.com.
_mta-sts IN TXT "v=STSv1; id=20260223"
_smtp._tls IN TXT "v=TLSRPTv1; rua=mailto:tls-reports@netgrimoire.com"
```
The `id` value in `_mta-sts` is a version string — update it (e.g. to today's date) whenever you change your MTA-STS policy.
**2. Policy file** — served by MailCow's nginx at `https://mta-sts.yourdomain.com/.well-known/mta-sts.txt`:
```bash
# On your MailCow host:
mkdir -p /opt/mailcow-dockerized/data/web/.well-known/
cat > /opt/mailcow-dockerized/data/web/.well-known/mta-sts.txt << 'EOF'
version: STSv1
mode: enforce
max_age: 86400
mx: mail.netgrimoire.com
EOF
```
Start with `mode: testing` for the first week, then switch to `mode: enforce`.
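Once the file and DNS records exist, confirm the policy is actually being served (hostname follows the records above):
```bash
# Should return HTTP 200 with the policy body from the heredoc above
curl -i https://mta-sts.netgrimoire.com/.well-known/mta-sts.txt
```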
**3. For additional domains** — add CNAMEs pointing to your primary domain's records:
```dns
# For each additional mail domain you host on MailCow:
mta-sts.otherdomain.com IN CNAME mail.netgrimoire.com.
_mta-sts.otherdomain.com IN CNAME _mta-sts.netgrimoire.com.
_smtp._tls.otherdomain.com IN CNAME _smtp._tls.netgrimoire.com.
```
> ✓ TLS-RPT (`_smtp._tls` TXT record) sends you reports about TLS failures when other servers connect to you. Pipe these to Graylog or Postmark for visibility.
---
## MXRoute Relay Security
This is the most overlooked area. Your MXRoute credentials can send mail as your domain — if they're compromised, someone else is spamming from your reputation.
### Credential Hardening
- Use a **unique, strong password** for your MXRoute account — not shared with anything else
- Store the MXRoute SMTP credentials in MailCow's relay configuration only, not in any config file or environment variable that gets committed to git
- If MXRoute supports API tokens or app passwords, use those instead of your main account password
### Relay Configuration in MailCow
In MailCow UI: **Configuration → Routing → Sender-Dependent Transports**
Verify the relay is configured to authenticate via TLS (port 587 with STARTTLS or port 465 with SSL). Do not relay over port 25 without authentication.
```
# What the relay entry should look like in Postfix terms:
# relayhost = [smtp.mxroute.com]:587
# smtp_sasl_auth_enable = yes
# smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
# smtp_tls_security_level = encrypt ← ensures TLS is required, not optional
```
> ⚠ Set `smtp_tls_security_level = encrypt` (not `may`) so the connection to MXRoute is always encrypted. If the TLS negotiation fails, Postfix should reject rather than fall back to plaintext.
### Rate Limiting (Prevent Relay Abuse if Account Compromised)
Add rate limits in MailCow UI: **Configuration → Mail Setup → Domains → [your domain] → Rate Limit**
| Setting | Recommended Value | Notes |
|---|---|---|
| Outbound messages/hour | 500 | Adjust for your actual sending volume |
| Outbound messages/day | 2000 | A sudden spike above this = red flag |
This doesn't stop abuse but limits blast radius if a mailbox is compromised and starts spamming through MXRoute.
---
## MailCow Admin Hardening
### Two-Factor Authentication
Enable 2FA on the admin account and all mailbox accounts that have access to the admin panel.
MailCow UI: **Edit mailbox → Two-Factor Authentication → TOTP**
> ⚠ There was a session fixation vulnerability in the MailCow web panel (GHSA-23c8-4wwr-g3c6, January 2025) and a critical SSTI vulnerability (GHSA-8p7g-6cjj-wr9m, July 2025). Both require staying current on updates. Enable auto-updates or check the MailCow blog monthly.
### Restrict Admin UI to Internal Network
The MailCow admin panel should not be reachable from the public internet. Access should require being on your internal network or connected via WireGuard.
In OPNsense, add a firewall rule blocking external access to port 443 on 192.168.5.16 except from your static admin IP or WireGuard peers.
Alternatively, configure MailCow's nginx to restrict the admin path by IP:
```nginx
# In data/conf/nginx/includes/site-defaults.conf
# Add inside the server block for the admin panel:
location /admin {
allow 192.168.3.0/24;
allow 192.168.5.0/24;
allow 192.168.32.0/24; # WireGuard peers
deny all;
}
```
### API Key Rotation
If you use the MailCow API (for automation or Netgrimoire tooling), generate a dedicated read-only key where possible, and rotate keys annually or after any suspected compromise.
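If you do use the API, a read-only key can be exercised like this; the endpoint path is the standard Mailcow API and the key value is a placeholder:
```bash
# List configured domains with a read-only API key
curl -s -H "X-API-Key: <readonly-api-key>" \
  https://mail.netgrimoire.com/api/v1/get/domain/all | jq -r '.[].domain_name'
```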
---
## Postfix TLS Hardening
Add to `/opt/mailcow-dockerized/data/conf/postfix/extra.cf`:
```ini
# Enforce TLS 1.2+ and strong ciphers
tls_high_cipherlist = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256
tls_preempt_cipherlist = yes
# Inbound SMTP (smtpd) — receiving from other mail servers
smtpd_tls_protocols = !SSLv2, !SSLv3, !TLSv1, !TLSv1.1
smtpd_tls_ciphers = high
smtpd_tls_mandatory_ciphers = high
# Outbound SMTP (smtp) — delivery to MXRoute and direct sends
smtp_tls_protocols = !SSLv2, !SSLv3, !TLSv1, !TLSv1.1
smtp_tls_ciphers = high
smtp_tls_mandatory_ciphers = high
# Require encryption on the MXRoute relay connection
smtp_tls_security_level = encrypt
```
After editing, restart Postfix:
```bash
cd /opt/mailcow-dockerized
docker compose restart postfix-mailcow
```
---
## Nginx Header Hardening
Add to `/opt/mailcow-dockerized/data/conf/nginx/includes/site-defaults.conf`:
```nginx
# Strong SSL ciphers only
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_conf_command Options PrioritizeChaCha;
# HSTS — include subdomains if all your services use HTTPS
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
# Disable X-XSS-Protection (deprecated, CSP replaces it)
add_header X-XSS-Protection "0";
# Deny unused browser permissions
add_header Permissions-Policy "accelerometer=(), ambient-light-sensor=(), autoplay=(), battery=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()";
# Content Security Policy — if NOT using Gravatar with SOGo
add_header Content-Security-Policy "default-src 'none'; connect-src 'self' https://api.github.com; font-src 'self' https://fonts.gstatic.com; img-src 'self' data:; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; frame-ancestors 'none'; upgrade-insecure-requests; block-all-mixed-content; base-uri 'none'";
# Cross-origin isolation headers
add_header Cross-Origin-Resource-Policy same-origin;
add_header Cross-Origin-Opener-Policy same-origin;
add_header Cross-Origin-Embedder-Policy require-corp;
# Disable gzip to prevent BREACH attack
# Change gzip on; → gzip off; in the main nginx conf
```
> ⚠ The December 2025 MailCow update already removed the deprecated `X-XSS-Protection` header from defaults. If you're current, you may already have this. Check before duplicating.
After editing, restart nginx:
```bash
docker compose restart nginx-mailcow
```
---
## Rspamd Tuning
Rspamd is MailCow's spam filter. The defaults are reasonable but a few adjustments improve both inbound protection and outbound policy enforcement.
### Key Settings to Review
Navigate to **MailCow UI → Configuration → Rspamd UI** (or directly at `https://mail.yourdomain.com/rspamd/`)
**Actions → Score Thresholds:**
| Action | Default | Recommended |
|---|---|---|
| Greylist | 4 | 3 |
| Add header | 6 | 5 |
| Reject | 15 | 12 |
Lowering the reject threshold from 15 to 12 catches more aggressive spam while avoiding false positives.
**Modules to enable/verify:**
| Module | Purpose |
|---|---|
| DKIM verification | Verify incoming DKIM signatures |
| SPF | Verify incoming SPF |
| DMARC | Enforce DMARC on inbound |
| MX Check | Verify sending domain has a valid MX |
| RBL (Realtime Blacklists) | Check sending IPs against blocklists |
| Greylisting | Temporary reject new senders (forces retry) |
### Add CrowdSec as an Rspamd Feed
If you also have the CrowdSec bouncer running on the MailCow host (or can reach it), you can feed CrowdSec decisions into Rspamd to reject mail from banned IPs. This is advanced but powerful — see the [CrowdSec Bouncer for Rspamd](https://hub.crowdsec.net) hub entry.
---
## Deliverability Verification
Run these checks after making any DNS or config changes:
| Tool | What It Checks | URL |
|---|---|---|
| MXToolbox | SPF, DKIM, DMARC, MX, PTR, blacklists | mxtoolbox.com |
| mail-tester.com | Send a test email, get a 1–10 score | mail-tester.com |
| Port25 verifier | Send to check-auth@verifier.port25.com | Email-based |
| DKIM validator | Validates DKIM signature | dkimvalidator.com |
| Google Postmaster Tools | Gmail reputation monitoring (requires setup) | postmaster.google.com |
| Microsoft SNDS | Outlook/Hotmail reputation | sendersupport.olc.protection.outlook.com |
> ✓ Aim for 9–10/10 on mail-tester.com. Anything below 8 indicates a misconfiguration that will hurt deliverability.
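One item the table doesn't cover directly is reverse DNS; a quick local check (substitute the ATT_Mail IP):
```bash
# PTR / rDNS should resolve to the mail hostname
dig -x YOUR_ATT_MAIL_IP +short
```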
---
## Keeping MailCow Updated
MailCow has had several critical security vulnerabilities in 2025 (session fixation, SSTI, password reset poisoning). Staying current is non-negotiable.
```bash
cd /opt/mailcow-dockerized
# Pull latest images
docker compose pull
# Apply update
./update.sh
# Or if using the newer helper:
docker compose up -d
```
> ✓ Subscribe to the [MailCow blog](https://mailcow.email/posts/) or watch the [GitHub releases](https://github.com/mailcow/mailcow-dockerized/releases) for security advisories. The update cadence is roughly monthly.
Set up a cron job or Monit check to alert you when MailCow is more than 30 days behind the latest release.
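A minimal sketch of such a check, assuming `./update.sh --check` (which exits 0 when an update is available) and an ntfy topic for the alert; adjust the ntfy URL to your instance:
```bash
# Root crontab on the MailCow host: weekly check, Monday 06:00
0 6 * * 1 cd /opt/mailcow-dockerized && ./update.sh --check >/dev/null 2>&1 && curl -s -d "Mailcow update available on hermes" https://<ntfy-host>/gremlin-alerts
```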
---
## Checklist Summary
| Item | Status |
|---|---|
| SPF includes both own IP and mxroute.com | ☐ |
| Two DKIM selectors (mailcow + mxroute) | ☐ |
| DMARC in monitoring mode, advancing to reject | ☐ |
| DMARC reports being processed (Postmark/dmarcian) | ☐ |
| MTA-STS policy published and enforced | ☐ |
| TLS-RPT record in DNS | ☐ |
| MXRoute relay connection uses TLS/encrypt level | ☐ |
| Admin UI restricted to internal network | ☐ |
| 2FA on admin and all privileged accounts | ☐ |
| Postfix TLS 1.2+ enforced via extra.cf | ☐ |
| Nginx security headers added | ☐ |
| Rate limits set on outbound per-domain | ☐ |
| MailCow updated to latest (monthly check) | ☐ |
| Rspamd thresholds reviewed | ☐ |
| PTR/rDNS record matches mail hostname | ☐ |
---
## Related Documentation
- [OPNsense Firewall](./opnsense-firewall) — dedicated ATT_Mail virtual IP, port NAT
- [CrowdSec](./crowdsec) — IP reputation blocking at firewall level
- [Graylog](./graylog) — DMARC report and TLS-RPT ingestion target
- [Caddy Reverse Proxy](./caddy-reverse-proxy) — if MailCow webmail is proxied through Caddy

View file

@ -0,0 +1,490 @@
---
title: Mailcow Dockerized Install and Config
description:
published: true
date: 2026-02-25T21:05:48.256Z
tags:
editor: markdown
dateCreated: 2026-02-25T21:05:38.864Z
---
# MailCow — Installation & Configuration
**Host:** docker4 (192.168.5.16)
**Hostname:** hermes.netgrimoire.com
**Admin URL:** https://mail.netgrimoire.com
**Version:** 2025-10a (update 2026-01 available as of documentation date)
**Installed:** /opt/mailcow-dockerized
**Timezone:** America/Chicago
**Architecture:** x86_64
**CPU:** 16 cores
**RAM:** 30.63 GB
**Disk:** /dev/nvme0n1p2 — 442G / 502G used (93% — monitor this)
---
## Overview
Mailcow runs as a Docker stack on docker4, attached to the `netgrimoire` overlay network. All containers use `restart: unless-stopped` via a compose override. Outbound mail routes through MXRoute via sender-dependent transports. Inbound mail arrives from MXRoute which acts as the public-facing inbound gateway (solving residential AT&T IP filtering issues with banks).
See [MXRoute Master Configuration](./mxroute-master) for full inbound/outbound/DNS detail per domain.
---
## Installation Paths
| Path | Purpose |
|------|---------|
| `/opt/mailcow-dockerized/` | Mailcow root |
| `/opt/mailcow-dockerized/mailcow.conf` | Primary configuration file |
| `/opt/mailcow-dockerized/docker-compose.yml` | Base compose (do not edit) |
| `/opt/mailcow-dockerized/docker-compose.override.yml` | Local overrides — network and restart policy |
| `/opt/mailcow-dockerized/data/conf/postfix/extra.cf` | Persistent Postfix overrides |
| `/opt/mailcow-dockerized/data/conf/postfix/main.cf` | Postfix base config (managed by Mailcow) |
| `/opt/mailcow-dockerized/data/conf/rspamd/` | Rspamd configuration |
| `/opt/mailcow-dockerized/data/assets/ssl/` | TLS certificates |
---
## mailcow.conf — Key Settings
```ini
MAILCOW_HOSTNAME=hermes.netgrimoire.com
MAILCOW_PASS_SCHEME=BLF-CRYPT
# Database
DBNAME=mailcow
DBUSER=mailcow
DBPASS=mg7Z8W9UsPlOh0S6vF7TmmPb6n1s
DBROOT=JdymsZFFACHkDcOdziQ53QruCTG2
# Redis
REDISPASS=6AduWQsmBYGMKfOi1CNEGQfTE3RH
# Ports — HTTPS runs on 3443, proxied through Caddy
HTTP_PORT=80
HTTP_BIND=
HTTPS_PORT=3443
HTTPS_BIND=
HTTP_REDIRECT=n
# Mail ports (standard)
SMTP_PORT=25
SMTPS_PORT=465
SUBMISSION_PORT=587
IMAP_PORT=143
IMAPS_PORT=993
POP_PORT=110
POPS_PORT=995
SIEVE_PORT=4190
# Internal ports (localhost only)
DOVEADM_PORT=127.0.0.1:19991
SQL_PORT=127.0.0.1:13306
REDIS_PORT=127.0.0.1:7654
# TLS cert coverage
ADDITIONAL_SAN=smtp.*,imap.*
AUTODISCOVER_SAN=y
# ACME / Let's Encrypt
SKIP_LETS_ENCRYPT=n
SKIP_IP_CHECK=y
SKIP_HTTP_VERIFICATION=y
# Services — all enabled
SKIP_CLAMD=n
SKIP_OLEFY=n
SKIP_SOGO=n
SKIP_FTS=n
# FTS (Flatcurve/Xapian)
FTS_HEAP=128
FTS_PROCS=1
# Watchdog
USE_WATCHDOG=y
WATCHDOG_NOTIFY_START=y
WATCHDOG_NOTIFY_BAN=n
WATCHDOG_EXTERNAL_CHECKS=n
# Networking
IPV4_NETWORK=172.22.1
IPV6_NETWORK=fd4d:6169:6c63:6f77::/64
ENABLE_IPV6=false
# Misc
MAILDIR_GC_TIME=7200
MAILDIR_SUB=Maildir
SOGO_EXPIRE_SESSION=480
SOGO_URL_ENCRYPTION_KEY=ojmPfhnM4MYMsA2f
ACL_ANYONE=disallow
ALLOW_ADMIN_EMAIL_LOGIN=n
DOCKER_COMPOSE_VERSION=native
COMPOSE_PROJECT_NAME=mailcow
LOG_LINES=9999
```
---
## docker-compose.override.yml
All services are attached to the external `netgrimoire` overlay network and set to `restart: unless-stopped`. The override does not change any image versions or environment variables — it only adds network membership and restart policy.
```yaml
services:
unbound-mailcow:
networks:
netgrimoire:
restart: unless-stopped
mysql-mailcow:
networks:
- netgrimoire
restart: unless-stopped
redis-mailcow:
networks:
- netgrimoire
restart: unless-stopped
clamd-mailcow:
networks:
- netgrimoire
restart: unless-stopped
rspamd-mailcow:
networks:
- netgrimoire
restart: unless-stopped
php-fpm-mailcow:
networks:
- netgrimoire
restart: unless-stopped
sogo-mailcow:
networks:
- netgrimoire
restart: unless-stopped
dovecot-mailcow:
networks:
- netgrimoire
restart: unless-stopped
postfix-mailcow:
networks:
- netgrimoire
restart: unless-stopped
postfix-tlspol-mailcow:
networks:
- netgrimoire
restart: unless-stopped
memcached-mailcow:
restart: unless-stopped
nginx-mailcow:
networks:
- netgrimoire
restart: unless-stopped
acme-mailcow:
networks:
- netgrimoire
restart: unless-stopped
watchdog-mailcow:
networks:
- netgrimoire
restart: unless-stopped
dockerapi-mailcow:
networks:
- netgrimoire
restart: unless-stopped
olefy-mailcow:
networks:
- netgrimoire
restart: unless-stopped
ofelia-mailcow:
networks:
- netgrimoire
restart: unless-stopped
networks:
netgrimoire:
external: true
driver: overlay
```
---
## Container Image Versions
From `docker-compose.yml` (base file — version 2025-10a):
| Service | Image |
|---------|-------|
| unbound-mailcow | ghcr.io/mailcow/unbound:1.24 |
| mysql-mailcow | mariadb:10.11 |
| redis-mailcow | redis:7.4.6-alpine |
| clamd-mailcow | ghcr.io/mailcow/clamd:1.71 |
| rspamd-mailcow | ghcr.io/mailcow/rspamd:2.4 |
| php-fpm-mailcow | ghcr.io/mailcow/phpfpm:1.94 |
| sogo-mailcow | ghcr.io/mailcow/sogo:1.136 |
| dovecot-mailcow | ghcr.io/mailcow/dovecot:2.35 |
| postfix-mailcow | ghcr.io/mailcow/postfix:1.81 |
| postfix-tlspol-mailcow | ghcr.io/mailcow/postfix-tlspol:1.0 |
| memcached-mailcow | memcached:alpine |
| nginx-mailcow | ghcr.io/mailcow/nginx:1.05 |
| acme-mailcow | ghcr.io/mailcow/acme:1.94 |
| netfilter-mailcow | ghcr.io/mailcow/netfilter:1.63 |
| watchdog-mailcow | ghcr.io/mailcow/watchdog:2.09 |
| dockerapi-mailcow | ghcr.io/mailcow/dockerapi:2.11 |
| olefy-mailcow | ghcr.io/mailcow/olefy:1.15 |
| ofelia-mailcow | mcuadros/ofelia:latest |
---
## Postfix Configuration
### extra.cf
```
myhostname = hermes.netgrimoire.com
```
> The MXRoute trusted network entries should also live here. The current extra.cf contains only `myhostname` — confirm that `mynetworks` in the running config already includes the MXRoute ranges, or add them via the UI if it does not.
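A quick way to confirm what the running Postfix container actually trusts, rather than inferring it from the config files:
```bash
cd /opt/mailcow-dockerized
# Print the effective trusted networks from the running Postfix container
docker compose exec postfix-mailcow postconf -h mynetworks
```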
### Key Postfix Settings (from running config)
```
mynetworks = 127.0.0.0/8 172.22.1.0/24 10.0.1.0/24 [::1]/128 [fd4d:6169:6c63:6f77::]/64 [fe80::]/64
message_size_limit = 104857600 # 100MB
mailbox_size_limit = 0 # unlimited
bounce_queue_lifetime = 1d
maximal_queue_lifetime = 5d
delay_warning_time = 4h
postscreen_dnsbl_threshold = 6
postscreen_dnsbl_action = enforce
postscreen_greet_action = enforce
smtpd_relay_restrictions = permit_mynetworks, permit_sasl_authenticated, defer_unauth_destination
disable_vrfy_command = yes
broken_sasl_auth_clients = yes
```
---
## Domains
10 domains configured. All active.
| Domain | Mailboxes | Sender-Dependent Transport | Created |
|--------|-----------|---------------------------|---------|
| bamalady.com | 0 / 10 | *(not confirmed)* | — |
| bill740.com | 1 / 10 | *(not confirmed)* | — |
| florosafd.org | 4 / 10 | ID 4: heracles.mxrouting.net:587 (relay@florosafd.org) | 2025-11-21 |
| gnarlypandaproductions.com | 2 / 10 | ID 5: heracles.mxrouting.net:587 (relay@gnarlypandaproductions.com) | 2025-11-21 |
| netgrimoire.com | 2 / 10 | ID 2: heracles.mxrouting.net:587 (relay@netgrimoire.com) | 2025-11-21 |
| nucking-futz.net | 0 / 10 | *(not confirmed)* | — |
| pncfishandmore.com | 4 / 10 | ID 6: heracles.mxrouting.net:587 (relay@pncfishandmore.com) | — |
| pncharris.com | 4 / 10 | ID 3: heracles.mxrouting.net:587 (passer@pncharris.com) | 2025-11-21 |
| pncharrisenterprises.com | 2 / 10 | *(not confirmed from screenshots)* | — |
| wasted-bandwidth.net | 1 / 10 | ID 1: heracles.mxrouting.net:587 (relay@wasted-bandwidth.net) | — |
> MXRoute relay hostname is `heracles.mxrouting.net:587` — note this differs from the generic `smtp.mxroute.com` placeholder used in setup docs. Always use `heracles.mxrouting.net:587` for this account.
---
## Mailboxes
19 active mailboxes across all domains:
| Mailbox | Messages | Domain |
|---------|----------|--------|
| bill@bill740.com | 1 | bill740.com |
| chieflee@florosafd.org | 2124 | florosafd.org |
| cindy@pncfishandmore.com | 1109 | pncfishandmore.com |
| cindy@pncharris.com | 33797 | pncharris.com |
| cindy@pncharrisenterprises.com | 819 | pncharrisenterprises.com |
| dads_attic@pncharris.com | 0 | pncharris.com |
| jim.harris@florosafd.org | 8 | florosafd.org |
| kyle@gnarlypandaproductions.com | 486 | gnarlypandaproductions.com |
| kyle@pncfishandmore.com | 110 | pncfishandmore.com |
| kyle@pncharris.com | 31182 | pncharris.com |
| phil@florosafd.org | 5 | florosafd.org |
| phil@gnarlypandaproductions.com | 5 | gnarlypandaproductions.com |
| phil@netgrimoire.com | 1 | netgrimoire.com |
| phil@pncfishandmore.com | 10 | pncfishandmore.com |
| phil@pncharris.com | 3210 | pncharris.com |
| phil@pncharrisenterprises.com | 1 | pncharrisenterprises.com |
| times@florosafd.org | 191 | florosafd.org |
| traveler@netgrimoire.com | 3 | netgrimoire.com |
| traveler@wasted-bandwidth.net | 138 | wasted-bandwidth.net |
---
## Aliases
| ID | Alias | Target Domain | Internal |
|----|-------|---------------|---------|
| 7 | cindy@bamalady.com | bamalady.com | No |
---
## Sender-Dependent Transports
All outbound relay routes through `heracles.mxrouting.net:587`. This is your MXRoute server hostname — use this exact value when adding new transports.
| ID | Host | Username | Password |
|----|------|----------|----------|
| 1 | heracles.mxrouting.net:587 | relay@wasted-bandwidth.net | dZ4yLYznVvgSJtqWZJFA |
| 2 | heracles.mxrouting.net:587 | relay@netgrimoire.com | TVGCnJp9SxRbWU8EhkMw |
| 3 | heracles.mxrouting.net:587 | passer@pncharris.com | bBJtPhrGkHvvhxhukkae |
| 4 | heracles.mxrouting.net:587 | relay@florosafd.org | 2Fe8XMyaeh6Z5dvdHYdq |
| 5 | heracles.mxrouting.net:587 | relay@gnarlypandaproductions.com | vG5ZsUQhRWD2UyzLPsqA |
| 6 | heracles.mxrouting.net:587 | relay@pncfishandmore.com | *(confirm from MXRoute panel)* |
---
## DKIM Keys
Two DKIM selectors are configured per domain — one for Mailcow and one added separately for MXRoute outbound signing. The Mailcow-managed keys use the selector `dkim`, published at `dkim._domainkey.<domain>`.
### pncharris.com
```
v=DKIM1;k=rsa;t=s;s=email;p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAqhgQV7r+KKQwJceWenZ3FNq8AsllgW6cIm/0jpsLT62vF1yy0nh2MdhjYgQAX2MK9HHYzNZcCB3+OPpqBbXeNbSDckxB/dC+z/vboMHrJmYonfaSYshZjSR80V/a2Yoq+hiXQ9eBcuOggENtMm4XvEsl/vOWLBMfasqe+X11gzQBeRv1tTaXJB0C4i7tAcfi0O/AxH8QFTr2099+k2iepn8J15ukk1zu4zemBJj4Z3uFTNnBP8YpgKbYoUDyMVIKIxGjANVBBypcrMKavpQ4F1JLhgGFhWAsAuFRwZsnOaftZyMuzAZxM37DTd/bF2WanmK3Xe75SN5uOnEXjuzW/wIDAQAB
```
### netgrimoire.com
```
v=DKIM1;k=rsa;t=s;s=email;p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAoJ9YKqV9+6gOcVKI+UJ0TRcMmergxU8HLO+mwTMfqOhblsEcDPO60c8ya24iIXg51AA2k5Xcbb0bLScaaIi0P/TRzP/bonAZkPS1Y8Fx1se9dikTsA9Lazhou6DvoFkkV/IPH1ZNg68Cd9teAD5tvoY18OSneJJsocXwFo57c+XccUaTxjpV7eReuT4da7iNHMmUmZNfKenxVMKD740zrDJAeAsXtEb/71CochHYSm+qAvuG9/WPixJbMsJLF/iVhV3Byp0LCrB+CwGTwnsiUcd7QpuD6rRs/7zzdGBtoN22m/j390GimFstYvB61I20h8sHWGAG66dLko6Sgvs47wIDAQAB
```
### gnarlypandaproductions.com
```
v=DKIM1;k=rsa;t=s;s=email;p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
```
*(scroll cut off in screenshot — retrieve full key from Mailcow UI → Edit domain → bottom of page)*
> All other domain DKIM keys should be retrieved from the Mailcow domain edit page and recorded here for disaster recovery completeness.
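They can also be pulled in bulk through the Mailcow API. A sketch, assuming an admin API key has already been created under **Configuration → API** (the key below is a placeholder):
```bash
# Dump the DKIM record for each domain; the JSON response includes the public-key TXT value
API_KEY="XXXX-XXXX-XXXX-XXXX-XXXX"   # placeholder; substitute the real admin API key
for d in pncharris.com netgrimoire.com gnarlypandaproductions.com florosafd.org; do
  echo "== ${d}"
  curl -s -H "X-API-Key: ${API_KEY}" "https://mail.netgrimoire.com/api/v1/get/dkim/${d}"
  echo
done
```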
---
## Network Configuration
Mailcow containers join the `netgrimoire` external overlay network, allowing communication with other Docker Swarm services (Caddy reverse proxy, etc.) without exposing ports directly to the host network.
**Internal Docker network:** `172.22.1.0/24`
Key container IPs within the mailcow-network:
- unbound: 172.22.1.254
- redis: 172.22.1.249
- sogo: 172.22.1.248
- dovecot: 172.22.1.250
- postfix: 172.22.1.253
**IPv6:** disabled (`ENABLE_IPV6=false`)
---
## Caddy Reverse Proxy
Mailcow's nginx listens on HTTPS port 3443 internally. Caddy proxies external requests to it. Mailcow handles its own TLS for direct mail client connections (IMAP 993, SMTP 465/587).
The admin UI at `mail.netgrimoire.com` is proxied through Caddy on the `netgrimoire` overlay network.
---
## Updating Mailcow
```bash
cd /opt/mailcow-dockerized
# Pull latest
git fetch origin
git checkout origin/master
# Update containers
docker compose pull
./update.sh
```
> As of the documentation date, version 2026-01 is available; the running version is 2025-10a. Update when convenient — check the [MailCow changelog](https://github.com/mailcow/mailcow-dockerized/releases) for breaking changes first.
A monthly update check is recommended. MailCow had multiple security vulnerabilities in 2025 — staying current is important.
---
## Common Operations
### Restart all containers
```bash
cd /opt/mailcow-dockerized
docker compose restart
```
### Restart single container (e.g. after extra.cf change)
```bash
docker compose restart postfix-mailcow
```
### View logs
```bash
# Postfix
docker compose logs postfix-mailcow -f
# Dovecot
docker compose logs dovecot-mailcow -f
# All containers
docker compose logs -f
```
### Check queue
```bash
docker exec mailcow-postfix-mailcow-1 postqueue -p
```
### Flush queue
```bash
docker exec mailcow-postfix-mailcow-1 postqueue -f
```
### Check container health
```bash
docker compose ps
```
---
## Known Gotchas
**Disk usage is at 93%.** The nvme0n1p2 volume has 442G used of 502G. This needs attention — vmail storage grows over time and garbage collection runs hourly but only removes items older than 7200 minutes (5 days). Monitor this and consider quota enforcement per mailbox if growth continues.
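A rough way to see where the space is going before deciding on quotas (the volume name assumes the default `vmail-vol-1` volume under `COMPOSE_PROJECT_NAME=mailcow`):
```bash
# Overall Docker disk usage, then the largest top-level vmail directories (roughly one per domain)
docker system df
sudo du -sh /var/lib/docker/volumes/mailcow_vmail-vol-1/_data/* | sort -h | tail -n 20
```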
**extra.cf is minimal.** The running Postfix `mynetworks` shown above trusts `10.0.1.0/24` but does not list the MXRoute ranges `69.167.160.0/19` and `198.54.120.0/22` — if forwarded mail from MXRoute is being spam-scored, add those ranges to extra.cf and restart Postfix.
**MXRoute relay hostname.** The actual relay hostname for this account is `heracles.mxrouting.net:587` — not the generic `smtp.mxroute.com` placeholder. All 6 transports use `heracles.mxrouting.net:587`. Use this exact hostname for any new transport entries.
**pncharris.com uses passer@ not relay@.** Transport ID 3 for pncharris.com authenticates as `passer@pncharris.com`, not `relay@pncharris.com`. This is intentional — the relay@ account exists but passer@ is the current active relay credential.
**HTTPS on port 3443.** Mailcow's web UI is not on the standard 443 — it binds to 3443 and Caddy handles the public-facing 443 proxy. Direct access to the UI requires going through Caddy or using the internal port.
**nucking-futz.net vs nucking-futz.com.** The domains list shows `nucking-futz.net` but the intended new domain is `nucking-futz.com`. Verify which is actually configured and correct if needed.
**bamalady.com and bill740.com** have no transport assigned in the screenshots. Confirm whether these domains need MXRoute relay configured.
---
## Related Documentation
- [MXRoute Master Configuration](./mxroute-master) — per-domain DNS, inbound forwarding, outbound relay credentials
- [Mail Setup — nucking-futz.com](./mail-setup-nucking-futz) — new domain setup guide
- [MailCow Security Hardening](./mailcow-security-hardening)
- [Caddy Reverse Proxy](./caddy-reverse-proxy) — proxies mail.netgrimoire.com to port 3443
- [OPNsense Firewall](./opnsense-firewall) — ATT_Mail static IP, port forwarding rules

---
title: Integrating MXRoute with MailCow
description:
published: true
date: 2026-02-25T21:04:37.135Z
tags:
editor: markdown
dateCreated: 2026-02-25T19:22:31.514Z
---
# MXRoute — Master Configuration Reference
## Overview
MXRoute serves two roles in Netgrimoire mail infrastructure:
- **Inbound gateway** — MX records for all domains point to MXRoute's commercial IPs, solving residential AT&T IP filtering by banks and financial institutions. MXRoute receives mail and forwards to Mailcow via per-address forwarders.
- **Outbound relay** — Mailcow sends all outbound mail through MXRoute via sender-dependent transports for improved deliverability.
**Mail flow:**
```
Inbound: Internet → MXRoute (commercial IP) → Mailcow (192.168.5.16)
Outbound: Mailcow (192.168.5.16) → MXRoute SMTP relay → Internet
```
**Mailcow host:** 192.168.5.16
**MXRoute control panel:** confirm server hostname from MXRoute welcome email (e.g. `arrow.mxrouting.net`)
**MXRoute SMTP relay:** confirm from welcome email (e.g. `smtp.mxroute.com:587`)
---
## Architecture — Why Two Domains Per Hosted Domain
MXRoute forwarders require a valid destination email address. Forwarding `user@domain.com` back to `user@domain.com` creates a mail loop because MXRoute would look up the MX for `domain.com` and find itself. The solution is a `mail.domain.com` subdomain with its own MX record pointing directly to Mailcow. MXRoute forwards to `user@mail.domain.com`, Mailcow accepts and delivers, and an alias domain maps `@domain.com` back so users only ever see `@domain.com`.
```
domain.com MX → MXRoute (public-facing, receives from internet)
mail.domain.com MX → 192.168.5.16 (internal, MXRoute forwards here)
```
---
## MXRoute Control Panel
**Login:** confirm URL from MXRoute welcome email
**Interface:** MXRoute 4.0 (new UI — not old DirectAdmin)
### Creating a Forwarder
1. Go to **Forwarders**
2. Click **Create New Forwarder**
3. Set **Forwarder Name:** `username` (domain shown automatically)
4. Set **Destination Type:** `Forward to Email(s)`
5. Set **Recipients:** `username@mail.domain.com`
6. Click **Create Forwarder**
> The Recipients field accepts multiple addresses, separated by commas or newlines.
---
## Mailcow Configuration
### Adding a New Domain (One-Time Per Domain)
1. **Mail Setup → Domains → Add domain**
- Domain: `mail.domain.com` (the subdomain Mailcow owns)
- Leave relay settings as default
2. **Mail Setup → Alias Domains → Add alias domain**
- Alias Domain: `domain.com`
- Target Domain: `mail.domain.com`
- This makes Mailcow accept and deliver mail for `@domain.com` to `@mail.domain.com` mailboxes
3. **Configuration → ARC/DKIM Keys**
- Select domain `mail.domain.com`
- Selector: `mailcow`
- Key length: 2048
- Generate and copy TXT record for DNS
4. **Configuration → Extra Postfix configuration → extra.cf**
```
# Trust MXRoute forwarding IPs — prevents SPF scoring on forwarded mail
mynetworks = 127.0.0.1/8 [::1]/128 192.168.5.0/24 69.167.160.0/19 198.54.120.0/22
```
Restart affected containers after saving.
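On this stack that usually means restarting Postfix from the stack directory:
```bash
cd /opt/mailcow-dockerized
docker compose restart postfix-mailcow
```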
### Adding a New Mailbox
1. **Mail Setup → Mailboxes → Add mailbox**
- Username: `user`
- Domain: `mail.domain.com`
2. **MXRoute control panel → Forwarders → Create New Forwarder**
- Forwarder: `user@domain.com`
- Destination: `user@mail.domain.com`
### Outbound Relay — Sender-Dependent Transports
One transport entry per domain. **Configuration → Routing → Sender-Dependent Transports**
| Domain | Relay Host | Username | Password |
|--------|-----------|----------|----------|
| pncharris.com | `[smtp.mxroute.com]:587` | relay@pncharris.com | H@rv3yD)G123 |
| wasted-bandwidth.net | `[smtp.mxroute.com]:587` | relay@wasted-bandwidth.net | dZ4yLYznVvgSJtqWZJFA |
| netgrimoire.com | `[smtp.mxroute.com]:587` | relay@netgrimoire.com | TVGCnJp9SxRbWU8EhkMw |
| florosafd.org | `[smtp.mxroute.com]:587` | relay@florosafd.org | 2Fe8XMyaeh6Z5dvdHYdq |
| gnarlypandaproductions.com | `[smtp.mxroute.com]:587` | relay@gnarlypandaproductions.com | vG5ZsUQhRWD2UyzLPsqA |
> Confirm SMTP relay hostname from MXRoute welcome email — substitute actual hostname for `smtp.mxroute.com` if different.
### Email Client Settings (All Domains)
| Setting | Value |
|---------|-------|
| IMAP server | `mail.domain.com` |
| IMAP port | `993` (SSL/TLS) |
| SMTP server | `mail.domain.com` |
| SMTP port | `465` (SSL/TLS) |
| Username | `user@domain.com` |
> Users log in with `@domain.com`. Mailcow resolves to the internal `@mail.domain.com` mailbox via alias domain — transparent to the user.
---
## DNS Reference — All Domains
### DNS Pattern (Apply to Every Domain)
Two sets of MX records are required — one for the public domain (pointing to MXRoute) and one for the mail subdomain (pointing directly to Mailcow).
| Type | Host | Value | Notes |
|------|------|-------|-------|
| A | `mail` | `YOUR_ATT_MAIL_IP` | Mailcow server — MXRoute forwards here |
| MX | `@` | MXRoute primary (priority 10) | From MXRoute welcome email |
| MX | `@` | MXRoute secondary (priority 20) | From MXRoute welcome email |
| MX | `mail` | `mail.domain.com` (priority 10) | Mailcow handles subdomain directly |
| CNAME | `imap` | `mail.domain.com` | Client autoconfiguration |
| CNAME | `smtp` | `mail.domain.com` | Client autoconfiguration |
| CNAME | `webmail` | `mail.domain.com` | Roundcube access |
| CNAME | `autodiscover` | `mail.domain.com` | Outlook autodiscover |
| CNAME | `autoconfig` | `mail.domain.com` | Thunderbird autoconfig |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` | SPF — both Mailcow direct and MXRoute relay |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` | SPF for subdomain — Mailcow direct only |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@netgrimoire.com` | DMARC enforcement |
| TXT | `mailcow._domainkey.mail` | *(generated in Mailcow ARC/DKIM Keys)* | Mailcow DKIM selector |
| TXT | `x._domainkey` | *(from MXRoute control panel)* | MXRoute DKIM selector — confirm actual selector name |
---
### pncharris.com
| Type | Host | Value |
|------|------|-------|
| A | `mail` | YOUR_ATT_MAIL_IP |
| MX | `@` | MXRoute primary (priority 10) |
| MX | `@` | MXRoute secondary (priority 20) |
| MX | `mail` | `mail.pncharris.com` (priority 10) |
| CNAME | `imap` | `mail.pncharris.com` |
| CNAME | `smtp` | `mail.pncharris.com` |
| CNAME | `webmail` | `mail.pncharris.com` |
| CNAME | `autodiscover` | `mail.pncharris.com` |
| CNAME | `autoconfig` | `mail.pncharris.com` |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@netgrimoire.com` |
| TXT | `mailcow._domainkey.mail` | *(from Mailcow ARC/DKIM Keys for mail.pncharris.com)* |
| TXT | `x._domainkey` | *(from MXRoute control panel)* |
**Mailcow domains:** `mail.pncharris.com` (primary), `pncharris.com` (alias domain → mail.pncharris.com)
**Relay credentials:**
| Account | Password | Notes |
|---------|----------|-------|
| relay@pncharris.com | H@rv3yD)G123 | Current relay account |
| forwarder@pncharris.com | *(see password history below)* | Legacy account |
| passer@pncharris.com | bBJtPhrGkHvvhxhukkae | Current |
| kylr pncharris | -,68,incTeR | |
| G4@rlyf1ng3r | *(Feb 14)* | |
**passer@pncharris.com password history** (most recent last):
- !5!,_\*zDyLEhhR4
- sh7dXWnTPqbkDGsTcwtn
- MY3V8p69b2HYksygxhXX
- RS6U2GU6rcYe3THKKgYx
- yzqNysrd73yzWptVEZ5H (current)
---
### wasted-bandwidth.net
| Type | Host | Value |
|------|------|-------|
| A | `mail` | YOUR_ATT_MAIL_IP |
| MX | `@` | MXRoute primary (priority 10) |
| MX | `@` | MXRoute secondary (priority 20) |
| MX | `mail` | `mail.wasted-bandwidth.net` (priority 10) |
| CNAME | `imap` | `mail.wasted-bandwidth.net` |
| CNAME | `smtp` | `mail.wasted-bandwidth.net` |
| CNAME | `webmail` | `mail.wasted-bandwidth.net` |
| CNAME | `autodiscover` | `mail.wasted-bandwidth.net` |
| CNAME | `autoconfig` | `mail.wasted-bandwidth.net` |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@netgrimoire.com` |
| TXT | `mailcow._domainkey.mail` | *(from Mailcow ARC/DKIM Keys for mail.wasted-bandwidth.net)* |
| TXT | `x._domainkey` | *(from MXRoute control panel)* |
**Mailcow domains:** `mail.wasted-bandwidth.net` (primary), `wasted-bandwidth.net` (alias domain)
**Relay credentials:**
| Account | Password |
|---------|----------|
| relay@wasted-bandwidth.net | dZ4yLYznVvgSJtqWZJFA |
---
### netgrimoire.com
| Type | Host | Value |
|------|------|-------|
| A | `mail` | YOUR_ATT_MAIL_IP |
| MX | `@` | MXRoute primary (priority 10) |
| MX | `@` | MXRoute secondary (priority 20) |
| MX | `mail` | `mail.netgrimoire.com` (priority 10) |
| CNAME | `imap` | `mail.netgrimoire.com` |
| CNAME | `smtp` | `mail.netgrimoire.com` |
| CNAME | `webmail` | `mail.netgrimoire.com` |
| CNAME | `autodiscover` | `mail.netgrimoire.com` |
| CNAME | `autoconfig` | `mail.netgrimoire.com` |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@netgrimoire.com` |
| TXT | `mailcow._domainkey.mail` | *(from Mailcow ARC/DKIM Keys for mail.netgrimoire.com)* |
| TXT | `x._domainkey` | *(from MXRoute control panel)* |
**Mailcow domains:** `mail.netgrimoire.com` (primary), `netgrimoire.com` (alias domain)
**Relay credentials:**
| Account | Password |
|---------|----------|
| relay@netgrimoire.com | TVGCnJp9SxRbWU8EhkMw |
---
### florosafd.org
| Type | Host | Value |
|------|------|-------|
| A | `mail` | YOUR_ATT_MAIL_IP |
| MX | `@` | MXRoute primary (priority 10) |
| MX | `@` | MXRoute secondary (priority 20) |
| MX | `mail` | `mail.florosafd.org` (priority 10) |
| CNAME | `imap` | `mail.florosafd.org` |
| CNAME | `smtp` | `mail.florosafd.org` |
| CNAME | `webmail` | `mail.florosafd.org` |
| CNAME | `autodiscover` | `mail.florosafd.org` |
| CNAME | `autoconfig` | `mail.florosafd.org` |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@netgrimoire.com` |
| TXT | `mailcow._domainkey.mail` | *(from Mailcow ARC/DKIM Keys for mail.florosafd.org)* |
| TXT | `x._domainkey` | *(from MXRoute control panel)* |
**Mailcow domains:** `mail.florosafd.org` (primary), `florosafd.org` (alias domain)
**Relay credentials:**
| Account | Password |
|---------|----------|
| relay@florosafd.org | 2Fe8XMyaeh6Z5dvdHYdq |
---
### gnarlypandaproductions.com
| Type | Host | Value |
|------|------|-------|
| A | `mail` | YOUR_ATT_MAIL_IP |
| MX | `@` | MXRoute primary (priority 10) |
| MX | `@` | MXRoute secondary (priority 20) |
| MX | `mail` | `mail.gnarlypandaproductions.com` (priority 10) |
| CNAME | `imap` | `mail.gnarlypandaproductions.com` |
| CNAME | `smtp` | `mail.gnarlypandaproductions.com` |
| CNAME | `webmail` | `mail.gnarlypandaproductions.com` |
| CNAME | `roundcube` | `roundcube.netgrimoire.com` |
| CNAME | `autodiscover` | `mail.gnarlypandaproductions.com` |
| CNAME | `autoconfig` | `mail.gnarlypandaproductions.com` |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@gnarlypandaproductions.com` |
| TXT | `mailcow._domainkey.mail` | *(from Mailcow ARC/DKIM Keys for mail.gnarlypandaproductions.com)* |
| TXT | `default._domainkey` | `v=DKIM1; t=s; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA3D3vyPoBHB4eMSMq8HygVWHzYbketRX4yjk9wV4bdaar0/c89dK230FMOW6zVXEsY1sXKFk1kBxerHVw0wY8qnQyooHgINEQcEXrtB/x93Sl/cqBQXk+PHOIOymQwgni8WCUhCSnvunxXK8qX5f9J56qzd0/wpY2WSEHho+XrnQjc+c7HMvkcC3+nKJe59ZNgvQW/Y9B/L6zFDjAp+QOUYp9wwX4L+j1T4fQSygYxAJZ0aIoR8FsbOuXc38pht99HyUnYwH08HoK7xv3DL2BrVo3KVZ7xMe2S4YMxd1HkJz2evbV/ziNsJcKW/le3fFS7mza09yJXDLDcLOKLXbYUQIDAQAB` |
| TXT | `x._domainkey` | *(from MXRoute control panel — confirm actual selector)* |
**Mailcow domains:** `mail.gnarlypandaproductions.com` (primary), `gnarlypandaproductions.com` (alias domain)
**Relay credentials:**
| Account | Password |
|---------|----------|
| relay@gnarlypandaproductions.com | vG5ZsUQhRWD2UyzLPsqA |
---
### nucking-futz.com
New domain — see [Mail Setup — nucking-futz.com](./mail-setup-nucking-futz) for full setup guide.
| Type | Host | Value |
|------|------|-------|
| A | `mail` | YOUR_ATT_MAIL_IP |
| MX | `@` | MXRoute primary (priority 10) |
| MX | `@` | MXRoute secondary (priority 20) |
| MX | `mail` | `mail.nucking-futz.com` (priority 10) |
| CNAME | `imap` | `mail.nucking-futz.com` |
| CNAME | `smtp` | `mail.nucking-futz.com` |
| CNAME | `webmail` | `mail.nucking-futz.com` |
| CNAME | `autodiscover` | `mail.nucking-futz.com` |
| CNAME | `autoconfig` | `mail.nucking-futz.com` |
| TXT | `@` | `v=spf1 ip4:YOUR_ATT_MAIL_IP include:mxroute.com -all` |
| TXT | `mail` | `v=spf1 ip4:YOUR_ATT_MAIL_IP -all` |
| TXT | `_dmarc` | `v=DMARC1; p=reject; rua=mailto:admin@netgrimoire.com` |
| TXT | `mailcow._domainkey.mail` | *(from Mailcow ARC/DKIM Keys for mail.nucking-futz.com)* |
| TXT | `x._domainkey` | *(from MXRoute control panel)* |
**Mailcow domains:** `mail.nucking-futz.com` (primary), `nucking-futz.com` (alias domain)
**Relay credentials:**
| Account | Password |
|---------|----------|
| relay@nucking-futz.com | *(set during MXRoute domain creation)* |
---
## Adding a New Domain — Checklist
Use this checklist every time a new domain is added to the stack.
**DNS (at registrar):**
- [ ] A record: `mail.newdomain.com` → YOUR_ATT_MAIL_IP
- [ ] MX records: `@` → MXRoute servers
- [ ] MX record: `mail``mail.newdomain.com`
- [ ] CNAME records: imap, smtp, webmail, autodiscover, autoconfig
- [ ] SPF TXT: `@` — includes both ATT IP and `include:mxroute.com`
- [ ] SPF TXT: `mail` — ATT IP only
- [ ] DMARC TXT: `_dmarc`
- [ ] DKIM TXT: `mailcow._domainkey.mail` — after generating in Mailcow
- [ ] DKIM TXT: `x._domainkey` — after retrieving from MXRoute
**Mailcow:**
- [ ] Add domain: `mail.newdomain.com`
- [ ] Add alias domain: `newdomain.com``mail.newdomain.com`
- [ ] Generate DKIM key (selector: `mailcow`) for `mail.newdomain.com`
- [ ] Add sender-dependent transport for `newdomain.com`
- [ ] Add sender-dependent transport for `mail.newdomain.com`
- [ ] Create mailboxes as `user@mail.newdomain.com`
**MXRoute:**
- [ ] Add domain in control panel
- [ ] Create forwarder for each mailbox: `user@newdomain.com``user@mail.newdomain.com`
- [ ] Retrieve DKIM key for DNS
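Once all records are published, a quick sanity check from any machine with `dig`:
```bash
DOMAIN=newdomain.com   # substitute the real domain
dig MX "$DOMAIN" +short                            # should list the MXRoute servers
dig MX "mail.$DOMAIN" +short                       # should return the mail subdomain (Mailcow)
dig TXT "$DOMAIN" +short | grep spf1               # should contain include:mxroute.com
dig TXT "_dmarc.$DOMAIN" +short                    # DMARC policy record
dig TXT "mailcow._domainkey.mail.$DOMAIN" +short   # Mailcow DKIM public key
```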
---
## Troubleshooting
### Mail not delivering inbound (not reaching Mailcow)
- Check MX records for `@` point to MXRoute servers: `dig MX domain.com +short`
- Check MX record for `mail` subdomain points to Mailcow: `dig MX mail.domain.com +short`
- Verify MXRoute forwarder exists for the address in the control panel
- Check Mailcow logs: **Logs → Postfix** — look for the delivery attempt and any rejection reason
- Verify MXRoute IP ranges are in Mailcow `extra.cf` trusted networks
### Mail not delivering inbound (banks / financial institutions)
- This is the residential AT&T IP problem — confirm MX records point to MXRoute, not directly to your IP
- Run `dig MX domain.com +short` — should show MXRoute servers, not your IP
- If MX still points to your ATT IP, update DNS and wait for propagation
### Outbound mail rejected or going to spam
- Verify sender-dependent transport is configured for the domain in Mailcow
- Check relay credentials are current in the transport entry
- Run an SPF check: `dig TXT domain.com +short` — confirm `include:mxroute.com` is present
- Send test to check-auth@verifier.port25.com for full SPF/DKIM/DMARC report
- Run through https://mail-tester.com for a deliverability score
### DKIM verification failing
- Confirm both selectors are published in DNS:
- `dig TXT mailcow._domainkey.mail.domain.com +short`
- `dig TXT x._domainkey.domain.com +short` (substitute actual MXRoute selector)
- Allow up to 48 hours for DNS propagation after adding records
- Verify selector names match exactly what Mailcow and MXRoute are using to sign
### DMARC failures
- SPF and DKIM must both pass and align with the From: domain
- Check DMARC reports sent to `admin@netgrimoire.com` — use [Postmark DMARC](https://dmarc.postmarkapp.com/) or [dmarcian.com](https://dmarcian.com) to parse raw XML reports
- Common cause: outbound mail going through MXRoute but `include:mxroute.com` missing from SPF
### Forwarded mail getting spam-scored
- Confirm MXRoute IP ranges are in Mailcow `extra.cf` mynetworks
- Check that Mailcow trusted networks were saved and containers restarted
- Verify SRS is working: in Roundcube open a forwarded message → More → View Source → `Return-Path` should begin with `SRS0=`
### New mailbox not receiving mail
- Two steps are required — confirm both were done:
1. Mailbox created in Mailcow as `user@mail.domain.com`
2. Forwarder created in MXRoute as `user@domain.com``user@mail.domain.com`
- If the MXRoute forwarder is missing, inbound mail silently goes nowhere
---
## Related Documentation
- [MailCow Configuration](./mailcow)
- [MailCow Security Hardening](./mailcow-security-hardening)
- [Mail Setup — nucking-futz.com](./mail-setup-nucking-futz)
- [OPNsense Firewall](./opnsense-firewall) — ATT_Mail static IP allocation

---
title: MailCow Overview
description: Self-hosted mail stack — architecture, domains, and key decisions
published: true
date: 2026-04-12T00:00:00.000Z
tags: keystone, mail, mailcow
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# MailCow Overview
MailCow runs on `docker4` (hermes, 192.168.5.16) via Docker Compose — not Swarm. It manages mail for all 8 domains.
---
## Architecture
| Component | Role |
|-----------|------|
| MailCow stack | Postfix, Dovecot, Rspamd, ClamAV, SOGo, Roundcube, nginx-mailcow |
| MXRoute | Inbound filtering + outbound relay for all domains |
| nginx-mailcow | Only MailCow container connected to `netgrimoire` overlay |
**Critical:** Only `nginx-mailcow` is attached to the `netgrimoire` overlay network. All other MailCow containers stay on the internal `mailcow-network` bridge. Connecting other containers to the overlay causes Redis and PHP-FPM to resolve to wrong IPs, breaking the entire stack.
---
## Domains
`netgrimoire.com` · `pncharris.com` · `wasted-bandwidth.net` · `nucking-futz.com` · `florosafd.org` · `gnarlypandaproductions.com` · `pncfishandmore.com` · `pncharrisenterprises.com`
---
## Mail Flow
**Inbound:** MXRoute filters → forwards to MailCow → Dovecot delivers
**Outbound:** Postfix → MXRoute relay → recipient
**SRS rewriting:** MXRoute rewrites the envelope sender on forwarded mail. All domains using MXRoute inbound forwarding **must** have catch-all aliases configured in MailCow, or `reject_unlisted_sender` will reject the rewritten addresses.
---
## DKIM
Two selectors required:
| Selector | Purpose |
|----------|---------|
| `mailcow` | Direct sends from MailCow |
| `mxroute` | MXRoute relay path |
---
## Key Limits (must match across all three)
Attachment size limits must be set identically in Postfix, Rspamd, and ClamAV. Changing only Postfix is insufficient — Rspamd and ClamAV reject large messages before Postfix processes them.
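A rough way to compare the three values side by side; the Rspamd and ClamAV paths are assumptions based on a stock mailcow-dockerized layout, so adjust if those files live elsewhere:
```bash
cd /opt/mailcow-dockerized
# Postfix limit in bytes, read from the running container
docker compose exec postfix-mailcow postconf -h message_size_limit
# Rspamd limit (assumed location under data/conf/rspamd/)
grep -ri 'max_message' data/conf/rspamd/ || echo "no explicit rspamd limit found"
# ClamAV limits (assumed location)
grep -iE 'MaxFileSize|MaxScanSize|StreamMaxLength' data/conf/clamav/clamd.conf
```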
---
## Roundcube SSL
Internal connections to Dovecot use self-signed certs. In `config.inc.php`:
```php
$config['imap_conn_options'] = ['ssl' => ['verify_peer' => false, 'verify_peer_name' => false]];
```
---
## Related Docs
- [MXRoute Integration](/Keystone-Grimoire/Mail/MXRoute-Integration)
- [Domain Setup](/Keystone-Grimoire/Mail/Domain-Setup)
- [MailCow Hardening](/Keystone-Grimoire/Mail/Hardening)
- [MailCow Backup](/Vault-Grimoire/Backups/MailCow-Backup)
---
## Pending
- [ ] Dedicated ATT_Mail static IP for outbound mail (OPNsense outbound NAT rule)
- [ ] Second DKIM selector (`mxroute`) validation
- [ ] MTA-STS validation (supported since Sep 2025 update)

---
title: Port Assignments
description:
published: true
date: 2026-02-20T04:21:52.996Z
tags:
editor: markdown
dateCreated: 2026-01-27T03:42:58.945Z
---
# Physical Paths
|Device|IP|Room|Home Infra|DLink|TPLink|Closet|Inter Rack|Rack|Ubiquity|
|------|--|----|------|------|-------|------|----|----|--------|
|Dlink |5.2 |Office | |1| | | | |1 |
|ZNAS |5.10 | | |2| | | | | |
|Docker3 | | | |3| | | | | |
|Docker5 | | | |4| | | | | |
|DockerPi1 | | | |5| | | | | |
|DNS |5.7 | | |6| | | | | |
|Docker4 | | | | | | |W:7 |19|4 |
|Docker2 | | Office | | | | |W:5 |17|11|
|Time Machine| | | | | | |W:6 |18|12|
|Deco Satt | |Room 1 |1 | | | | | |15|
|Deco AP | |Office(E)|10-24| | |24|W:9 |21|20|
|TP Link | | | | |1|22|W:10|22|23|
|OpnSense |3.4 | | | | |23|W:11|23|24|
|OPnSense-Cox| | | | | | | | | |
| | | | | | | | | | |
| | |Room 2 |2 | | | | |2 | |
| | |Room 3 |3 | | | | |3 | |
| | |Living(E)|4 | | | | |4 | |
| | |Living(W)|5 | | | | |5 | |
| | |Family |6 | | | | |6 | |
| | |Pantry |7 | | | | |7 | |
| | |Room 4 |8 | | | | |8 | |
| | |Gym |9 | | | | |9 | |
| | |Office(S)|11 | | | | |11| |
| | |Office(W)|12 | | | | |12| |
| | |Office(W)|13 | | | | |13| |
| | |Office(W)|14 | | | | |14| |
| | |Office(W)|15 | | | | |15| |
| | |Office(W)|16 | | | | |16| |
| | |Office(N)|17 | | | | |17| |
| | |Office(N)|18 | | | | |18| |
| | |Office(N)|19 | | | | |19| |
| | |Office(N)|20 | | | | |20| |
Note: For rooms, N/E/S/W are compass directions.
For Inter Rack, W = wall, H = hallway.

---
title: Network Topology
description: Netgrimoire network layout — VLANs, subnets, routing
published: true
date: 2026-04-12T00:00:00.000Z
tags: keystone, network
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Network Topology
## Subnets
| Subnet | Purpose |
|--------|---------|
| 192.168.3.0/24 | OPNsense / firewall management |
| 192.168.4.0/24 | ISPConfig / web hosting |
| 192.168.5.0/24 | Primary LAN — all Docker hosts |
| 192.168.8.0/24 | Pocket Grimoire (GL.iNet Beryl AX) |
| 192.168.32.0/24 | WireGuard VPN peers |
## WireGuard Peers
| Peer | IP | Device |
|------|----|--------|
| Obie | 192.168.32.2 | — |
| pncfishandmore | 192.168.32.3 | — |
| GLNet | 192.168.32.4 | GL.iNet router |
| PortaPotty | 192.168.32.5 | Pocket Grimoire laptop |
| GLNet | 192.168.32.6 | Second GL.iNet |
## DNS
Internal DNS runs on Technitium at `192.168.5.7` (`dns.netgrimoire.com`), behind Authentik.
All `*.netgrimoire.com` and `*.wasted-bandwidth.net` internal hostnames resolve via Technitium. Public DNS managed via ISPConfig and domain registrars.
## Docker Overlay Network
All Swarm services share the `netgrimoire` external overlay network (VIP mode). This is the only overlay network in use.
```
Name: netgrimoire
Driver: overlay
Mode: VIP (always — dnsrr is banned)
```
See [Docker Swarm Template](/Keystone-Grimoire/Docker/Swarm-Template) for attachment rules.

---
title: Keystone Grimoire
description: Architecture — the dwarven runesmith's blueprints
published: true
date: 2026-04-12T00:00:00.000Z
tags: keystone, architecture
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Keystone Grimoire
![keystone-badge](/images/keystone-badge.png)
The Keystone Grimoire holds the architectural blueprints of Netgrimoire — how everything is wired together, how traffic flows, why decisions were made. Remove the keystone and the arch falls. This is the arch.
---
## Sections
| Section | Contents |
|---------|----------|
| [Hosts](/Keystone-Grimoire/Hosts/Host-Inventory) | Node inventory, roles, IPs, pinned services, hardware |
| [Network](/Keystone-Grimoire/Network/Topology) | Topology, VLANs, DNS, WireGuard, OpenVPN, port assignments |
| [Docker](/Keystone-Grimoire/Docker/Swarm-Template) | Swarm template standard, overlay network, label rules, volume paths |
| [Mail](/Keystone-Grimoire/Mail/MailCow-Overview) | MailCow, MXRoute, DKIM, SRS, domain setup, hardening |
---
## Key Principles
- **Caddy is the single entry point** for all web traffic. Every public service goes through Caddy. No exceptions.
- **Docker labels drive routing** — services register themselves with Caddy via `deploy.labels`. Static Caddyfile entries only for Compose stacks where label pickup is unreliable.
- **Never mix label and static routing for the same hostname** — caddy-docker-proxy merges them into a broken upstream pool.
- **Always VIP endpoint mode** — `endpoint_mode: dnsrr` is banned. It breaks internal DNS resolution.
- **ARM nodes are excluded by default** — all swarm services carry `node.platform.arch != aarch64` and `node.platform.arch != arm` constraints unless explicitly ARM-specific.

---
title: Hardware Inventory
description: Pocket Grimoire hardware — laptop, router, storage, power
published: true
date: 2026-04-12T00:00:00.000Z
tags: pocket, hardware
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Hardware Inventory
## Core Compute
- Laptop (Docker host)
- ZFS pool `pocket-green` at `/srv/greenpg/`
- Docker Engine (not Swarm)
## Networking
- GL.iNet Beryl AX (GL-MT3000)
- LAN: `192.168.8.0/24`
- WireGuard peer: `PortaPotty` (192.168.32.5)
- Short CAT5/6 cable (router ↔ laptop)
## Storage
| Drive | Mount | Encrypted | Contents |
|-------|-------|-----------|---------|
| SSD Vault | ZFS pool | Yes | Git mirrors, wiki backup, Kopia repo, SSH keys, system configs |
| SSD Green | ZFS pool | Yes | Personal media, Stash data, VeraCrypt containers — personal trips only |
## Media Players
- 2x Onn 4K streaming boxes with power
- FireTV Stick with power
## Power
- Anker Prime 200W 6-Port GaN desktop charger
- Short USB-C cables (router)
- Short USB-A to USB-C (laptop power backup)
- 2x short USB-3 cables (SSDs)
- Longer USB-C to USB-C (laptop primary power)
- Longer USB-C to USB-C (phone/tablet)

---
title: Stream Box
description: Configure ONN Media Box
published: true
date: 2026-02-20T04:50:44.701Z
tags:
editor: markdown
dateCreated: 2026-02-20T04:50:34.384Z
---
# Onn 4K Streaming Box Setup Guide
**Complete configuration guide for Onn 4K streaming boxes used with Pocket Grimoire**
---
## Overview
This guide covers the complete setup of your Onn 4K streaming boxes for use with Pocket Grimoire, including:
- Initial device setup
- WiFi configuration (portapotty network)
- Required app installations (Jellyfin, StashApp, Netflix, YouTube TV)
- Connection to Pocket Grimoire services
- Troubleshooting common issues
**Network Configuration:**
- **WiFi SSID:** `portapotty` (GL.iNet Beryl AX travel router)
- **Connection:** All devices connect wirelessly to portapotty
- **Exception:** Raspberry Pi connects to router via CAT5 ethernet
---
## Hardware Information
### Onn 4K Streaming Box Specifications
- **Model:** Onn 4K Streaming Box (Walmart exclusive)
- **OS:** Android TV (Google TV interface)
- **CPU:** Amlogic S905Y4 quad-core
- **RAM:** 2GB
- **Storage:** 8GB internal
- **Video:** 4K HDR, Dolby Vision, Dolby Atmos
- **WiFi:** 802.11ac (WiFi 5) dual-band
- **Bluetooth:** 5.0
- **Ports:** HDMI 2.1, Micro-USB (power)
- **Remote:** Voice remote with Google Assistant
### What's in the Box
- Onn 4K streaming box
- Voice remote with batteries
- USB power adapter
- HDMI cable (short)
- Quick start guide
---
## Initial Setup
### First Power-On
1. **Connect to TV:**
- Plug HDMI cable into Onn box
- Connect other end to hotel TV HDMI port
- Plug Micro-USB power into Onn box
- Connect USB power adapter to wall or Anker Prime
2. **Power On:**
- TV should auto-detect HDMI input
- If not, use TV remote to select correct HDMI input
- Onn box LED will light up (solid white when ready)
- Wait for Google TV home screen
3. **Select Language:**
- Use remote to select language (English)
- Click OK
4. **Accessibility Options:**
- Skip unless needed (click "Skip")
### WiFi Configuration
**Critical: Connect to portapotty network**
1. **WiFi Setup Screen:**
- List of available networks will appear
- Scroll to find `portapotty`
- Select `portapotty`
- Click "Connect"
2. **Enter Password:**
- Enter WiFi password for portapotty network
- Use on-screen keyboard
- Click "Connect"
- Wait for connection (should take 5-10 seconds)
- "Connected" message will appear
3. **Verify Connection:**
- Should show "portapotty" with signal strength
- Should show "Connected" status
**Troubleshooting WiFi:**
- If portapotty doesn't appear: Ensure Beryl AX router is powered on
- If password fails: Double-check portapotty WiFi password
- If connection drops: Move closer to router
- Signal strength: Should be "Excellent" or "Good" in hotel room
### Google Account Setup
**Option A: Sign in with Google Account**
1. Select "Sign in"
2. Use phone to scan QR code or enter code
3. Follow prompts on phone
4. Account will sync to Onn box
**Option B: Set up without Google Account (Limited)**
1. Select "Skip"
2. Click "Skip" again to confirm
3. Some features will be limited (Play Store, purchases)
4. **Recommendation:** Use Option A for full functionality
**For Pocket Grimoire:**
- Need Google account for: Play Store (to install apps)
- StashApp requires sideloading (see separate section)
### Complete Initial Setup
1. **Google Services:**
- Accept terms (or skip)
- Location services: Your choice
- Device name: Name it (e.g., "Onn Box 1", "Onn Box 2")
2. **Voice Match:**
- Set up "Hey Google" voice commands (optional)
- Can skip and set up later
3. **Apps to Install:**
- Google will suggest popular apps
- Skip for now (we'll install specific apps later)
- Click "Next" or "Skip"
4. **Complete:**
- Should arrive at Google TV home screen
- Remote should control interface
- Ready to install apps
---
## App Installations
### 1. Jellyfin for Android TV
**Install from Google Play Store:**
1. **Open Play Store:**
- Press Home button on remote
- Navigate to "Apps" tab at top
- Select "Play Store"
2. **Search for Jellyfin:**
- Click search icon (magnifying glass)
- Type "Jellyfin" using on-screen keyboard
- Select "Jellyfin for Android TV" from results
- **Developer:** Jellyfin
- **Note:** Choose "Jellyfin for Android TV" not regular Jellyfin
3. **Install:**
- Click "Install"
- Wait for download and installation (~30 seconds)
- Click "Open" when complete
4. **Configure Jellyfin:**
- Click "Connect to Server"
- **Method 1 - Manual Entry:**
- Click "Add server manually"
- Host: `pocket-grimoire.local` or `10.0.0.10` (Pi's IP)
- Port: `8096`
- Click "Connect"
- **Method 2 - Auto-Discovery (if available):**
- Wait for Jellyfin to discover Pocket Grimoire
- Select "Pocket Grimoire" from list
- Click "Connect"
5. **Login:**
- Enter username and password
- Or select "Quick Connect" if configured
- Click "Sign In"
6. **Verify:**
- Should see Jellyfin home screen
- Libraries (Movies, TV Shows) should appear
- Test playing a video (should be direct play, no buffering)
**Jellyfin Settings (Optional but Recommended):**
- Settings → Playback
- Video quality: Maximum
- Allow direct play: ON
- Allow direct stream: ON
- Allow video transcoding: OFF (should be disabled on server already)
### 2. StashApp for Android TV
**Installation: Requires Sideloading (GitHub Release)**
StashApp is not available in Play Store, must be installed manually via APK file.
#### Prerequisites
- USB drive (for APK transfer)
- Computer with internet access
- OR Android phone with file transfer capability
#### Method 1: USB Drive Installation (Recommended)
**On Your Computer:**
1. **Download StashApp APK:**
- Open browser: https://github.com/damontecres/StashAppAndroidTV/releases
- Find latest release (e.g., v1.x.x)
- Download file: `stashapp-tv-release-vX.X.X.apk`
- Save to USB drive
2. **Prepare USB Drive:**
- Format as FAT32 or exFAT (if not already)
- Copy APK to root of USB drive
- Safely eject USB drive
**On Onn Box:**
3. **Enable Unknown Sources:**
- Press Home button
- Navigate to Settings (gear icon)
- Select "Device Preferences"
- Select "Security & Restrictions"
- Enable "Unknown Sources"
- Confirm warning (accept risk)
4. **Install File Manager (if needed):**
- Open Play Store
- Search "File Commander" or "X-plore File Manager"
- Install one of these apps
- Open the file manager app
5. **Connect USB Drive:**
- Plug USB drive into Onn box USB port
- **Note:** Onn box only has Micro-USB (power), so you need:
- USB OTG adapter (Micro-USB to USB-A female)
- OR transfer APK via network/Bluetooth
**Alternative: Network Transfer**
Since Onn box doesn't have easy USB access:
1. **Use Send Files to TV App:**
- On Onn box: Install "Send Files to TV" from Play Store
- On phone/computer: Install companion app
- Transfer APK wirelessly
- Open with package installer
2. **Or Use Cloud Storage:**
- Upload APK to Google Drive
- On Onn box: Install Google Drive app
- Download APK from Drive
- Open with package installer
#### Method 2: Direct Download on Onn Box (Easiest)
**On Onn Box:**
1. **Install Downloader App:**
- Open Play Store
- Search "Downloader" (by AFTVnews)
- Install and open
2. **Download StashApp APK:**
- In Downloader app, click URL field
- Enter: `https://github.com/damontecres/StashAppAndroidTV/releases`
- Navigate to latest release
- Click APK download link
- Save APK
3. **Install APK:**
- Downloader will prompt to install after download
- Click "Install"
- Click "Done" when complete
- APK will be installed
**Configure StashApp:**
1. **Open StashApp:**
- Find in Apps list (may be under "See all apps")
- Or search "Stash" in search bar
2. **Connect to Server:**
- Enter server URL: `http://pocket-grimoire.local:9999`
- Or use IP: `http://10.0.0.10:9999`
- Enter API key (if required)
- Click "Connect"
3. **Test Connection:**
- Should load Stash interface
- Browse library
- Test playing a preview
- Verify scene markers work
**StashApp Settings:**
- Video quality: Original (for direct play)
- Hardware acceleration: ON
- Cache previews: ON (if storage available)
### 3. Netflix
**Install from Google Play Store:**
1. **Open Play Store:**
- Press Home button
- Navigate to "Apps"
- Select "Play Store"
2. **Search Netflix:**
- Search bar → type "Netflix"
- Select "Netflix" (official app)
- Click "Install"
- Wait for installation
3. **Open Netflix:**
- Click "Open" after installation
- Or find in Apps list
4. **Sign In:**
- Enter Netflix email and password
- Or scan QR code with phone
- Select profile
5. **Test:**
- Browse content
- Play a video to verify streaming works
- Check video quality (should be HD/4K)
**Netflix Settings:**
- Profile: Select your profile
- Video quality: High (auto)
- Subtitles/audio: Configure as preferred
### 4. YouTube TV
**Install from Google Play Store:**
1. **Open Play Store:**
- Navigate to Play Store
- Search "YouTube TV"
2. **Install:**
- Select "YouTube TV" (official app)
- Click "Install"
- Wait for installation
3. **Sign In:**
- Open YouTube TV
- Sign in with Google account (YouTube TV subscription)
- Or use TV code activation:
- Visit tv.youtube.com/start on computer/phone
- Enter code shown on TV
- Sign in and authorize
4. **Test:**
- Browse live TV channels
- Test DVR recordings
- Verify streaming quality
**YouTube TV Settings:**
- Live guide: Configure preferences
- DVR: Verify recordings accessible
- Picture quality: Auto or 4K (if available)
---
## Network Configuration Details
### portapotty WiFi Network (GL.iNet Beryl AX)
**Network Details:**
- **SSID:** `portapotty`
- **Frequency:** 2.4GHz + 5GHz (dual-band)
- **Security:** WPA2/WPA3
- **DHCP:** Enabled (automatic IP assignment)
- **Subnet:** 192.168.8.0/24 (default GL.iNet)
- **Router IP:** 192.168.8.1 (Beryl AX admin panel)
- **DNS:** Handled by Beryl AX (AdGuard Home)
**Devices on portapotty Network:**
- Raspberry Pi 4: Ethernet (CAT5) → 10.0.0.10 (static, or check DHCP)
- Onn Box 1: WiFi → 192.168.8.x (DHCP assigned)
- Onn Box 2: WiFi → 192.168.8.x (DHCP assigned)
- Laptop: WiFi → 192.168.8.x (DHCP assigned)
- Phone/tablet: WiFi → 192.168.8.x (DHCP assigned)
### Pocket Grimoire Service Addresses
**When connected to portapotty network:**
```
Jellyfin: http://pocket-grimoire.local:8096
or http://10.0.0.10:8096
Stash: http://pocket-grimoire.local:9999
or http://10.0.0.10:9999
Wiki.js: http://pocket-grimoire.local:3000
or http://10.0.0.10:3000
File Browser: http://pocket-grimoire.local:8080
or http://10.0.0.10:8080
Router Admin: http://192.168.8.1
```
**If `.local` names don't resolve:**
- Use IP addresses directly (10.0.0.10)
- Check Beryl AX DNS settings
- Restart Onn box
---
## Configuration Checklist
### Pre-Deployment (At Home)
**Before traveling, complete these tasks:**
- [ ] Both Onn boxes powered on and tested
- [ ] Both connected to test WiFi network
- [ ] Google accounts signed in on both boxes
- [ ] All 4 apps installed on both boxes:
- [ ] Jellyfin for Android TV
- [ ] StashApp for Android TV (sideloaded)
- [ ] Netflix
- [ ] YouTube TV
- [ ] Jellyfin configured and tested (play test video)
- [ ] StashApp configured and tested (browse library)
- [ ] Netflix signed in (test streaming)
- [ ] YouTube TV signed in (test live TV)
- [ ] Both remotes have fresh batteries
- [ ] Both boxes labeled (Box 1, Box 2) or distinguishable
### Hotel Deployment
**Setup sequence at hotel:**
1. **Setup Beryl AX Router:**
- Power on Beryl AX
- Connect to hotel WiFi (via Beryl AX admin or phone app)
- Verify internet connection
- portapotty WiFi should be active
2. **Setup Pocket Grimoire:**
- Power on Raspberry Pi
- Connect via CAT5 to Beryl AX
- Wait 2-3 minutes for boot
- SSH in and unlock ZFS (if needed)
- Verify Docker containers running
3. **Setup Onn Box 1:**
- Connect to TV HDMI port
- Power on
- Wait for boot (30 seconds)
- Should auto-connect to portapotty
- If not: Settings → Network → portapotty → Connect
- Test Jellyfin (should connect automatically)
- Test StashApp (should connect automatically)
4. **Setup Onn Box 2 (if using):**
- Connect to second TV or different HDMI port
- Repeat setup steps above
- Verify connection to portapotty
5. **Verify All Services:**
- Open Jellyfin → Browse library → Play test video
- Open StashApp → Browse library → Test preview
- Open Netflix → Test streaming
- Open YouTube TV → Test live channel
**Total setup time: 10-15 minutes**
---
## Troubleshooting
### WiFi Connection Issues
**Onn box won't connect to portapotty:**
1. **Verify Router is Online:**
- Check Beryl AX power LED (should be solid)
- Check Beryl AX WiFi LED (should be blinking/solid)
- Use phone to verify portapotty network is visible
2. **Forget and Reconnect:**
- Settings → Network & Internet
- Select portapotty
- Click "Forget network"
- Scan again
- Reconnect with password
3. **Check Router Settings:**
- Access Beryl AX admin: http://192.168.8.1
- Verify WiFi is enabled
- Check if DHCP is active
- Verify no MAC filtering enabled
4. **Restart Devices:**
- Power cycle Onn box (unplug, wait 10 seconds, plug back in)
- Restart Beryl AX router
- Try connecting again
**Weak WiFi Signal:**
- Move Beryl AX closer to TV/Onn box
- Reduce obstacles between router and box
- Use 2.4GHz band instead of 5GHz (better range, slower speed)
- Check for interference (hotel WiFi channels)
### Jellyfin Connection Issues
**Can't connect to Jellyfin server:**
1. **Verify Server is Running:**
- SSH into Pocket Grimoire
- Run: `docker ps | grep jellyfin`
- Should show `pocketgrimoire_jellyfin` running
2. **Check Network Connectivity:**
- On Onn box, open browser app
- Navigate to: `http://pocket-grimoire.local:8096`
- Or try IP: `http://10.0.0.10:8096`
- Should load Jellyfin web interface
3. **Reconnect Jellyfin App:**
- Open Jellyfin app
- Settings → Server
- Delete existing server
- Add server manually:
- Host: `pocket-grimoire.local` or `10.0.0.10`
- Port: `8096`
- Connect and login
4. **Check Firewall:**
- SSH into Pi
- Verify port 8096 is open: `sudo netstat -tlnp | grep 8096`
- Should show jellyfin listening
**Jellyfin Playback Issues:**
**Video won't play:**
- Check media is H.264/AAC (see encoding guide)
- Verify network bandwidth (should be strong WiFi)
- Try different video file
- Check Jellyfin logs: `docker logs pocketgrimoire_jellyfin`
**Video buffers/stutters:**
- Check WiFi signal strength (move router closer)
- Verify direct play (check playback info, should NOT say "transcoding")
- If transcoding occurs: Media is not properly encoded
- Check network activity: `ssh user@pocket-grimoire.local` then `iftop`
**Subtitles don't work:**
- Ensure subtitles are SRT format (not PGS/VobSub)
- External .srt files work best
- Embedded SRT in MKV also works
### StashApp Connection Issues
**Can't connect to Stash server:**
1. **Verify Stash is Running:**
- SSH into Pocket Grimoire
- Run: `docker ps | grep stash`
- Should show `pocketgrimoire_stash` running
2. **Test Server Connection:**
- Open browser on Onn box
- Navigate to: `http://pocket-grimoire.local:9999`
- Or try: `http://10.0.0.10:9999`
- Should load Stash web interface
3. **Reconfigure StashApp:**
- Open StashApp
- Settings → Server
- Remove existing server
- Add server:
- URL: `http://pocket-grimoire.local:9999`
- Or: `http://10.0.0.10:9999`
- Enter API key (if required)
- Connect
4. **Check API Key:**
- If StashApp requires API key
- SSH into Pi: `cat /srv/vaultpg/stash/config/config.yml | grep api_key`
- Or access Stash web UI → Settings → Security → API Key
- Copy key into StashApp
**StashApp Crashes or Freezes:**
- Clear app cache: Settings → Apps → StashApp → Clear cache
- Restart Onn box
- Reinstall StashApp (download latest APK)
- Check Stash server logs: `docker logs pocketgrimoire_stash`
**Previews won't play:**
- Verify previews synced from Netgrimoire
- Check: `ssh user@pocket-grimoire.local`
- Run: `ls /srv/vaultpg/stash/generated/` (should show preview files)
- If empty: Sync hasn't completed, or previews not generated on Netgrimoire
### Netflix/YouTube TV Issues
**Netflix won't sign in:**
- Verify Netflix subscription is active
- Try signing in on phone/computer first
- Use "Sign in with code" option (visit netflix.com/tv8 on another device)
- Check internet connection (portapotty → hotel WiFi)
**YouTube TV won't play:**
- Verify YouTube TV subscription is active
- Check location restrictions (some content blocked outside home area)
- Try signing out and back in
- Verify internet connection speed
**Streaming quality poor:**
- Check WiFi signal strength
- Verify hotel internet speed (not throttled)
- Switch to lower quality in app settings temporarily
- Move router closer to TV
### General Onn Box Issues
**Box won't turn on:**
- Check power adapter is plugged in
- Check Micro-USB cable is secure
- Try different power source
- LED should light up (white when on)
**Remote not working:**
- Check batteries (replace if needed)
- Re-pair remote: Hold Back + Home for 5 seconds
- Check for obstructions between remote and box
- Try using Google Home app as remote backup
**Box is slow/laggy:**
- Clear cache: Settings → Storage → Cached data → Clear
- Uninstall unused apps
- Restart box: Settings → Device Preferences → About → Restart
- Factory reset (last resort)
**Apps keep crashing:**
- Clear app cache and data
- Uninstall and reinstall app
- Check for OS updates: Settings → Device Preferences → About → System update
- Factory reset if persistent
**No sound:**
- Check TV volume (not muted)
- Check HDMI connection (reseat cable)
- Settings → Display & Sound → Audio output → Test
- Try different HDMI port on TV
- Check if audio is set to "Auto" or "Stereo"
### DNS Resolution Issues
**`.local` addresses don't work (pocket-grimoire.local fails):**
1. **Use IP Address Instead:**
- Replace `pocket-grimoire.local` with `10.0.0.10`
- Example: `http://10.0.0.10:8096` for Jellyfin
2. **Check Pi's IP Address:**
- SSH into Pi: `ip addr show eth0`
- Look for inet address (e.g., 192.168.8.50)
- Use this IP in apps instead of .local
3. **Check Beryl AX DNS:**
- Access http://192.168.8.1
- Check DNS settings
- Verify AdGuard Home is running
- Ensure mDNS/Bonjour reflection is enabled (if option available)
4. **Add Static DNS Entry:**
- In Beryl AX admin panel
- Add static DNS entry: pocket-grimoire → 10.0.0.10
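From the laptop on the portapotty network, a quick way to tell whether the name or the service is the problem:
```bash
# Does the name resolve, and does the router's DNS have the static entry?
ping -c 1 pocket-grimoire.local
nslookup pocket-grimoire.local 192.168.8.1
# If the name fails but the IP answers, it is a DNS issue, not a Jellyfin/Stash issue
curl -sI http://10.0.0.10:8096 | head -n 1
```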
---
## Advanced Configuration
### Setting Static IP for Raspberry Pi
**On Beryl AX router:**
1. Access admin panel: http://192.168.8.1
2. Navigate to Network → DHCP Server
3. Find Raspberry Pi in client list
4. Assign static IP: 10.0.0.10
5. Save and apply
**Or on Raspberry Pi directly:**
```bash
# Edit network config
sudo nano /etc/dhcpcd.conf
# Add at end:
interface eth0
static ip_address=10.0.0.10/24
static routers=192.168.8.1
static domain_name_servers=192.168.8.1
```
### Optimizing Video Playback
**Jellyfin Video Settings (on Onn box):**
- Settings → Playback
- Max streaming bitrate: Maximum (Auto)
- Video quality: Maximum
- Allow video playback that may require conversion: OFF
- Skip intro: ON (if desired)
**StashApp Video Settings:**
- Settings → Playback
- Video quality: Original
- Hardware acceleration: ON
- Buffer size: Large
### Remote Control Tips
**Voice Commands:**
- "Hey Google, open Jellyfin"
- "Hey Google, play [movie name] on Jellyfin"
- "Hey Google, pause"
- "Hey Google, turn off TV"
**Useful Remote Shortcuts:**
- Home button (twice): Recent apps
- Back button (hold): Return to home
- Play/Pause: Works in most video apps
- Voice button: Google Assistant
---
## App Locations
**After installation, find apps here:**
**Home Screen:**
- Netflix, YouTube TV usually appear automatically
**Apps Tab:**
- All installed apps listed alphabetically
- Jellyfin, StashApp will be here
**Quick Access:**
- Long-press Home → Add to Favorites
- Apps appear on home screen for quick access
---
## Maintenance
### Weekly (While Using)
- Check for app updates (Play Store → Updates)
- Clear cache if apps feel slow
- Verify WiFi connection strength
### Before Each Trip
- Test all apps at home
- Update apps if updates available
- Check remote batteries
- Verify all logins still active
### After Each Trip
- Check for OS updates
- Review installed apps (remove if unused)
- Clear cache to free storage
---
## Factory Reset (If Needed)
**When to factory reset:**
- Box is extremely slow
- Apps constantly crash
- Persistent connection issues
- Selling/giving away box
**How to factory reset:**
1. **Via Settings:**
- Settings → Device Preferences
- About → Factory Reset
- Confirm reset
- Wait for reboot (3-5 minutes)
2. **Via Recovery Mode:**
- Power off box
- Hold reset button (if present)
- Power on while holding
- Navigate with remote to "Factory Reset"
- Confirm
**After reset:**
- Complete initial setup again (see beginning of guide)
- Reinstall all apps
- Reconfigure WiFi and services
---
## Quick Reference Card
**Essential Information:**
```
WiFi Network: portapotty
Router Admin: http://192.168.8.1
Pocket Grimoire Services:
- Jellyfin: http://pocket-grimoire.local:8096
- Stash: http://pocket-grimoire.local:9999
- Wiki: http://pocket-grimoire.local:3000
If .local fails, use IP: http://10.0.0.10:[PORT]
Apps Required:
✓ Jellyfin for Android TV (Play Store)
✓ StashApp for Android TV (Sideload APK)
✓ Netflix (Play Store)
✓ YouTube TV (Play Store)
Troubleshooting:
1. Restart Onn box
2. Check portapotty WiFi connection
3. Verify Pocket Grimoire is running (SSH check)
4. Use IP addresses instead of .local names
```
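For troubleshooting step 3 in the card above, a sketch of the "SSH check" (the username is a placeholder; the services and pool name are as described on the Pocket Grimoire pages):
```bash
# Reach the Pocket Grimoire host (fall back to the IP if .local fails)
ssh youruser@10.0.0.10          # hypothetical username

# Confirm the media containers are up
docker ps --format 'table {{.Names}}\t{{.Status}}'

# Confirm the ZFS pool backing the media is healthy
zpool status pocket-green
```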
---
## Appendix: StashApp APK Sources
**Official GitHub Repository:**
- https://github.com/damontecres/StashAppAndroidTV
- Releases: https://github.com/damontecres/StashAppAndroidTV/releases
- Latest version: Check releases page
**Verification:**
- Download only from official GitHub releases
- Verify file integrity (check file size, release notes)
- Watch for malware warnings (false positives common with sideloaded APKs)
**Update Process:**
- Check GitHub for new releases periodically
- Download new APK
- Install over existing app (data preserved)
- Or uninstall and reinstall clean
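If the on-device download-and-install route is awkward, the APK can also be pushed from a computer with `adb`, assuming Developer options and ADB debugging are enabled on the Onn box and you know its IP (Settings → Device Preferences → About → Status). The IP and filename below are examples:
```bash
# Connect to the Onn box over the network
adb connect 192.168.8.100:5555

# Optional: record the checksum of the downloaded release for your notes
sha256sum StashAppAndroidTV.apk

# Install or update in place, keeping existing app data (-r = replace)
adb install -r StashAppAndroidTV.apk
```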
---
*This guide was created for Onn 4K streaming box configuration with Pocket Grimoire. Keep updated as apps and configurations change.*

View file

@ -0,0 +1,64 @@
---
title: Pocket Grimoire
description: Portable travel lab — offline-first, encrypted, self-contained
published: true
date: 2026-04-12T00:00:00.000Z
tags: pocket, portable, travel
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Pocket Grimoire
![pocket-badge](/images/pocket-badge.png)
Pocket Grimoire is a portable, encrypted, offline-first companion to Netgrimoire. It travels. It runs without internet. It tunnels home via WireGuard when connectivity is available. And it doubles as one of the two Vault Grimoire offsite nodes — every time it leaves the house, it takes an encrypted copy of the data with it.
---
## Hardware at a Glance
- **Laptop** — Docker host, ZFS pool `pocket-green` at `/srv/greenpg/`
- **GL.iNet Beryl AX (GL-MT3000)** — travel router, LAN `192.168.8.0/24`, WireGuard peer `PortaPotty`
- **2x Onn 4K streaming boxes** — hotel/TV playback
- **Anker 200W GaN charging station** — one plug for everything
- **SSDs** — Vault (always connected) + Green (personal trips only)
---
## Software Stack
| Service | Purpose | Mode |
|---------|---------|------|
| Jellyfin | Media playback | Read/write |
| Stash (PocketStash, port 9998) | Adult media | Read-only travel mode |
| Wiki.js | Documentation mirror | Pull-only |
| Filebrowser | File access | Read/write |
---
## WireGuard Home Tunnel
WireGuard peer `PortaPotty` (192.168.32.5) connects back to OPNsense on Netgrimoire when internet is available. All management traffic and sync operations use the tunnel.
---
## As a Vault Node
Pocket Grimoire receives a `syncoid` push from `znas` before each trip:
```bash
syncoid znas:vault/Green/Pocket pocket:/srv/greenpg/Green
```
This makes it an offsite encrypted backup node whenever it leaves home. See [Vault Architecture](/Vault-Grimoire/Offsite/Vault-Architecture).
---
## Sections
| | |
|---|---|
| [Hardware](/Pocket-Grimoire/Hardware/Inventory) | Full hardware list, power kit, storage layout |
| [Software](/Pocket-Grimoire/Software/Stack) | Services, Docker config, ZFS pool |
| [Sync & Deployment](/Pocket-Grimoire/Sync/Pre-Travel-Sync) | Pre-travel checklist, syncoid, deployment guide |

View file

@ -0,0 +1,39 @@
---
title: Software Stack
description: Services running on Pocket Grimoire
published: true
date: 2026-04-12T00:00:00.000Z
tags: pocket, software, docker
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Pocket Grimoire Software Stack
## Services
| Service | Port | Purpose | Mode |
|---------|------|---------|------|
| Jellyfin | 8096 | Media playback | Read/write |
| PocketStash | 9998 | Adult media (Stash) | Read-only travel mode |
| Wiki.js | 3000 | Documentation mirror | Pull-only (no writes) |
| Filebrowser | 8080 | File management | Read/write |
| Beszel agent | — | Reports back to znas monitoring | Active when tunneled |
## ZFS Pool
Pool name: `pocket-green`
Mount point: `/srv/greenpg/`
Dataset layout mirrors the Vault Grimoire structure for Green/Pocket data.
## Docker
Docker Engine (standalone, not Swarm). Compose-only. No overlay networks.
## Host Services
- Linux (Ubuntu Server)
- OpenZFS
- systemd timers (sync, health checks)
- Cockpit (management)

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,50 @@
---
title: Pre-Travel Sync & Checklist
description: Everything to do before Pocket Grimoire leaves the house
published: true
date: 2026-04-12T00:00:00.000Z
tags: pocket, sync, travel, runbook
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Pre-Travel Sync & Checklist
## Sync Data from znas
```bash
# Push Green/Pocket dataset to Pocket Grimoire
syncoid znas:vault/Green/Pocket pocket:/srv/greenpg/Green
# Verify pool health after sync
ssh pocket "zpool status pocket-green"
```
## Pre-Travel Checklist
- [ ] Run syncoid push — verify completion, no errors
- [ ] Confirm ZFS pool healthy (`zpool status pocket-green`)
- [ ] Confirm WireGuard peer `PortaPotty` connects to OPNsense (quick check below)
- [ ] Confirm Jellyfin library scan complete
- [ ] Confirm PocketStash metadata synced (check last scan date in UI)
- [ ] Confirm Wiki.js content is current (last pull timestamp)
- [ ] Charge Anker station fully
- [ ] Pack SSDs — Vault always, Green for personal trips only
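A quick way to confirm the `PortaPotty` tunnel is actually up before leaving (the home-side tunnel address below is an assumption; substitute the OPNsense end of your WireGuard network):
```bash
# On the Pocket Grimoire host: show handshake and transfer counters
sudo wg show

# A recent "latest handshake" means the tunnel is alive; then test reachability
ping -c 3 192.168.32.1    # assumed OPNsense side of the WireGuard subnet
```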
## While Traveling
- PocketStash runs read-only — no writes, no new imports
- Wiki.js is pull-only — no page edits (edits won't sync back cleanly)
- WireGuard tunnel home via `PortaPotty` peer when internet available
- Beszel agent reports back to znas when tunneled
## On Return
```bash
# Sync any Jellyfin watch state or metadata changes back if needed
# No automated reverse sync — manual review before writing back
```
## Deployment Guide
See the [Deployment Guide](/Pocket-Grimoire/Sync/Deployment-Guide) for the full from-scratch build procedure.

View file

@ -0,0 +1,125 @@
---
title: bazarr Stack
description: Bazarr Stack for NetGrimoire
published: true
date: 2026-04-04T01:35:32.755Z
tags: docker,swarm,bazarr,netgrimoire
editor: markdown
dateCreated: 2026-04-04T01:35:32.755Z
---
# bazarr
## Overview
The bazarr stack is a Docker Swarm configuration for the Bazarr service in NetGrimoire. It provides automated subtitle search and management, and connects to other services through Caddy and Uptime Kuma labels and environment variables.
---
## Architecture
- **Host:** docker4
- **Network:** netgrimoire
- **Exposed via:** bazarr.netgrimoire.com
- **Homepage group:** Jolly Roger
---
## Build & Configuration
### Prerequisites
To deploy this stack, ensure that Docker Swarm is installed and configured.
### Volume Setup
```bash
mkdir -p /DockerVol/bazarr/config
chown -R user:group /DockerVol/bazarr
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
PUID=1964
PGID=1964
TZ=America/Chicago
# Deploy labels (Caddy reverse proxy / Uptime Kuma monitoring)
Caddy: authentik
Caddy.reverse_proxy: {{upstreams 6767}}
Kuma.bazarr.http.name=Bazarr
Kuma.bazarr.http.url=http://bazarr:6767
```
### Deploy
```bash
cd services/swarm/stack/bazarr
set -a && source .env && set +a
docker stack config --compose-file bazarr-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml bazarr
rm resolved.yml
docker stack services bazarr
```
### First Run
After deployment, run `./deploy.sh` to initialize the configuration.
---
## User Guide
### Accessing bazarr
- **Bazarr**: http://bazarr.netgrimoire.com
- **Caddy reverse proxy:** Internal only
### Primary Use Cases
Use Bazarr for subtitle search in NetGrimoire.
### NetGrimoire Integrations
This service connects to Uptime Kuma and Caddy through various labels and environment variables.
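Because these integrations are label-driven, one way to confirm they landed on the running service is to inspect the deployed service's labels (`bazarr_bazarr` assumes the default `<stack>_<service>` naming; `jq` is optional pretty-printing):
```bash
# Dump the labels Swarm applied to the bazarr service
docker service inspect bazarr_bazarr --format '{{json .Spec.Labels}}' | jq .
```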
---
## Operations
### Monitoring
```bash
docker stack services bazarr
docker service logs -f bazarr
```
### Backups
- `/DockerVol/bazarr/config` is critical for configuration data.
- `/DockerVol/bazarr/data` is reconstructable.
### Restore
```bash
./deploy.sh
```
---
## Common Failures
| Symptom | Cause | Fix |
|---------|-------|-----|
| Service not available | Incorrect DNS entry | Check Caddy reverse proxy configuration and DNS resolution. |
| Data corruption | Inconsistent backups | Ensure consistent and regular backups of critical data volumes. |
| Network connectivity issues | Incorrect network configuration | Verify network configuration and re-deploy the stack with corrected settings. |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-03 | e5ba5297 | Initial deployment documentation.
| 2026-04-03 | 74b54de4 | Minor configuration updates.
| 2026-04-03 | 4f400b3f | Security patches and bug fixes.
| 2026-04-03 | 8df1f14f | Performance improvements.
| 2026-04-03 | 99cffc2b | Minor documentation updates.
---
## Notes
- Generated by Gremlin on 2026-04-04T01:35:32.755Z
- Source: swarm/bazarr.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,119 @@
# radarr
## Overview
The Radarr stack is a Docker Swarm-based configuration for the popular movie library management service, Radarr. It provides a centralized hub for managing a large collection of movies, complete with features like automated metadata fetching and quality filtering.
---
## Architecture
- **Host:** docker4
- **Network:** netgrimoire
- **Exposed via:** `caddy.radarr.netgrimoire.com`, `radarr:7878`
- **Homepage group:** Jolly Roger
---
## Build & Configuration
### Prerequisites
No specific prerequisites are required for this stack.
### Volume Setup
```bash
mkdir -p /DockerVol/Radarr
chown -R radarr:radarr /DockerVol/Radarr
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
TZ=America/Chicago
PGID="1964"
PUID="1964"
CADDY_HTTPS_KEY=$(openssl rand -hex 32)
Kuma.radarr.http.name=Radarr
Kuma.radarr.http.url=https://radarr.netgrimoire.com
```
### Deploy
```bash
cd services/swarm/stack/radarr
set -a && source .env && set +a
docker stack config --compose-file radarr-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml radarr
rm resolved.yml
docker stack services radarr
```
### First Run
After a successful deployment, run the following command to initialize the database:
```bash
./deploy.sh
```
---
## User Guide
### Accessing radarr
- **Radarr**: https://radarr.netgrimoire.com
### Primary Use Cases
To use Radarr in NetGrimoire, follow these steps:
1. Log in to the Radarr interface at `https://radarr.netgrimoire.com`.
2. Configure your library by adding movies and setting quality filters.
3. Set up Caddy for reverse proxying and HTTPS.
### NetGrimoire Integrations
Radarr integrates with Uptime Kuma for monitoring via the `Kuma.radarr.http.*` labels and is exposed through the Caddy reverse proxy.
---
## Operations
### Monitoring
Uptime Kuma monitors are defined by the `Kuma.radarr.http.*` labels shown in the environment variables above.
```bash
docker stack services radarr
docker service logs -f radarr
```
### Backups
Back up `/DockerVol/Radarr/data/backup/` on a regular basis; it holds the critical configuration and database backups. Other data in the volume is reconstructable.
### Restore
```bash
cd services/swarm/stack/radarr
./deploy.sh
```
---
## Common Failures
| Failure Mode | Symptoms | Cause | Fix |
|-------------|----------|------|-----|
| Caddy Not Listening | No incoming requests. | Caddy not started | Restart the Caddy service (for example `docker service update --force <caddy-service>`) and check its logs |
| Radarr Service Not Running | No visible interface in NetGrimoire Dashboard. | Radarr service not deployed correctly | Re-run deploy script and restart radarr service |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | 77c13325 | Initial documentation for swarm configuration |
| 2026-02-19 | 7482d3e5 | Added Caddy HTTPS key to environment variables |
| 2026-02-01 | 48701f5b | Updated Docker Swarm file with new Radarr image version |
| 2026-01-10 | 1a374911 | Improved Radarr configuration and setup |
---
## Notes
- Generated by Gremlin on 2026-04-07T19:34:53.606Z
- Source: swarm/radarr.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,127 @@
# sonarr
## Overview
This stack provides a Docker Swarm configuration for Sonarr, a TV series collection manager. The stack integrates with Caddy as a reverse proxy and Uptime Kuma for monitoring, and serves Sonarr's web interface.
---
## Architecture
- **Host:** docker4
- **Network:** netgrimoire
- **Exposed via:** sonarr.netgrimoire.com
- **Homepage group:** Jolly Roger
---
## Build & Configuration
### Prerequisites
No specific prerequisites are required.
### Volume Setup
```bash
mkdir -p /DockerVol/Sonarr
chown -R sonarr:sonarr /DockerVol/Sonarr
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
TZ=America/Chicago
PUID=1964
PGID=1964
CADDY_CERT=$(openssl rand -hex 32)
CADDY_KEY=$(openssl rand -hex 32)
```
### Deploy
```bash
cd services/swarm/stack/sonarr
set -a && source .env && set +a
docker stack config --compose-file sonarr-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml sonarr
rm resolved.yml
docker stack services sonarr
```
### First Run
No specific post-deploy steps are required.
---
## User Guide
### Accessing sonarr
- **Sonarr**: https://sonarr.netgrimoire.com (Caddy reverse proxy)
### Primary Use Cases
Access Sonarr's web interface to manage your media library and download clients.
### NetGrimoire Integrations
This stack connects to other NetGrimoire services through deploy labels:
- **Homepage group:** Jolly Roger
---
## Operations
### Monitoring
Uptime Kuma monitors: `kuma.sonarr.http.name=Sonarr`, `kuma.sonarr.http.url=https://sonarr.netgrimoire.com`
```bash
docker stack services sonarr
```
### Backups
Back up `/DockerVol/Sonarr` (mounted into the container at `/config`) regularly; it holds the database and settings needed to reconstruct a full deployment.
### Restore
```bash
cd services/swarm/stack/sonarr
./deploy.sh
```
---
## Common Failures
| Symptom | Cause | Fix |
|---------|-------|-----|
| Failed to connect | Incomplete Caddy reverse proxy configuration | Check the `CADDY_CERT` and `CADDY_KEY` environment variables for correct formatting; update the Caddy configuration if necessary |
| Uptime Kuma cannot connect | Incorrect HTTP URL or port | Correct the URL and port in Uptime Kuma's configuration, then redeploy the stack |
| Sonarr not starting | Incompatible Docker image or missing environment variables | Check the Sonarr image version for compatibility and verify all required environment variables are present and correct |
| Caddy reverse proxy not working | Incorrect Caddy configuration | Review the Caddy configuration in `sonarr-stack.yml` for errors, then redeploy the stack |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | fb75c66d | Initial documentation creation. |
This stack was created with Docker Swarm configuration in mind, marking a migration from earlier swarm configurations.
---
## Notes
- Generated by Gremlin on 2026-04-07T19:37:34.802Z
- Source: swarm/sonarr.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,98 @@
# sabnzbd
## Overview
The sabnzbd stack is a Docker Swarm configuration for the Sabnzbd Usenet Downloader service, providing a centralized and secure way to manage and retrieve Usenet content in NetGrimoire.
---
## Architecture
- **Host:** docker4
- **Network:** netgrimoire
- **Exposed via:** sabnzbd.netgrimoire.com, 8082:8080
- **Homepage group:** Jolly Roger
---
## Build & Configuration
### Prerequisites
No specific prerequisites are required for this stack.
### Volume Setup
```bash
mkdir -p /DockerVol/sabnzbd
chown -R docker4:docker4 /DockerVol/sabnzbd
```
### Environment Variables
```bash
# generate secrets with: openssl rand -hex 32
# PUID/PGID/TZ mirror the other NetGrimoire stacks
PUID=1964
PGID=1964
TZ=America/Chicago
```
### Deploy
```bash
cd services/swarm/stack/sabnzbd
set -a && source .env && set +a
docker stack config --compose-file sabnzbd-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml sabnzbd
rm resolved.yml
docker stack services sabnzbd
```
### First Run
After deployment, ensure the Caddy reverse proxy is configured correctly for the newly deployed service.
---
## User Guide
### Accessing sabnzbd
| Service | URL | Purpose |
|---------|-----|---------|
| SABnzbd | https://sabnzbd.netgrimoire.com | Usenet downloader web UI |
### Primary Use Cases
To use the sabnzbd service in NetGrimoire, access its homepage at [https://sabnzbd.netgrimoire.com](https://sabnzbd.netgrimoire.com) and follow the provided instructions to configure your Usenet client.
### NetGrimoire Integrations
The sabnzbd stack uses the PUID, PGID, and TZ environment variables, which set container file ownership and the timezone within the Docker Swarm stack.
---
## Operations
### Monitoring
Monitor the sabnzbd service using Kuma.
```bash
docker stack services sabnzbd
docker service logs -f sabnzbd
```
### Backups
Critical: back up `/DockerVol/sabnzbd` regularly; it holds the configuration needed to recover from failure or loss.
### Restore
Restore the sabnzbd service by running the ./deploy.sh script in the services/swarm/stack/sabnzbd directory after a critical failure or loss.
---
## Common Failures
| Symptom | Cause | Fix |
|---------|-------|-----|
| Service not accessible | Incorrect Caddy reverse proxy configuration | Check and correct Caddy labels, restart service |
| Data corruption | Insufficient backups | Regularly back up the /DockerVol/sabnzbd directory |
| Network connectivity issues | Outdated Docker Swarm stack | Update to latest version with latest dependencies |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | a3d7972b | Initial documentation for the sabnzbd Stack. |
| 2026-04-07 | d98884c7 | Updated the Caddy labels to ensure proper reverse proxy configuration. |
| 2026-04-07 | 802d257d | Modified environment variables for improved security and performance. |
---
## Notes
- Generated by Gremlin on 2026-04-07T20:51:44.986Z
- Review User Guide and Changelog sections for accuracy and completeness

View file

@ -0,0 +1,83 @@
---
title: Shadow Grimoire
description: Acquisition stack — the goblin hacker sails the high seas
published: true
date: 2026-04-12T00:00:00.000Z
tags: shadow, acquisition, arr
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Shadow Grimoire
![shadow-badge](/images/shadow-badge.png)
The Shadow Grimoire is the acquisition and media management infrastructure. Usenet + torrents, protected behind `*.wasted-bandwidth.net` and Authelia. Homepage tab: **Wasted-Bandwidth**.
The goblin hacker doesn't ask permission.
---
## Services by Group
### Jolly Roger — Indexers
| Service | URL | Purpose |
|---------|-----|---------|
| NZBHydra | `hydra.netgrimoire.com` | Usenet indexer aggregator (altHUB, NZBGeek, Drunken Slug, Usenet Crawler, DogNZB) |
| Jackett | `jackett.netgrimoire.com` | Torrent indexer — runs inside Gluetun VPN on docker2 |
### Downloaders
| Service | URL | Purpose | Host |
|---------|-----|---------|------|
| SABnzbd | — | Usenet downloader | znas / Swarm |
| NZBGet | — | Usenet downloader | znas / Swarm |
| Transmission | — | BitTorrent client | docker2 (via Gluetun VPN) |
| Gluetun | — | VPN gateway — PIA VPN | docker2 / Compose |
Jackett and Transmission share `network_mode: container:gluetun` — all their traffic routes through the PIA VPN.
### Arr Stack — Media Management
| Service | URL | Purpose |
|---------|-----|---------|
| Sonarr | — | TV show acquisition |
| Radarr | — | Movie acquisition |
| Bazarr | `bazarr.netgrimoire.com` | Subtitle management |
| Readarr | — | Book acquisition |
| Lidarr | — | Music acquisition |
| Beets | `beets.netgrimoire.com` | Music library tagging |
| Mylar | — | Comic acquisition (📋 planned — see `archive/arr.yaml`) |
### Config Management
| Service | URL | Purpose |
|---------|-----|---------|
| Recyclarr | — | Sonarr/Radarr quality profile sync |
| Profilarr | `profilarr.netgrimoire.com` | Quality profile management |
| Configarr | `configarr.netgrimoire.com` | Arr config management |
### Media Search & Discovery
| Service | URL | Purpose |
|---------|-----|---------|
| JellySeerr | `requests.netgrimoire.com` | Media request management |
| TinyMediaManager | `tmm.netgrimoire.com` | Media metadata manager |
| Pinchflat | `pinchflat.netgrimoire.com` | YouTube channel downloader |
| Tunarr | — | IPTV channel creation (ErsatzTV replacement) |
---
## Network Notes
Jackett and Transmission run on docker2 via Docker Compose, not Swarm. They use `network_mode: container:gluetun` to route through the PIA VPN. Caddy reaches Jackett via the `netgrimoire` overlay using an internal hostname.
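A simple sanity check that the VPN routing holds is to compare the public IP seen from inside a gluetun-attached container with the host's own (the container name `transmission` and the use of `ifconfig.me` are assumptions; any "what is my IP" endpoint works, and `curl` must exist in the image):
```bash
# Public IP as seen by Transmission — should be the PIA exit address
docker exec transmission curl -s ifconfig.me

# Public IP of docker2 itself — should differ from the line above
curl -s ifconfig.me
```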
---
## Pending
- [ ] Prowlarr — low priority (NZBHydra covers current needs)
- [ ] Mylar — comic downloader, needs setup (reference `archive/arr.yaml`)
- [ ] Soularr — Soulseek integration for Lidarr
- [ ] MeTube — YouTube downloader for Tunarr filler workflow

View file

@ -0,0 +1,841 @@
---
title: Immich Backup and Restore
description: Immich backup with Kopia
published: true
date: 2026-02-20T04:11:52.181Z
tags:
editor: markdown
dateCreated: 2026-02-14T03:14:32.594Z
---
# Immich Backup and Recovery Guide
## Overview
This document provides comprehensive backup and recovery procedures for Immich photo server. Since Immich's data is stored on standard filesystems (not ZFS or BTRFS), snapshots are not available and we rely on Immich's native backup approach combined with Kopia for offsite storage in vaults.
## Quick Reference
### Common Backup Commands
```bash
# Run a manual backup (all components)
/opt/scripts/backup-immich.sh
# Backup just the database
docker exec -t immich_postgres pg_dump --clean --if-exists \
--dbname=immich --username=postgres | gzip > "/opt/immich-backups/dump.sql.gz"
# List Kopia snapshots
kopia snapshot list --tags immich
# View backup logs
tail -f /var/log/immich-backup.log
```
### Common Restore Commands
```bash
# Restore database from backup
gunzip < /opt/immich-backups/immich-YYYYMMDD_HHMMSS/dump.sql.gz | \
docker exec -i immich_postgres psql --username=postgres --dbname=immich
# Restore from Kopia to new server
kopia snapshot list --tags tier1-backup
kopia restore <snapshot-id> /opt/immich-backups/
# Check container status after restore
docker compose ps
docker compose logs -f
```
## Critical Components to Backup
### 1. Docker Compose File
- **Location**: `/opt/immich/docker-compose.yml` (or your installation path)
- **Purpose**: Defines all containers, networks, and volumes
- **Importance**: Critical for recreating the exact container configuration
### 2. Configuration Files
- **Primary Config**: `/opt/immich/.env`
- **Purpose**: Database credentials, upload locations, timezone settings
- **Importance**: Required for proper service initialization
### 3. Database
- **PostgreSQL Data**: Contains all metadata, user accounts, albums, sharing settings, face recognition data, timeline information
- **Container**: `immich_postgres`
- **Database Name**: `immich` (default)
- **User**: `postgres` (default)
- **Backup Method**: `pg_dump` (official Immich recommendation)
### 4. Photo/Video Library
- **Upload Storage**: All original photos and videos uploaded by users
- **Location**: `/srv/immich/library` (per your .env UPLOAD_LOCATION)
- **Size**: Typically the largest component
- **Critical**: This is your actual data - photos cannot be recreated
### 5. Additional Important Data
- **Model Cache**: Docker volume `immich_model-cache` (machine learning models, can be re-downloaded)
- **External Paths**: `/export/photos` and `/srv/NextCloud-AIO` (mounted as read-only in your setup)
## Backup Strategy
### Two-Tier Backup Approach
We use a **two-tier approach** combining Immich's native backup method with Kopia for offsite storage:
1. **Tier 1 (Local)**: Immich database dump + library backup creates consistent, component-level backups
2. **Tier 2 (Offsite)**: Kopia snapshots the local backups and syncs to vaults
#### Why This Approach?
- **Best of both worlds**: Native database dump ensures Immich-specific consistency, Kopia provides deduplication and offsite protection
- **Component-level restore**: Can restore individual components (just database, just library, etc.)
- **Disaster recovery**: Full system restore from Kopia backups on new server
- **Efficient storage**: Kopia's deduplication reduces storage needs for offsite copies
#### Backup Frequency
- **Daily**: Immich backup runs at 2 AM
- **Daily**: Kopia snapshot of backups runs at 3 AM
- **Retention (Local)**: 7 days of Immich backups (managed by script)
- **Retention (Kopia/Offsite)**: 30 daily, 12 weekly, 12 monthly
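The offsite retention above is enforced by Kopia's policy on the backup path. A sketch of setting it explicitly, only needed if the repository's global policy doesn't already match (path and counts mirror this guide):
```bash
# Keep 30 daily, 12 weekly, 12 monthly snapshots of the Immich backup dir
kopia policy set /opt/immich-backups \
  --keep-daily 30 --keep-weekly 12 --keep-monthly 12

# Confirm the effective policy
kopia policy show /opt/immich-backups
```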
### Immich Native Backup Method
Immich's official backup approach uses `pg_dump` for the database:
- Uses `pg_dump` with `--clean --if-exists` flags for consistent database dumps
- Hot backup without stopping PostgreSQL
- Produces compressed `.sql.gz` files
- Database remains available during backup
For the photo/video library, we use a **hybrid approach**:
- **Database**: Backed up locally as `dump.sql.gz` for fast component-level restore
- **Library**: Backed up directly by Kopia (no tar) for optimal deduplication and incremental backups
**Why not tar the library?**
- Kopia deduplicates at the file level - adding 1 photo shouldn't require backing up the entire library again
- Individual file access for selective restore
- Better compression and faster incremental backups
- Lower risk - corrupted tar loses everything, corrupted file only affects that file
**Key Features:**
- No downtime required
- Consistent point-in-time snapshot
- Standard PostgreSQL format (portable across systems)
- Efficient incremental backups of photo library
## Setting Up Immich Backups
### Prerequisite
Make sure you are connected to the Kopia repository:
```bash
sudo kopia repository connect server \
--url=https://192.168.5.10:51516 \
--override-username=admin \
--server-cert-fingerprint=696a4999f594b5273a174fd7cab677d8dd1628f9b9d27e557daa87103ee064b2
```
#### Step 1: Configure Backup Location
Set the backup destination:
```bash
# Create the backup directory
mkdir -p /opt/immich-backups
chown -R root:root /opt/immich-backups
chmod 755 /opt/immich-backups
```
#### Step 2: Manual Backup Commands
```bash
cd /opt/immich
# Backup database using Immich's recommended method
docker exec -t immich_postgres pg_dump \
--clean \
--if-exists \
--dbname=immich \
--username=postgres \
| gzip > "/opt/immich-backups/dump.sql.gz"
# Backup configuration files
cp docker-compose.yml /opt/immich-backups/
cp .env /opt/immich-backups/
# Backup library with Kopia (no tar - better deduplication)
kopia snapshot create /srv/immich/library \
--tags immich,library,photos \
--description "Immich library manual backup"
```
**What gets created:**
- Local backup directory (created by the automated script in Step 3): `/opt/immich-backups/immich-YYYYMMDD_HHMMSS/`
- Contains: `dump.sql.gz` (database), config files
- Kopia snapshots:
- `/opt/immich-backups` (database + config)
- `/srv/immich/library` (photos/videos, no tar)
- `/opt/immich` (installation directory)
#### Step 3: Automated Backup Script
Create `/opt/scripts/backup-immich.sh`:
```bash
#!/bin/bash
# Immich Automated Backup Script
# This creates Immich backups, then snapshots them with Kopia for offsite storage
set -e
BACKUP_DATE=$(date +%Y%m%d_%H%M%S)
LOG_FILE="/var/log/immich-backup.log"
IMMICH_DIR="/opt/immich"
BACKUP_DIR="/opt/immich-backups"
KEEP_DAYS=7
# Database credentials from .env
DB_USERNAME="postgres"
DB_DATABASE_NAME="immich"
POSTGRES_CONTAINER="immich_postgres"
echo "[${BACKUP_DATE}] ========================================" | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Starting Immich backup process" | tee -a "$LOG_FILE"
# Step 1: Run Immich database backup using official method
echo "[${BACKUP_DATE}] Running Immich database backup..." | tee -a "$LOG_FILE"
cd "$IMMICH_DIR"
# Create backup directory with timestamp
mkdir -p "${BACKUP_DIR}/immich-${BACKUP_DATE}"
# Backup database using Immich's recommended method
docker exec -t ${POSTGRES_CONTAINER} pg_dump \
--clean \
--if-exists \
--dbname=${DB_DATABASE_NAME} \
--username=${DB_USERNAME} \
| gzip > "${BACKUP_DIR}/immich-${BACKUP_DATE}/dump.sql.gz"
BACKUP_EXIT=${PIPESTATUS[0]}
if [ $BACKUP_EXIT -ne 0 ]; then
echo "[${BACKUP_DATE}] ERROR: Immich database backup failed with exit code ${BACKUP_EXIT}" | tee -a "$LOG_FILE"
exit 1
fi
echo "[${BACKUP_DATE}] Immich database backup completed successfully" | tee -a "$LOG_FILE"
# Step 2: Verify library location exists (Kopia will backup directly, no tar needed)
echo "[${BACKUP_DATE}] Verifying library location..." | tee -a "$LOG_FILE"
# Get the upload location from docker-compose volumes
UPLOAD_LOCATION="/srv/immich/library"
if [ -d "${UPLOAD_LOCATION}" ]; then
LIBRARY_SIZE=$(du -sh "${UPLOAD_LOCATION}" | cut -f1)
echo "[${BACKUP_DATE}] Library location verified: ${UPLOAD_LOCATION} (${LIBRARY_SIZE})" | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Kopia will backup library files directly (no tar, better deduplication)" | tee -a "$LOG_FILE"
else
echo "[${BACKUP_DATE}] WARNING: Upload location not found at ${UPLOAD_LOCATION}" | tee -a "$LOG_FILE"
fi
# Step 3: Backup configuration files
echo "[${BACKUP_DATE}] Backing up configuration files..." | tee -a "$LOG_FILE"
cp "${IMMICH_DIR}/docker-compose.yml" "${BACKUP_DIR}/immich-${BACKUP_DATE}/"
cp "${IMMICH_DIR}/.env" "${BACKUP_DIR}/immich-${BACKUP_DATE}/"
echo "[${BACKUP_DATE}] Configuration backup completed" | tee -a "$LOG_FILE"
# Step 4: Clean up old backups
echo "[${BACKUP_DATE}] Cleaning up backups older than ${KEEP_DAYS} days..." | tee -a "$LOG_FILE"
find "${BACKUP_DIR}" -maxdepth 1 -type d -name "immich-*" -mtime +${KEEP_DAYS} -exec rm -rf {} \; 2>&1 | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Local backup cleanup completed" | tee -a "$LOG_FILE"
# Step 5: Create Kopia snapshot of backup directory
echo "[${BACKUP_DATE}] Creating Kopia snapshot..." | tee -a "$LOG_FILE"
kopia snapshot create "${BACKUP_DIR}" \
--tags immich:tier1-backup \
--description "Immich backup ${BACKUP_DATE}" \
2>&1 | tee -a "$LOG_FILE"
KOPIA_EXIT=${PIPESTATUS[0]}
if [ $KOPIA_EXIT -ne 0 ]; then
echo "[${BACKUP_DATE}] WARNING: Kopia snapshot failed with exit code ${KOPIA_EXIT}" | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Local Immich backup exists but offsite copy may be incomplete" | tee -a "$LOG_FILE"
exit 2
fi
echo "[${BACKUP_DATE}] Kopia snapshot completed successfully" | tee -a "$LOG_FILE"
# Step 6: Backup the library directly with Kopia (better deduplication than tar)
echo "[${BACKUP_DATE}] Creating Kopia snapshot of library..." | tee -a "$LOG_FILE"
if [ -d "${UPLOAD_LOCATION}" ]; then
kopia snapshot create "${UPLOAD_LOCATION}" \
--tags immich:library \
--description "Immich library ${BACKUP_DATE}" \
2>&1 | tee -a "$LOG_FILE"
KOPIA_LIB_EXIT=${PIPESTATUS[0]}
if [ $KOPIA_LIB_EXIT -ne 0 ]; then
echo "[${BACKUP_DATE}] WARNING: Kopia library snapshot failed" | tee -a "$LOG_FILE"
else
echo "[${BACKUP_DATE}] Library snapshot completed successfully" | tee -a "$LOG_FILE"
fi
fi
# Step 7: Also backup the Immich installation directory (configs, compose files)
#echo "[${BACKUP_DATE}] Backing up Immich installation directory..." | tee -a "$LOG_FILE"
#kopia snapshot create "${IMMICH_DIR}" \
# --tags immich,config,docker-compose \
# --description "Immich config ${BACKUP_DATE}" \
# 2>&1 | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Backup process completed successfully" | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] ========================================" | tee -a "$LOG_FILE"
# Optional: Send notification on completion
# Add your notification method here (email, webhook, etc.)
```
Make it executable:
```bash
chmod +x /opt/scripts/backup-immich.sh
```
Add to crontab (daily at 2 AM):
```bash
# Edit root's crontab
crontab -e
# Add this line:
0 2 * * * /opt/scripts/backup-immich.sh 2>&1 | logger -t immich-backup
```
### Offsite Backup to Vaults
After local Kopia snapshots are created, they sync to your offsite vaults automatically through Kopia's repository configuration.
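If you also want to mirror the repository itself to a mounted vault (rather than relying on the server-side repository alone), the same pattern used in the Mailcow runbook applies; the mount path here is an assumption:
```bash
# Mirror the connected Kopia repository to a locally mounted vault
kopia repository sync-to filesystem --path /mnt/vault/immich-backup
```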
## Recovery Procedures
### Understanding Two Recovery Methods
We have **two restore methods** depending on the scenario:
1. **Local Restore** (Preferred): For component-level or same-server recovery
2. **Kopia Full Restore**: For complete disaster recovery to a new server
### Method 1: Local Restore (Recommended)
Use this method when:
- Restoring on the same/similar server
- Restoring specific components (just database, just library, etc.)
- Recovering from local Immich backups
#### Full System Restore
```bash
cd /opt/immich
# Stop Immich
docker compose down
# List available backups
ls -lh /opt/immich-backups/
# Choose a database backup
BACKUP_PATH="/opt/immich-backups/immich-YYYYMMDD_HHMMSS"
# Start only the database container, then restore into it
docker compose up -d database
sleep 10
gunzip < ${BACKUP_PATH}/dump.sql.gz | \
docker compose exec -T database psql --username=postgres --dbname=immich
# Restore library from Kopia
kopia snapshot list --tags library
kopia restore <library-snapshot-id> /srv/immich/library
# Fix permissions
chown -R 1000:1000 /srv/immich/library
# Restore configuration (review changes first)
cp ${BACKUP_PATH}/.env .env.restored
cp ${BACKUP_PATH}/docker-compose.yml docker-compose.yml.restored
# Start Immich
docker compose up -d
# Monitor logs
docker compose logs -f
```
#### Example: Restore Only Database
```bash
cd /opt/immich
# Stop Immich
docker compose down
# Start only database
docker compose up -d database
sleep 10
# Restore database from backup
BACKUP_PATH="/opt/immich-backups/immich-YYYYMMDD_HHMMSS"
gunzip < ${BACKUP_PATH}/dump.sql.gz | \
docker compose exec -T database psql --username=postgres --dbname=immich
# Start all services
docker compose down
docker compose up -d
# Verify
docker compose logs -f
```
#### Example: Restore Only Library
```bash
cd /opt/immich
# Stop Immich
docker compose down
# Restore library from Kopia
kopia snapshot list --tags library
kopia restore <library-snapshot-id> /srv/immich/library
# Fix permissions
chown -R 1000:1000 /srv/immich/library
# Start Immich
docker compose up -d
```
### Method 2: Complete Server Rebuild (Kopia Restore)
Use this when recovering to a completely new server or when local backups are unavailable.
#### Step 1: Prepare New Server
```bash
# Update system
apt update && apt upgrade -y
# Install Docker
curl -fsSL https://get.docker.com | sh
systemctl enable docker
systemctl start docker
# Install Docker Compose
apt install docker-compose-plugin -y
# Install Kopia
curl -s https://kopia.io/signing-key | sudo gpg --dearmor -o /usr/share/keyrings/kopia-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/kopia-keyring.gpg] https://packages.kopia.io/apt/ stable main" | sudo tee /etc/apt/sources.list.d/kopia.list
apt update
apt install kopia -y
# Create directory structure
mkdir -p /opt/immich
mkdir -p /opt/immich-backups
mkdir -p /srv/immich/library
mkdir -p /srv/immich/postgres
```
#### Step 2: Restore Kopia Repository
```bash
# Connect to your offsite vault
kopia repository connect server \
--url=https://192.168.5.10:51516 \
--override-username=admin \
--server-cert-fingerprint=696a4999f594b5273a174fd7cab677d8dd1628f9b9d27e557daa87103ee064b2
# List available snapshots
kopia snapshot list --tags immich
```
#### Step 3: Restore Configuration
```bash
# Find and restore the config snapshot
kopia snapshot list --tags config
# Restore to the Immich directory
kopia restore <snapshot-id> /opt/immich/
# Verify critical files
ls -la /opt/immich/.env
ls -la /opt/immich/docker-compose.yml
```
#### Step 4: Restore Immich Backups Directory
```bash
# Restore the entire backup directory from Kopia
kopia snapshot list --tags tier1-backup
# Restore the most recent backup
kopia restore <snapshot-id> /opt/immich-backups/
# Verify backups were restored
ls -la /opt/immich-backups/
```
#### Step 5: Restore Database and Library
```bash
cd /opt/immich
# Find the most recent backup
LATEST_BACKUP=$(ls -td /opt/immich-backups/immich-* | head -1)
echo "Restoring from: $LATEST_BACKUP"
# Start database container
docker compose up -d database
sleep 30
# Restore database
gunzip < ${LATEST_BACKUP}/dump.sql.gz | \
docker compose exec -T database psql --username=postgres --dbname=immich
# Restore library from Kopia
kopia snapshot list --tags library
kopia restore <library-snapshot-id> /srv/immich/library
# Fix permissions
chown -R 1000:1000 /srv/immich/library
```
#### Step 6: Start and Verify Immich
```bash
cd /opt/immich
# Pull latest images (or use versions from backup if preferred)
docker compose pull
# Start all services
docker compose up -d
# Monitor logs
docker compose logs -f
```
#### Step 7: Post-Restore Verification
```bash
# Check container status
docker compose ps
# Test web interface
curl -I http://localhost:2283
# Verify database
docker compose exec database psql -U postgres -d immich -c "SELECT COUNT(*) FROM users;"
# Check library storage
ls -lah /srv/immich/library/
```
### Scenario 2: Restore Individual User's Photos
To restore a single user's library without affecting others:
**Option A: Using Kopia Mount (Recommended)**
```bash
# Mount the Kopia snapshot
kopia snapshot list --tags library
mkdir -p /mnt/kopia-library
kopia mount <library-snapshot-id> /mnt/kopia-library &
# Find the user's directory (using user ID from database)
# User libraries are typically in: library/{user-uuid}/
USER_UUID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Copy user's data back
rsync -av /mnt/kopia-library/${USER_UUID}/ \
/srv/immich/library/${USER_UUID}/
# Fix permissions
chown -R 1000:1000 /srv/immich/library/${USER_UUID}/
# Unmount
umount /mnt/kopia-library    # or: fusermount -u /mnt/kopia-library
# Restart Immich to recognize changes
cd /opt/immich
docker compose restart immich-server
```
**Option B: Selective Kopia Restore**
```bash
cd /opt/immich
docker compose down
# Restore just the specific user's directory
kopia snapshot list --tags library
USER_UUID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Restore just the user's subdirectory from the snapshot
kopia restore <library-snapshot-id>/${USER_UUID} /srv/immich/library/${USER_UUID}
# Fix permissions
chown -R 1000:1000 /srv/immich/library/${USER_UUID}/
# Start Immich
docker compose up -d
```
### Scenario 3: Database Recovery Only
If only the database is corrupted but library data is intact:
```bash
cd /opt/immich
# Stop Immich
docker compose down
# Start only database
docker compose up -d database
sleep 30
# Restore from most recent backup
LATEST_BACKUP=$(ls -td /opt/immich-backups/immich-* | head -1)
gunzip < ${LATEST_BACKUP}/dump.sql.gz | \
docker compose exec -T database psql --username=postgres --dbname=immich
# Start all services
docker compose down
docker compose up -d
# Verify
docker compose logs -f
```
### Scenario 4: Configuration Recovery Only
If you only need to restore configuration files:
```bash
cd /opt/immich
# Find the most recent backup
LATEST_BACKUP=$(ls -td /opt/immich-backups/immich-* | head -1)
# Stop Immich
docker compose down
# Backup current config (just in case)
cp .env .env.pre-restore
cp docker-compose.yml docker-compose.yml.pre-restore
# Restore config from backup
cp ${LATEST_BACKUP}/.env ./
cp ${LATEST_BACKUP}/docker-compose.yml ./
# Restart
docker compose up -d
```
## Verification and Testing
### Regular Backup Verification
Perform monthly restore tests to ensure backups are valid:
```bash
# Test restore to temporary location
mkdir -p /tmp/backup-test
kopia snapshot list --tags immich
kopia restore <snapshot-id> /tmp/backup-test/
# Verify files exist and are readable
ls -lah /tmp/backup-test/
gunzip < /tmp/backup-test/immich-*/dump.sql.gz | head -100
# Cleanup
rm -rf /tmp/backup-test/
```
### Backup Monitoring Script
Create `/opt/scripts/check-immich-backup.sh`:
```bash
#!/bin/bash
# Check last backup age
LAST_BACKUP=$(ls -td /opt/immich-backups/immich-* 2>/dev/null | head -1)
if [ -z "$LAST_BACKUP" ]; then
echo "WARNING: No Immich backups found"
exit 1
fi
BACKUP_DATE=$(basename "$LAST_BACKUP" | sed 's/immich-//')
BACKUP_EPOCH=$(date -d "${BACKUP_DATE:0:8} ${BACKUP_DATE:9:2}:${BACKUP_DATE:11:2}:${BACKUP_DATE:13:2}" +%s 2>/dev/null)
if [ -z "$BACKUP_EPOCH" ]; then
echo "WARNING: Cannot parse backup date"
exit 1
fi
NOW=$(date +%s)
AGE_HOURS=$(( ($NOW - $BACKUP_EPOCH) / 3600 ))
if [ $AGE_HOURS -gt 26 ]; then
echo "WARNING: Last Immich backup is $AGE_HOURS hours old"
# Send alert (email, Slack, etc.)
exit 1
else
echo "OK: Last backup $AGE_HOURS hours ago"
fi
# Check Kopia snapshots
KOPIA_LAST=$(kopia snapshot list --tags immich --json 2>/dev/null | jq -r '.[0].startTime' 2>/dev/null)
if [ -n "$KOPIA_LAST" ]; then
echo "Last Kopia snapshot: $KOPIA_LAST"
else
echo "WARNING: Cannot verify Kopia snapshots"
fi
```
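Make the check executable and schedule it after the nightly backup and Kopia snapshot have finished (the 8 AM run time is arbitrary):
```bash
chmod +x /opt/scripts/check-immich-backup.sh

# Edit root's crontab
crontab -e

# Add this line:
0 8 * * * /opt/scripts/check-immich-backup.sh 2>&1 | logger -t immich-backup-check
```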
## Disaster Recovery Checklist
When disaster strikes, follow this checklist:
- [ ] Confirm scope of failure (server, storage, specific component)
- [ ] Gather server information (hostname, IP, DNS records)
- [ ] Access offsite backup vault
- [ ] Provision new server (if needed)
- [ ] Install Docker and dependencies
- [ ] Connect to Kopia repository
- [ ] Restore configurations first
- [ ] Restore database
- [ ] Restore library data
- [ ] Start services and verify
- [ ] Test photo viewing and uploads
- [ ] Verify user accounts and albums
- [ ] Update DNS records if needed
- [ ] Document any issues encountered
- [ ] Update recovery procedures based on experience
## Important Notes
1. **External Mounts**: Your setup has `/export/photos` and `/srv/NextCloud-AIO` mounted as external read-only sources. These are not backed up by this script - ensure they have their own backup strategy.
2. **Database Password**: The default database password in your .env is `postgres`. Change this to a secure random password for production use.
3. **Permissions**: Library files should be owned by UID 1000:1000 for Immich to access them properly:
```bash
chown -R 1000:1000 /srv/immich/library
```
4. **Testing**: Always test recovery procedures in a lab environment before trusting them in production.
5. **Documentation**: Keep this guide and server details in a separate location (printed copy, password manager, etc.).
6. **Retention Policy**: Review Kopia retention settings periodically to balance storage costs with recovery needs.
## Backup Architecture Notes
### Why Two Backup Layers?
**Immich Native Backups** (Tier 1):
- ✅ Uses official Immich backup method (`pg_dump`)
- ✅ Fast, component-aware backups
- ✅ Selective restore (can restore just database or just library)
- ✅ Standard PostgreSQL format (portable)
- ❌ No deduplication (full copies each time)
- ❌ Limited to local storage initially
**Kopia Snapshots** (Tier 2):
- ✅ Deduplication and compression
- ✅ Efficient offsite replication to vaults
- ✅ Point-in-time recovery across multiple versions
- ✅ Disaster recovery to completely new infrastructure
- ❌ Less component-aware (treats as files)
- ❌ Slower for granular component restore
### Storage Efficiency
Using this two-tier approach:
- **Local**: Database backups (~7 days retention, relatively small)
- **Kopia**: Database backups + library (efficient deduplication)
**Why library goes directly to Kopia without tar:**
Example with 500GB library, adding 10GB photos/month:
**With tar approach:**
- Month 1: Backup 500GB tar
- Month 2: Add 10GB photos → Entire 510GB tar changes → Backup 510GB
- Month 3: Add 10GB photos → Entire 520GB tar changes → Backup 520GB
- **Total storage needed**: 500 + 510 + 520 = 1,530GB
**Without tar (Kopia direct):**
- Month 1: Backup 500GB
- Month 2: Add 10GB photos → Kopia only backs up the 10GB new files
- Month 3: Add 10GB photos → Kopia only backs up the 10GB new files
- **Total storage needed**: 500 + 10 + 10 = 520GB
**Savings**: ~66% reduction in storage and backup time!
This is why we:
- Keep database dumps local (small, fast component restore)
- Let Kopia handle library directly (efficient, incremental, deduplicated)
### Compression and Deduplication
**Database backups** use `gzip` compression:
- Typically 80-90% compression ratio for SQL dumps
- Small enough to keep local copies
**Library backups** use Kopia's built-in compression and deduplication:
- Photos (JPEG/HEIC): Already compressed, Kopia skips re-compression
- Videos: Already compressed, minimal additional compression
- RAW files: Some compression possible
- **Deduplication**: If you upload the same photo twice, Kopia stores it once
- **Block-level dedup**: Even modified photos share unchanged blocks
This is far more efficient than tar + gzip, which would:
- Compress already-compressed photos (wasted CPU, minimal benefit)
- Store entire archive even if only 1 file changed
- Prevent deduplication across backups
## Additional Resources
- [Immich Official Backup Documentation](https://immich.app/docs/administration/backup-and-restore)
- [Kopia Documentation](https://kopia.io/docs/)
- [Docker Volume Backup Best Practices](https://docs.docker.com/storage/volumes/#back-up-restore-or-migrate-data-volumes)
- [PostgreSQL pg_dump Documentation](https://www.postgresql.org/docs/current/app-pgdump.html)
## Revision History
| Date | Version | Changes |
|------|---------|---------|
| 2026-02-13 | 1.0 | Initial documentation - two-tier backup strategy using Immich's native backup method |
---
**Last Updated**: February 13, 2026
**Maintained By**: System Administrator
**Review Schedule**: Quarterly

View file

@ -0,0 +1,879 @@
---
title: Mailcow Backup and Restore Strategy
description: Mailcow backup
published: true
date: 2026-02-20T04:15:25.924Z
tags:
editor: markdown
dateCreated: 2026-02-11T01:20:59.127Z
---
# Mailcow Backup and Recovery Guide
## Overview
This document provides comprehensive backup and recovery procedures for Mailcow email server. Since Mailcow is **not running on ZFS or BTRFS**, snapshots are not available and we rely on Mailcow's native backup script combined with Kopia for offsite storage in vaults.
## Quick Reference
### Common Backup Commands
```bash
# Run a manual backup (all components)
cd /opt/mailcow-dockerized
MAILCOW_BACKUP_LOCATION=/opt/mailcow-backups \
./helper-scripts/backup_and_restore.sh backup all --delete-days 7
# Backup with multithreading (faster)
THREADS=4 MAILCOW_BACKUP_LOCATION=/opt/mailcow-backups \
./helper-scripts/backup_and_restore.sh backup all --delete-days 7
# List Kopia snapshots
kopia snapshot list --tags mailcow
# View backup logs
tail -f /var/log/mailcow-backup.log
```
### Common Restore Commands
```bash
# Restore using mailcow native script (interactive)
cd /opt/mailcow-dockerized
./helper-scripts/backup_and_restore.sh restore
# Restore from Kopia to new server
kopia snapshot list --tags tier1-backup
kopia restore <snapshot-id> /opt/mailcow-backups/
# Check container status after restore
docker compose ps
docker compose logs -f
```
## Critical Components to Backup
### 1. Docker Compose File
- **Location**: `/opt/mailcow-dockerized/docker-compose.yml` (or your installation path)
- **Purpose**: Defines all containers, networks, and volumes
- **Importance**: Critical for recreating the exact container configuration
### 2. Configuration Files
- **Primary Config**: `/opt/mailcow-dockerized/mailcow.conf`
- **Additional Configs**:
- `/opt/mailcow-dockerized/data/conf/` (all subdirectories)
- Custom SSL certificates if not using Let's Encrypt
- Any override files (e.g., `docker-compose.override.yml`)
### 3. Database
- **MySQL/MariaDB Data**: Contains all mailbox configurations, users, domains, aliases, settings
- **Docker Volume**: `mailcowdockerized_mysql-vol`
- **Container Path**: `/var/lib/mysql`
### 4. Email Data
- **Maildir Storage**: All actual email messages
- **Docker Volume**: `mailcowdockerized_vmail-vol`
- **Container Path**: `/var/vmail`
- **Size**: Typically the largest component
### 5. Additional Important Data
- **Redis Data**: `mailcowdockerized_redis-vol` (cache and sessions)
- **Rspamd Data**: `mailcowdockerized_rspamd-vol` (spam learning)
- **Crypt Data**: `mailcowdockerized_crypt-vol` (if using mailbox encryption)
- **Postfix Queue**: `mailcowdockerized_postfix-vol` (queued/deferred mail)
## Backup Strategy
### Two-Tier Backup Approach
We use a **two-tier approach** combining Mailcow's native backup script with Kopia for offsite storage:
1. **Tier 1 (Local)**: Mailcow's `backup_and_restore.sh` script creates consistent, component-level backups
2. **Tier 2 (Offsite)**: Kopia snapshots the local backups and syncs to vaults
#### Why This Approach?
- **Best of both worlds**: Native script ensures mailcow-specific consistency, Kopia provides deduplication and offsite protection
- **Component-level restore**: Can restore individual components (just vmail, just mysql, etc.) using mailcow script
- **Disaster recovery**: Full system restore from Kopia backups on new server
- **Efficient storage**: Kopia's deduplication reduces storage needs for offsite copies
#### Backup Frequency
- **Daily**: Mailcow native backup runs at 2 AM
- **Daily**: Kopia snapshot of backups runs at 3 AM
- **Retention (Local)**: 7 days of mailcow backups (managed by script)
- **Retention (Kopia/Offsite)**: 30 daily, 12 weekly, 12 monthly
### Mailcow Native Backup Script
Mailcow includes `/opt/mailcow-dockerized/helper-scripts/backup_and_restore.sh` which handles:
- **vmail**: Email data (mailboxes)
- **mysql**: Database (using mariabackup for consistency)
- **redis**: Redis database
- **rspamd**: Spam filter learning data
- **crypt**: Encryption data
- **postfix**: Mail queue
**Key Features:**
- Uses `mariabackup` (hot backup without stopping MySQL)
- Supports multithreading for faster backups
- Architecture-aware (handles x86/ARM differences)
- Built-in cleanup with `--delete-days` parameter
- Creates compressed archives (.tar.zst or .tar.gz)
### Setting Up Mailcow Backups
#### Prerequisite
Make sure you are connected to the Kopia repository:
```bash
sudo kopia repository connect server --url=https://192.168.5.10:51516 --override-username=admin --server-cert-fingerprint=696a4999f594b5273a174fd7cab677d8dd1628f9b9d27e557daa87103ee064b2
```
#### Step 1: Configure Backup Location
Set the backup destination via environment variable or in mailcow.conf:
```bash
# Option 1: Set environment variable (preferred for automation)
export MAILCOW_BACKUP_LOCATION="/opt/mailcow-backups"
# Option 2: Add to cron job directly (shown in automated script below)
```
Create the backup directory:
```bash
mkdir -p /opt/mailcow-backups
chown -R root:root /opt/mailcow-backups
chmod 777 /opt/mailcow-backups
```
#### Step 2: Manual Backup Commands
```bash
cd /opt/mailcow-dockerized
# Backup all components, delete backups older than 7 days
MAILCOW_BACKUP_LOCATION=/opt/mailcow-backups \
./helper-scripts/backup_and_restore.sh backup all --delete-days 7
# Backup with multithreading (faster for large mailboxes)
THREADS=4 MAILCOW_BACKUP_LOCATION=/opt/mailcow-backups \
./helper-scripts/backup_and_restore.sh backup all --delete-days 7
# Backup specific components only
MAILCOW_BACKUP_LOCATION=/opt/mailcow-backups \
./helper-scripts/backup_and_restore.sh backup vmail mysql --delete-days 7
```
**What gets created:**
- Backup directory: `/opt/mailcow-backups/mailcow-YYYY-MM-DD-HH-MM-SS/`
- Contains: `.tar.zst` compressed archives for each component
- Plus: `mailcow.conf` copy for restore reference
#### Step 3: Automated Backup Script
Create `/opt/scripts/backup-mailcow.sh`:
```bash
#!/bin/bash
# Mailcow Automated Backup Script
# This creates mailcow native backups, then snapshots them with Kopia for offsite storage
set -e
BACKUP_DATE=$(date +%Y%m%d_%H%M%S)
LOG_FILE="/var/log/mailcow-backup.log"
MAILCOW_DIR="/opt/mailcow-dockerized"
BACKUP_DIR="/opt/mailcow-backups"
THREADS=4 # Adjust based on your CPU cores
KEEP_DAYS=7 # Keep local mailcow backups for 7 days
echo "[${BACKUP_DATE}] ========================================" | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Starting Mailcow backup process" | tee -a "$LOG_FILE"
# Step 1: Run mailcow's native backup script
echo "[${BACKUP_DATE}] Running mailcow native backup..." | tee -a "$LOG_FILE"
cd "$MAILCOW_DIR"
# Run the backup with multithreading
THREADS=${THREADS} MAILCOW_BACKUP_LOCATION=${BACKUP_DIR} \
./helper-scripts/backup_and_restore.sh backup all --delete-days ${KEEP_DAYS} \
2>&1 | tee -a "$LOG_FILE"
BACKUP_EXIT=${PIPESTATUS[0]}
if [ $BACKUP_EXIT -ne 0 ]; then
echo "[${BACKUP_DATE}] ERROR: Mailcow backup failed with exit code ${BACKUP_EXIT}" | tee -a "$LOG_FILE"
exit 1
fi
echo "[${BACKUP_DATE}] Mailcow native backup completed successfully" | tee -a "$LOG_FILE"
# Step 2: Create Kopia snapshot of backup directory
echo "[${BACKUP_DATE}] Creating Kopia snapshot..." | tee -a "$LOG_FILE"
kopia snapshot create "${BACKUP_DIR}" \
--tags mailcow:tier1-backup \
--description "Mailcow backup ${BACKUP_DATE}" \
2>&1 | tee -a "$LOG_FILE"
KOPIA_EXIT=${PIPESTATUS[0]}
if [ $KOPIA_EXIT -ne 0 ]; then
echo "[${BACKUP_DATE}] WARNING: Kopia snapshot failed with exit code ${KOPIA_EXIT}" | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Local mailcow backup exists but offsite copy may be incomplete" | tee -a "$LOG_FILE"
exit 2
fi
echo "[${BACKUP_DATE}] Kopia snapshot completed successfully" | tee -a "$LOG_FILE"
# Step 3: Also backup the mailcow installation directory (configs, compose files)
echo "[${BACKUP_DATE}] Backing up mailcow installation directory..." | tee -a "$LOG_FILE"
kopia snapshot create "${MAILCOW_DIR}" \
--tags mailcow,config,docker-compose \
--description "Mailcow config ${BACKUP_DATE}" \
2>&1 | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] Backup process completed successfully" | tee -a "$LOG_FILE"
echo "[${BACKUP_DATE}] ========================================" | tee -a "$LOG_FILE"
# Optional: Send notification on completion
# Add your notification method here (email, webhook, etc.)
```
Make it executable:
```bash
chmod +x /opt/scripts/backup-mailcow.sh
```
Add to crontab (daily at 2 AM):
```bash
# Edit root's crontab
crontab -e
# Add this line:
0 2 * * * /opt/scripts/backup-mailcow.sh 2>&1 | logger -t mailcow-backup
```
### Offsite Backup to Vaults
After local Kopia snapshots are created, sync to your offsite vaults:
```bash
# Option 1: Kopia repository sync (if using multiple Kopia repos)
kopia repository sync-to filesystem --path /mnt/vault/mailcow-backup
# Option 2: Rsync to vault
rsync -avz --delete /backup/kopia-repo/ /mnt/vault/mailcow-backup/
# Option 3: Rclone to remote vault
rclone sync /backup/kopia-repo/ vault:mailcow-backup/
```
## Recovery Procedures
### Understanding Two Recovery Methods
We have **two restore methods** depending on the scenario:
1. **Mailcow Native Restore** (Preferred): For component-level or same-server recovery
2. **Kopia Full Restore**: For complete disaster recovery to a new server
### Method 1: Mailcow Native Restore (Recommended)
Use this method when:
- Restoring on the same/similar server
- Restoring specific components (just email, just database, etc.)
- Recovering from local mailcow backups
#### Step 1: List Available Backups
```bash
cd /opt/mailcow-dockerized
# Run the restore script
./helper-scripts/backup_and_restore.sh restore
```
The script will prompt:
```
Backup location (absolute path, starting with /): /opt/mailcow-backups
```
#### Step 2: Select Backup
The script displays available backups:
```
Found project name mailcowdockerized
[ 1 ] - /opt/mailcow-backups/mailcow-2026-02-09-02-00-14/
[ 2 ] - /opt/mailcow-backups/mailcow-2026-02-10-02-00-08/
```
Enter the number of the backup to restore.
#### Step 3: Select Components
Choose what to restore:
```
[ 0 ] - all
[ 1 ] - Crypt data
[ 2 ] - Rspamd data
[ 3 ] - Mail directory (/var/vmail)
[ 4 ] - Redis DB
[ 5 ] - Postfix data
[ 6 ] - SQL DB
```
**Important**: The script will:
- Stop mailcow containers automatically
- Restore selected components
- Handle permissions correctly
- Restart containers when done
#### Example: Restore Only Email Data
```bash
cd /opt/mailcow-dockerized
./helper-scripts/backup_and_restore.sh restore
# When prompted:
# - Backup location: /opt/mailcow-backups
# - Select backup: 2 (most recent)
# - Select component: 3 (Mail directory)
```
#### Example: Restore Database Only
```bash
cd /opt/mailcow-dockerized
./helper-scripts/backup_and_restore.sh restore
# When prompted:
# - Backup location: /opt/mailcow-backups
# - Select backup: 2 (most recent)
# - Select component: 6 (SQL DB)
```
**Note**: For database restore, the script will modify `mailcow.conf` with the database credentials from the backup. Review the changes after restore.
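A quick way to eyeball the credentials the script wrote (these are the standard `mailcow.conf` database variables):
```bash
grep -E '^DB(NAME|USER|PASS|ROOT)=' /opt/mailcow-dockerized/mailcow.conf
```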
### Method 2: Complete Server Rebuild (Kopia Restore)
Use this when recovering to a completely new server or when local backups are unavailable.
#### Step 1: Prepare New Server
```bash
# Update system
apt update && apt upgrade -y
# Install Docker
curl -fsSL https://get.docker.com | sh
systemctl enable docker
systemctl start docker
# Install Docker Compose
apt install docker-compose-plugin -y
# Install Kopia
curl -s https://kopia.io/signing-key | apt-key add -
echo "deb https://packages.kopia.io/apt/ stable main" | tee /etc/apt/sources.list.d/kopia.list
apt update
apt install kopia -y
# Create directory structure
mkdir -p /opt/mailcow-dockerized
mkdir -p /opt/mailcow-backups/database
```
#### Step 2: Restore Kopia Repository
```bash
# Connect to your offsite vault
# If vault is mounted:
kopia repository connect filesystem --path /mnt/vault/mailcow-backup
# If vault is remote:
kopia repository connect s3 --bucket=your-bucket --access-key=xxx --secret-access-key=xxx
# List available snapshots
kopia snapshot list --tags mailcow:tier1-backup
```
#### Step 3: Restore Configuration
```bash
# Find and restore the config snapshot
kopia snapshot list --tags mailcow:config
# Restore to the Mailcow directory
kopia restore <snapshot-id> /opt/mailcow-dockerized/
# Verify critical files
ls -la /opt/mailcow-dockerized/mailcow.conf
ls -la /opt/mailcow-dockerized/docker-compose.yml
```
#### Step 4: Restore Mailcow Backups Directory
```bash
# Restore the entire backup directory from Kopia
kopia snapshot list --tags mailcow:tier1-backup
# Restore the most recent backup
kopia restore <snapshot-id> /opt/mailcow-backups/
# Verify backups were restored
ls -la /opt/mailcow-backups/
```
#### Step 5: Run Mailcow Native Restore
Now use mailcow's built-in restore script:
```bash
cd /opt/mailcow-dockerized
# Run the restore script
./helper-scripts/backup_and_restore.sh restore
# When prompted:
# - Backup location: /opt/mailcow-backups
# - Select the most recent backup
# - Select [ 0 ] - all (to restore everything)
```
The script will:
1. Stop all mailcow containers
2. Restore all components (vmail, mysql, redis, rspamd, postfix, crypt)
3. Update mailcow.conf with restored database credentials
4. Restart all containers
**Alternative: Manual Restore** (if you prefer more control)
```bash
cd /opt/mailcow-dockerized
# Start containers to create volumes
docker compose up -d --no-start
docker compose down
# Find the most recent backup directory
LATEST_BACKUP=$(ls -td /opt/mailcow-backups/mailcow-* | head -1)
echo "Restoring from: $LATEST_BACKUP"
# Extract each component manually
cd "$LATEST_BACKUP"
# Restore vmail (email data)
docker run --rm \
-v mailcowdockerized_vmail-vol:/backup \
-v "$PWD":/restore \
debian:bookworm-slim \
tar --use-compress-program='zstd -d' -xvf /restore/backup_vmail.tar.zst
# Restore MySQL
docker run --rm \
-v mailcowdockerized_mysql-vol:/backup \
-v "$PWD":/restore \
mariadb:10.11 \
tar --use-compress-program='zstd -d' -xvf /restore/backup_mysql.tar.zst
# Restore Redis
docker run --rm \
-v mailcowdockerized_redis-vol:/backup \
-v "$PWD":/restore \
debian:bookworm-slim \
tar --use-compress-program='zstd -d' -xvf /restore/backup_redis.tar.zst
# Restore other components similarly (rspamd, postfix, crypt)
# ...
# Copy mailcow.conf from backup
cp "$LATEST_BACKUP/mailcow.conf" /opt/mailcow-dockerized/mailcow.conf
```
#### Step 6: Start and Verify Mailcow
```bash
cd /opt/mailcow-dockerized
# Pull latest images (or use versions from backup if preferred)
docker compose pull
# Start all services
docker compose up -d
# Monitor logs
docker compose logs -f
```
#### Step 7: Post-Restore Verification
```bash
# Check container status
docker compose ps
# Test web interface
curl -I https://mail.yourdomain.com
# Check mail log
docker compose logs -f postfix-mailcow
# Verify database
docker compose exec mysql-mailcow mysql -u root -p$(grep DBROOT mailcow.conf | cut -d'=' -f2) -e "SHOW DATABASES;"
# Check email storage
docker compose exec dovecot-mailcow ls -lah /var/vmail/
```
### Scenario 2: Restore Individual Mailbox
To restore a single user's mailbox without affecting others:
#### Option A: Using Mailcow Backups (If Available)
```bash
cd /opt/mailcow-dockerized
# Temporarily mount the backup
BACKUP_DIR="/opt/mailcow-backups/mailcow-YYYY-MM-DD-HH-MM-SS"
# Extract just the vmail archive to a temporary location
mkdir -p /tmp/vmail-restore
cd "$BACKUP_DIR"
tar --use-compress-program='zstd -d' -xvf backup_vmail.tar.zst -C /tmp/vmail-restore
# Find the user's mailbox
# Structure: /tmp/vmail-restore/var/vmail/domain.com/user/
ls -la /tmp/vmail-restore/var/vmail/yourdomain.com/
# Copy specific mailbox
rsync -av /tmp/vmail-restore/var/vmail/yourdomain.com/user@domain.com/ \
/var/lib/docker/volumes/mailcowdockerized_vmail-vol/_data/yourdomain.com/user@domain.com/
# Fix permissions
docker run --rm \
-v mailcowdockerized_vmail-vol:/vmail \
debian:bookworm-slim \
chown -R 5000:5000 /vmail/yourdomain.com/user@domain.com/
# Cleanup
rm -rf /tmp/vmail-restore
# Restart Dovecot to recognize changes
docker compose restart dovecot-mailcow
```
#### Option B: Using Kopia Snapshot (If Local Backups Unavailable)
```bash
# Mount the vmail snapshot temporarily
mkdir -p /mnt/restore
kopia mount <vmail-snapshot-id> /mnt/restore
# Find the user's mailbox
# Structure: /mnt/restore/domain.com/user/
ls -la /mnt/restore/yourdomain.com/
# Copy specific mailbox
rsync -av /mnt/restore/yourdomain.com/user@domain.com/ \
/var/lib/docker/volumes/mailcowdockerized_vmail-vol/_data/yourdomain.com/user@domain.com/
# Fix permissions
chown -R 5000:5000 /var/lib/docker/volumes/mailcowdockerized_vmail-vol/_data/yourdomain.com/user@domain.com/
# Unmount
kopia unmount /mnt/restore
# Restart Dovecot to recognize changes
docker compose restart dovecot-mailcow
```
### Scenario 3: Database Recovery Only
If only the database is corrupted but email data is intact:
#### Option A: Using Mailcow Native Restore (Recommended)
```bash
cd /opt/mailcow-dockerized
# Run the restore script
./helper-scripts/backup_and_restore.sh restore
# When prompted:
# - Backup location: /opt/mailcow-backups
# - Select the most recent backup
# - Select [ 6 ] - SQL DB (database only)
```
The script will:
1. Stop mailcow
2. Restore the MySQL database from the mariabackup archive
3. Update mailcow.conf with the restored database credentials
4. Restart mailcow
#### Option B: Manual Database Restore from Kopia
If local backups are unavailable:
```bash
cd /opt/mailcow-dockerized
# Stop Mailcow
docker compose down
# Start only MySQL
docker compose up -d mysql-mailcow
# Wait for MySQL
sleep 30
# Restore from Kopia database dump
kopia snapshot list --tags database
kopia restore <snapshot-id> /tmp/db-restore/
# Import the dump
LATEST_DUMP=$(ls -t /tmp/db-restore/mailcow_*.sql | head -1)
docker compose exec -T mysql-mailcow mysql -u root -p$(grep DBROOT mailcow.conf | cut -d'=' -f2) < "$LATEST_DUMP"
# Restart the full stack
docker compose down
docker compose up -d
# Verify
docker compose logs -f
```
### Scenario 4: Configuration Recovery Only
If you only need to restore configuration files:
#### Option A: From Mailcow Backup
```bash
# Find the most recent backup
LATEST_BACKUP=$(ls -td /opt/mailcow-backups/mailcow-* | head -1)
# Stop Mailcow
cd /opt/mailcow-dockerized
docker compose down
# Backup current config (just in case)
cp mailcow.conf mailcow.conf.pre-restore
cp docker-compose.yml docker-compose.yml.pre-restore
# Restore mailcow.conf from backup
cp "$LATEST_BACKUP/mailcow.conf" ./mailcow.conf
# If you also need other config files from data/conf/,
# you would need to extract them from the backup archives
# Restart
docker compose up -d
```
#### Option B: From Kopia Snapshot
```bash
# Restore config snapshot to temporary location
kopia restore <config-snapshot-id> /tmp/mailcow-restore/
# Stop Mailcow
cd /opt/mailcow-dockerized
docker compose down
# Backup current config (just in case)
cp mailcow.conf mailcow.conf.pre-restore
cp docker-compose.yml docker-compose.yml.pre-restore
# Restore specific files
cp /tmp/mailcow-restore/mailcow.conf ./
cp /tmp/mailcow-restore/docker-compose.yml ./
cp -r /tmp/mailcow-restore/data/conf/* ./data/conf/
# Restart
docker compose up -d
```
## Verification and Testing
### Regular Backup Verification
Perform monthly restore tests to ensure backups are valid:
```bash
# Test restore to temporary location
mkdir -p /tmp/backup-test
kopia snapshot list --tags mailcow:tier1-backup
kopia restore <snapshot-id> /tmp/backup-test/
# Verify files exist and are readable
ls -lah /tmp/backup-test/
cat /tmp/backup-test/mailcow.conf
# Cleanup
rm -rf /tmp/backup-test/
```
### Backup Monitoring Script
Create `/opt/scripts/check-mailcow-backup.sh`:
```bash
#!/bin/bash
# Check last backup age
LAST_BACKUP=$(kopia snapshot list --tags mailcow:tier1-backup --json | jq -r 'max_by(.startTime).startTime')
LAST_BACKUP_EPOCH=$(date -d "$LAST_BACKUP" +%s)
NOW=$(date +%s)
AGE_HOURS=$(( ($NOW - $LAST_BACKUP_EPOCH) / 3600 ))
if [ $AGE_HOURS -gt 26 ]; then
echo "WARNING: Last Mailcow backup is $AGE_HOURS hours old"
# Send alert (email, Slack, etc.)
exit 1
else
echo "OK: Last backup $AGE_HOURS hours ago"
fi
```
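The check uses `jq`, so install it if missing (`apt install jq`). Make the script executable and schedule it shortly after the backup window; a cron example:
```bash
chmod +x /opt/scripts/check-mailcow-backup.sh
# Example cron entry: check every morning at 08:00
# 0 8 * * * /opt/scripts/check-mailcow-backup.sh 2>&1 | logger -t mailcow-backup-check
```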
## Disaster Recovery Checklist
When disaster strikes, follow this checklist:
- [ ] Confirm scope of failure (server, storage, specific component)
- [ ] Gather server information (hostname, IP, DNS records)
- [ ] Access offsite backup vault
- [ ] Provision new server (if needed)
- [ ] Install Docker and dependencies
- [ ] Connect to Kopia repository
- [ ] Restore configurations first
- [ ] Restore database
- [ ] Restore email data
- [ ] Start services and verify
- [ ] Test email sending/receiving
- [ ] Verify webmail access
- [ ] Check DNS records and update if needed
- [ ] Document any issues encountered
- [ ] Update recovery procedures based on experience
## Important Notes
1. **DNS**: Keep DNS records documented separately (a quick capture sketch follows this list). Recovery includes updating DNS if the server IP changes.
2. **SSL Certificates**: Let's Encrypt certificates are in the backup but may need renewal. Mailcow will handle this automatically.
3. **Permissions**: Docker volumes have specific UID/GID requirements:
- vmail: `5000:5000`
- mysql: `999:999`
4. **Testing**: Always test recovery procedures in a lab environment before trusting them in production.
5. **Documentation**: Keep this guide and server details in a separate location (printed copy, password manager, etc.).
6. **Retention Policy**: Review Kopia retention settings periodically to balance storage costs with recovery needs.
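For the DNS note above, a minimal capture sketch that writes the mail-relevant records to a file (replace `yourdomain.com`; the DKIM selector `dkim` is mailcow's default, but verify yours):
```bash
# Snapshot current mail-related DNS records for offline documentation
DOMAIN="yourdomain.com"
{
  echo "== captured $(date -Is) =="
  dig +short MX  "$DOMAIN"
  dig +short A   "mail.$DOMAIN"
  dig +short TXT "$DOMAIN"                    # SPF
  dig +short TXT "_dmarc.$DOMAIN"             # DMARC
  dig +short TXT "dkim._domainkey.$DOMAIN"    # DKIM (selector is an assumption)
} > "dns-records-${DOMAIN}.txt"
```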
## Backup Architecture Notes
### Why Two Backup Layers?
**Mailcow Native Backups** (Tier 1):
- ✅ Component-aware (knows about mailcow's structure)
- ✅ Uses mariabackup for consistent MySQL hot backups
- ✅ Fast, selective restore (can restore just one component)
- ✅ Architecture-aware (handles x86/ARM differences)
- ❌ No deduplication (full copies each time)
- ❌ Limited to local storage initially
**Kopia Snapshots** (Tier 2):
- ✅ Deduplication and compression
- ✅ Efficient offsite replication to vaults
- ✅ Point-in-time recovery across multiple versions
- ✅ Disaster recovery to completely new infrastructure
- ❌ Less component-aware (treats as files)
- ❌ Slower for granular component restore
### Storage Efficiency
Using this two-tier approach:
- **Local**: Mailcow creates ~7 days of native backups (may be large, but short retention)
- **Offsite**: Kopia deduplicates these backups for long-term vault storage (much smaller)
Example storage calculation (10GB mailbox):
- Local: 7 days × 10GB = ~70GB (before compression)
- Kopia (offsite): First backup ~10GB, subsequent backups only store changes (might be <1GB/day after dedup)
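To see what deduplication is actually achieving on your repository, Kopia can report stored-content statistics (run wherever the repository is connected):
```bash
# Counts and total size of deduplicated contents in the repository
kopia content stats
# Recent mailcow snapshots for comparison
kopia snapshot list --tags mailcow:tier1-backup
```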
### Compression Formats
Mailcow's script creates `.tar.zst` (Zstandard) or `.tar.gz` (gzip) files:
- **Zstandard** (modern): Better compression ratio, faster (recommended)
- **Gzip** (legacy): Wider compatibility with older systems
Verify your backup compression:
```bash
ls -lh /opt/mailcow-backups/mailcow-*/
# Look for .tar.zst (preferred) or .tar.gz
```
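Beyond listing the files, you can integrity-test an archive without extracting it (assumes `zstd`/`gzip` are installed on the host):
```bash
# Test zstd archives
zstd -t /opt/mailcow-backups/mailcow-*/backup_vmail.tar.zst
# Test gzip archives (if your backups use .tar.gz)
# gzip -t /opt/mailcow-backups/mailcow-*/backup_vmail.tar.gz
```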
### Cross-Architecture Considerations
**Important for ARM/x86 Migration**:
Mailcow's backup script is architecture-aware. When restoring:
- **Rspamd data** cannot be restored across different architectures (x86 ↔ ARM)
- **All other components** (vmail, mysql, redis, postfix, crypt) are architecture-independent
If migrating between architectures:
```bash
# Restore everything EXCEPT rspamd
# Select components individually: vmail, mysql, redis, postfix, crypt
# Skip rspamd - it will rebuild its learning database over time
```
### Testing Your Backups
**Monthly Test Protocol**:
1. **Verify local backups exist**:
```bash
ls -lh /opt/mailcow-backups/
# Should see recent dated directories
```
2. **Verify Kopia snapshots**:
```bash
kopia snapshot list --tags mailcow:tier1-backup
# Should see recent snapshots
```
3. **Test restore in lab** (recommended quarterly):
- Spin up a test VM
- Restore from Kopia
- Run mailcow native restore
- Verify email delivery and webmail access
## Additional Resources
- [Mailcow Official Backup Documentation](https://docs.mailcow.email/backup_restore/b_n_r-backup/)
- [Kopia Documentation](https://kopia.io/docs/)
- [Docker Volume Backup Best Practices](https://docs.docker.com/storage/volumes/#back-up-restore-or-migrate-data-volumes)
## Revision History
| Date | Version | Changes |
|------|---------|---------|
| 2026-02-10 | 1.1 | Integrated mailcow native backup_and_restore.sh script as primary backup method |
| 2026-02-10 | 1.0 | Initial documentation |
---
**Last Updated**: February 10, 2026
**Maintained By**: System Administrator
**Review Schedule**: Quarterly

---
title: Services Backup
description:
published: true
date: 2026-02-20T04:08:15.923Z
tags:
editor: markdown
dateCreated: 2026-02-05T21:28:23.152Z
---
- [Mailcow](/backup-mailcow)
- [Immich](/immich_backup)
- [Nextcloud](/nextcloud_backup)
- kopia
- forgejo
- bitwarden
- wiki
- journalv

---
title: Wikijs Backup
description: Backup Wikijs
published: true
date: 2026-02-23T04:35:32.870Z
tags:
editor: markdown
dateCreated: 2026-02-23T04:35:24.121Z
---
# Wiki.js Backup & Recovery
**Service:** Wiki.js (Netgrimoire)
**Stack:** Docker Compose — Wiki.js + PostgreSQL
**Backup Targets:** PostgreSQL database dump, Git content repository, Docker Compose config
**Backup Destinations:** Local vault path → Kopia → offsite vaults
---
## Overview
Wiki.js data lives in two separate places that must be backed up independently:
**PostgreSQL database** — stores page metadata, navigation, user accounts, permissions, page history, assets, and all configuration. This is the critical component for a portable restore. Without it, a new instance has no knowledge of your wiki structure.
**Git content repository** — stores the actual page content in markdown files, synced from Forgejo. This is already mirrored on the VAULT SSD at `/vault/repos/wiki/`. It is inherently redundant as long as Forgejo is healthy, but is included in backups for completeness and offline portability.
**Docker Compose config** — the `docker-compose.yml` and `.env` files needed to recreate the stack.
---
## What Gets Backed Up
| Component | Location | Method | Critical? |
|---|---|---|---|
| PostgreSQL database | Docker volume | `pg_dump` → SQL file | Yes — primary restore target |
| Git content repo | `/vault/repos/wiki/` | Already on VAULT SSD | Yes — page content |
| Docker Compose files | `/opt/stacks/wikijs/` | rsync copy | Yes — stack config |
| Wiki.js data volume | Docker volume | Optional rsync | No — DB + Git covers this |
---
## Backup Strategy
### Tier 1 — Daily Dump to Vault Path
A script runs daily via systemd timer. It produces a portable `pg_dump` SQL file written to `/vault/backups/wiki/`. These local dumps are retained for 14 days.
**Key choices:**
- `--format=plain` — plain SQL, portable to any PostgreSQL version and any host
- `--no-owner` — strips role ownership, so the dump restores cleanly on a new instance with a different postgres user (critical for Pocket Grimoire restores)
- `--no-acl` — strips GRANT/REVOKE statements for the same reason
- No application downtime required — PostgreSQL handles consistent dumps natively
### Tier 2 — Kopia Snapshot to Offsite Vaults
After the daily dump completes, Kopia snapshots the entire `/vault/backups/wiki/` directory and replicates to your offsite vaults. Kopia deduplication means only changed blocks are transferred after the first run.
---
## Setup
### Step 0 — Confirm Kopia Repository Exists
If Kopia is not yet initialized on this host, initialize it first. If you already initialized Kopia for Mailcow or another service, skip this step — all services share the same Kopia repository.
```bash
# Check if repository already exists
kopia repository status
# If not initialized, create it against your vault path
kopia repository create filesystem --path=/vault/kopia
# Connect on subsequent logins if disconnected
kopia repository connect filesystem --path=/vault/kopia
```
### Step 1 — Create Backup Directories
```bash
sudo mkdir -p /vault/backups/wiki
sudo chown $(whoami):$(whoami) /vault/backups/wiki
```
### Step 2 — Create the Backup Script
```bash
sudo nano /usr/local/sbin/wikijs-backup.sh
```
```bash
#!/usr/bin/env bash
# wikijs-backup.sh — Daily Wiki.js backup: pg_dump + git repo + config
# Writes to /vault/backups/wiki/, then snapshots with Kopia
set -euo pipefail
# ── Configuration ─────────────────────────────────────────────────────────────
BACKUP_DIR="/vault/backups/wiki"
DATE=$(date +%Y%m%d_%H%M%S)
CONTAINER_DB="wikijs_db" # Adjust to your actual container name
PG_USER="wikijs"
PG_DB="wikijs"
WIKI_STACK_DIR="/opt/stacks/wikijs" # Location of docker-compose.yml and .env
GIT_REPO_DIR="/vault/repos/wiki" # Git content mirror (already on vault SSD)
RETAIN_DAYS=14 # Local dump retention
LOG="/var/log/wikijs-backup.log"
touch "$LOG"
log() { echo "$(date -Is) $*" | tee -a "$LOG"; }
# ── Step 1: PostgreSQL dump ────────────────────────────────────────────────────
log "Starting Wiki.js PostgreSQL dump..."
docker exec "$CONTAINER_DB" pg_dump \
-U "$PG_USER" \
"$PG_DB" \
--format=plain \
--no-owner \
--no-acl \
> "${BACKUP_DIR}/wikijs-db-${DATE}.sql"
gzip "${BACKUP_DIR}/wikijs-db-${DATE}.sql"
log "PostgreSQL dump complete: wikijs-db-${DATE}.sql.gz"
# ── Step 2: Docker Compose config backup ──────────────────────────────────────
log "Backing up Docker Compose config..."
CONFIG_BACKUP="${BACKUP_DIR}/wikijs-config-${DATE}.tar.gz"
tar -czf "$CONFIG_BACKUP" \
-C "$(dirname "$WIKI_STACK_DIR")" \
"$(basename "$WIKI_STACK_DIR")"
log "Config backup complete: wikijs-config-${DATE}.tar.gz"
# ── Step 3: Git repo snapshot (content mirror) ────────────────────────────────
# The git repo lives on the VAULT SSD and is already versioned.
# We record the current HEAD commit for reference.
if [ -d "${GIT_REPO_DIR}/.git" ]; then
GIT_HEAD=$(git -C "$GIT_REPO_DIR" rev-parse HEAD 2>/dev/null || echo "unknown")
echo "Git HEAD at backup time: ${GIT_HEAD}" \
> "${BACKUP_DIR}/wikijs-git-ref-${DATE}.txt"
log "Git content repo HEAD: ${GIT_HEAD}"
else
log "WARNING: Git repo not found at ${GIT_REPO_DIR} — skipping git ref"
fi
# ── Step 4: Cleanup old local dumps ───────────────────────────────────────────
log "Cleaning up dumps older than ${RETAIN_DAYS} days..."
find "$BACKUP_DIR" -name "wikijs-db-*.sql.gz" -mtime +"$RETAIN_DAYS" -delete
find "$BACKUP_DIR" -name "wikijs-config-*.tar.gz" -mtime +"$RETAIN_DAYS" -delete
find "$BACKUP_DIR" -name "wikijs-git-ref-*.txt" -mtime +"$RETAIN_DAYS" -delete
# ── Step 5: Kopia snapshot ────────────────────────────────────────────────────
log "Running Kopia snapshot of /vault/backups/wiki/..."
kopia snapshot create "$BACKUP_DIR" \
--tags "service:wikijs,host:$(hostname -s)"
log "Kopia snapshot complete."
# ── Done ──────────────────────────────────────────────────────────────────────
log "Wiki.js backup finished successfully."
```
```bash
sudo chmod +x /usr/local/sbin/wikijs-backup.sh
```
### Step 3 — Create systemd Service and Timer
```bash
sudo nano /etc/systemd/system/wikijs-backup.service
```
```ini
[Unit]
Description=Wiki.js daily backup (pg_dump + config + Kopia snapshot)
After=docker.service
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/wikijs-backup.sh
```
```bash
sudo nano /etc/systemd/system/wikijs-backup.timer
```
```ini
[Unit]
Description=Run Wiki.js backup daily at 02:00
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable wikijs-backup.timer
sudo systemctl start wikijs-backup.timer
# Verify
systemctl list-timers | grep wikijs
```
### Step 4 — Configure Kopia Retention Policy
```bash
# Set retention policy for wiki backups
kopia policy set /vault/backups/wiki \
--keep-daily 14 \
--keep-weekly 8 \
--keep-monthly 12 \
--compression zstd
# Verify policy
kopia policy show /vault/backups/wiki
```
### Step 5 — Test the Backup
```bash
# Run manually first time
sudo /usr/local/sbin/wikijs-backup.sh
# Verify output
ls -lh /vault/backups/wiki/
# Should show: wikijs-db-YYYYMMDD_HHMMSS.sql.gz
# wikijs-config-YYYYMMDD_HHMMSS.tar.gz
# wikijs-git-ref-YYYYMMDD_HHMMSS.txt
# Verify Kopia snapshot was created
kopia snapshot list /vault/backups/wiki
# Check backup log
tail -n 30 /var/log/wikijs-backup.log
```
---
## Verifying Backups
### Check dump is readable
```bash
# Inspect the SQL dump without extracting
zcat /vault/backups/wiki/wikijs-db-YYYYMMDD_HHMMSS.sql.gz | head -50
# Should show PostgreSQL header, version info, and CREATE TABLE statements
```
### Verify Kopia snapshots
```bash
# List recent snapshots
kopia snapshot list /vault/backups/wiki
# Show snapshot details
kopia snapshot list /vault/backups/wiki --all
# Verify snapshot integrity
kopia snapshot verify
```
### Test restore to a temporary database (non-destructive)
```bash
# Start a temporary Postgres container
docker run --rm -d \
--name wikijs-restore-test \
-e POSTGRES_USER=wikijs \
-e POSTGRES_PASSWORD=testpassword \
-e POSTGRES_DB=wikijs_test \
postgres:16-alpine
# Wait for Postgres to be ready
sleep 5
# Restore dump into test container
zcat /vault/backups/wiki/wikijs-db-YYYYMMDD_HHMMSS.sql.gz | \
docker exec -i wikijs-restore-test psql -U wikijs -d wikijs_test
# Verify tables exist
docker exec wikijs-restore-test psql -U wikijs -d wikijs_test -c "\dt"
# Expected output: List of tables (pages, users, pageHistory, assets, etc.)
# Cleanup test container
docker stop wikijs-restore-test
```
---
## Recovery Procedures
### Scenario A — Restore to a New Wiki.js Instance (Any Host)
This covers full disaster recovery to a fresh server, including Pocket Grimoire.
**Requirements on the destination host:**
- Docker and Docker Compose installed
- A `docker-compose.yml` and `.env` ready (from backup or Pocket Grimoire stack)
- Sufficient disk space
**Step 1: Locate the backup**
```bash
# On Netgrimoire, find the dump to restore
ls -lh /vault/backups/wiki/
# Or restore from Kopia
kopia snapshot list /vault/backups/wiki
kopia restore SNAPSHOT_ID /tmp/wiki-restore/
ls /tmp/wiki-restore/
```
**Step 2: Copy dump to the destination host**
```bash
# From Netgrimoire, copy to the destination server
scp /vault/backups/wiki/wikijs-db-YYYYMMDD_HHMMSS.sql.gz \
user@destination-host:/tmp/
# Or to Pocket Grimoire
scp /vault/backups/wiki/wikijs-db-YYYYMMDD_HHMMSS.sql.gz \
user@pocket-grimoire.local:/tmp/
```
**Step 3: Start the database container only**
On the destination host, start just the database — do not start Wiki.js yet:
```bash
cd /srv/pocket-grimoire/stacks/wikijs # Adjust path as needed
# Start only the database container
docker compose up -d db
# Wait for healthy status
docker compose ps
# db should show: healthy
```
**Step 4: Restore the dump**
```bash
# Restore the dump into the running database container
zcat /tmp/wikijs-db-YYYYMMDD_HHMMSS.sql.gz | \
docker exec -i pocketgrimoire_db psql \
-U wikijs \
-d wikijs
# Verify tables restored
docker exec pocketgrimoire_db psql -U wikijs -d wikijs -c "\dt"
```
**Step 5: Start Wiki.js**
```bash
docker compose up -d
# Watch startup logs
docker logs -f pocketgrimoire_wikijs
# Wait for: "HTTP Server started successfully"
```
**Step 6: Verify**
Open `http://pocket-grimoire.local:3000` and confirm:
- Pages load correctly
- Navigation structure is intact
- User accounts are present (if you had multiple users)
**Step 7: Re-sync Git content (if needed)**
The database knows the page structure, but if the Git content repo isn't present on the new host, import it:
```bash
# In Wiki.js admin panel:
# Administration → Storage → Git
# Click "Force Sync" or "Import Content"
# Or copy the repo from VAULT SSD
rsync -avP /vault/repos/wiki/ /srv/pocket-grimoire/repos/wiki/
```
---
### Scenario B — Restore on Existing Netgrimoire Instance
Use this when the Wiki.js database is corrupted but the host is otherwise healthy.
**Step 1: Stop Wiki.js (leave database running)**
```bash
cd /opt/stacks/wikijs
docker compose stop wikijs
```
**Step 2: Drop and recreate the database**
```bash
docker exec -it wikijs_db psql -U postgres -c "DROP DATABASE wikijs;"
docker exec -it wikijs_db psql -U postgres -c "CREATE DATABASE wikijs OWNER wikijs;"
```
**Step 3: Restore**
```bash
zcat /vault/backups/wiki/wikijs-db-YYYYMMDD_HHMMSS.sql.gz | \
docker exec -i wikijs_db psql -U wikijs -d wikijs
```
**Step 4: Restart Wiki.js**
```bash
docker compose start wikijs
docker logs -f wikijs
```
---
### Scenario C — Restore Config Only
If the stack config was lost but the database volume is intact:
```bash
# Extract config from backup
tar -xzf /vault/backups/wiki/wikijs-config-YYYYMMDD_HHMMSS.tar.gz \
-C /opt/stacks/
# Verify
ls /opt/stacks/wikijs/
# Should show: docker-compose.yml .env
# Restart stack
cd /opt/stacks/wikijs
docker compose up -d
```
---
### Restore from Kopia (Offsite)
When local vault files are unavailable, restore the backup directory from Kopia first:
```bash
# List available snapshots
kopia snapshot list /vault/backups/wiki
# Restore snapshot to temp directory
kopia restore SNAPSHOT_ID /tmp/wiki-restore/
# Then proceed with the appropriate scenario above
# using files from /tmp/wiki-restore/ instead of /vault/backups/wiki/
```
---
## Pocket Grimoire Specifics
When restoring to Pocket Grimoire, note the following differences from a full Netgrimoire instance:
**Container names** differ — use `pocketgrimoire_db` instead of `wikijs_db`.
**Stack path** is `/srv/pocket-grimoire/stacks/wikijs/` instead of `/opt/stacks/wikijs/`.
**The database is already initialized** when Pocket Grimoire is first set up. Restoring a Netgrimoire dump overwrites it entirely, which is the intended behavior — Pocket Grimoire becomes a mirror of Netgrimoire's wiki state.
**Git content repo** is located at `/srv/pocket-grimoire/repos/wiki/` and is populated via the sync script (`pocketgrimoire-sync.sh`). A database restore alone is sufficient if the Git repo is already in place.
**Recommended restore workflow for Pocket Grimoire:**
```bash
# 1. Copy dump from VAULT SSD (already available on Pocket Grimoire)
ls /srv/vaultpg/backups/wiki/
# 2. Start db container only
cd /srv/pocket-grimoire/stacks/wikijs && docker compose up -d db
# 3. Restore
zcat /srv/vaultpg/backups/wiki/wikijs-db-LATEST.sql.gz | \
docker exec -i pocketgrimoire_db psql -U wikijs -d wikijs
# 4. Start full stack
docker compose up -d
```
Because the VAULT SSD is always connected to Pocket Grimoire, no file transfer is needed — the dumps are already there.
---
## Monitoring & Alerts
Add the following to your existing ntfy/monitoring setup to alert on backup failures. Wrap the backup script call in an error trap:
```bash
# Add to wikijs-backup.sh after set -euo pipefail:
NTFY_URL="https://ntfy.YOUR_DOMAIN/wikijs-backup"
on_error() {
curl -fsS -X POST "$NTFY_URL" \
-H "Title: Wiki.js backup FAILED ($(hostname -s))" \
-H "Priority: high" \
-H "Tags: rotating_light" \
-d "Backup failed at $(date -Is). Check /var/log/wikijs-backup.log"
}
trap on_error ERR
```
### Check backup age manually
```bash
# Find most recent dump
ls -lt /vault/backups/wiki/wikijs-db-*.sql.gz | head -3
# Check Kopia last snapshot time
kopia snapshot list /vault/backups/wiki | tail -5
```
---
## Quick Reference
```bash
# Run backup manually
sudo /usr/local/sbin/wikijs-backup.sh
# Watch backup log
tail -f /var/log/wikijs-backup.log
# Check timer status
systemctl status wikijs-backup.timer
# List local dumps
ls -lh /vault/backups/wiki/
# List Kopia snapshots
kopia snapshot list /vault/backups/wiki
# Restore dump (generic)
zcat /vault/backups/wiki/wikijs-db-YYYYMMDD_HHMMSS.sql.gz | \
docker exec -i CONTAINER_NAME psql -U wikijs -d wikijs
# Test dump is readable
zcat /vault/backups/wiki/wikijs-db-YYYYMMDD_HHMMSS.sql.gz | head -50
```
---
## Revision History
| Version | Date | Notes |
|---|---|---|
| 1.0 | 2026-02-22 | Initial release — pg_dump + Kopia + Pocket Grimoire restore procedures |

View file

@ -0,0 +1,940 @@
---
title: Setting Up Kopia
description:
published: true
date: 2026-02-20T04:27:59.823Z
tags:
editor: markdown
dateCreated: 2026-01-23T22:14:17.009Z
---
# Kopia Backup System Documentation
## Overview
This system implements a two-tier backup strategy using **two separate Kopia Server instances**:
1. **Primary Repository** (`/srv/vault/kopia_repository`) - Full backups of all clients, served on port 51515
2. **Vault Repository** (`/srv/vault/backup`) - Targeted critical data backups, served on port 51516, replicated offsite via ZFS send/receive
The Vault repository sits on its own ZFS dataset to enable clean replication to offsite Pi systems. Running two separate Kopia servers allows independent management of each repository while maintaining the same HTTPS-based client connection model for both.
---
## Architecture
```
Clients (docker2, cindy's desktop, etc.)
├─→ Primary Backup → Kopia Server Primary (port 51515)
│ → /srv/vault/kopia_repository (all data)
└─→ Vault Backup → Kopia Server Vault (port 51516)
→ /srv/vault/backup (critical data only)
ZFS Send/Receive
┌───────┴───────┐
↓ ↓
Pi Vault 1 Pi Vault 2
(offsite) (offsite)
```
---
## Initial Setup on ZNAS
### Prerequisites
- Docker installed on ZNAS
- ZFS pool available
### 1. Create ZFS Datasets
```bash
# Primary repository dataset (if not already created)
zfs create -o mountpoint=/srv/vault zpool/vault
zfs create zpool/vault/kopia_repository
# Vault repository dataset (for offsite replication)
zfs create zpool/vault/backup
```
### 2. Install Kopia Servers (Docker)
We run **two separate Kopia Server containers** - one for primary backups, one for vault backups.
```bash
# Primary repository server (port 51515)
docker run -d \
--name kopia-server-primary \
--restart unless-stopped \
-p 51515:51515 \
-v /srv/vault/kopia_repository:/app/repository \
-v /srv/vault/config-primary:/app/config \
-v /srv/vault/logs-primary:/app/logs \
kopia/kopia:latest server start \
--address=0.0.0.0:51515 \
--tls-generate-cert
# Vault repository server (port 51516)
docker run -d \
--name kopia-server-vault \
--restart unless-stopped \
-p 51516:51516 \
-v /srv/vault/backup:/app/repository \
-v /srv/vault/config-vault:/app/config \
-v /srv/vault/logs-vault:/app/logs \
kopia/kopia:latest server start \
--address=0.0.0.0:51516 \
--tls-generate-cert
```
**Get the certificate fingerprints:**
```bash
# Primary server fingerprint
docker exec kopia-server-primary kopia server status
# Vault server fingerprint
docker exec kopia-server-vault kopia server status
```
**Note:** Record both certificate fingerprints - you'll need them for client connections.
- **Primary server cert SHA256:** `696a4999f594b5273a174fd7cab677d8dd1628f9b9d27e557daa87103ee064b2`
- **Vault server cert SHA256:** *(get from command above)*
### 3. Create Kopia Repositories
Each server manages its own repository. These are created during first server start, but you can initialize them manually if needed.
```bash
# Primary repository (usually created via GUI on first use)
docker exec -it kopia-server-primary kopia repository create filesystem \
--path=/app/repository \
--description="Primary backup repository"
# Vault repository
docker exec -it kopia-server-vault kopia repository create filesystem \
--path=/app/repository \
--description="Vault backup repository for offsite replication"
```
**Note:** If you created the primary repository via the Kopia UI, you don't need to run the first command.
### 4. Create User Accounts
Create users on each server separately.
**Primary repository users:**
```bash
# Enter primary server container
docker exec -it kopia-server-primary /bin/sh
# Create users
kopia server users add admin@docker2
kopia server users add cindy@DESKTOP-QLSVD8P
# Password for cindy: LucyDog123
# Exit container
exit
```
**Vault repository users:**
```bash
# Enter vault server container
docker exec -it kopia-server-vault /bin/sh
# Create users
kopia server users add admin@docker2-vault
kopia server users add cindy@DESKTOP-QLSVD8P-vault
# Use same passwords or different based on security requirements
# Exit container
exit
```
---
## Client Configuration
### Linux Client (docker2)
#### Primary Backup Setup
1. **Install Kopia**
```bash
# Download and install kopia .deb package
wget https://github.com/kopia/kopia/releases/download/v0.XX.X/kopia_0.XX.X_amd64.deb
sudo dpkg -i kopia_0.XX.X_amd64.deb
```
2. **Remove old repository (if exists)**
```bash
sudo kopia repository disconnect || true
sudo rm -rf /root/.config/kopia
```
3. **Connect to primary repository**
```bash
sudo kopia repository connect server \
--url=https://192.168.5.10:51515 \
--override-username=admin@docker2 \
--server-cert-fingerprint=696a4999f594b5273a174fd7cab677d8dd1628f9b9d27e557daa87103ee064b2
```
4. **Create initial snapshot**
```bash
sudo kopia snapshot create /DockerVol/
```
5. **Set up cron job for primary backups**
```bash
sudo crontab -e
# Add this line (runs every 3 hours)
0 */3 * * * /usr/bin/kopia snapshot create /DockerVol >> /var/log/kopia-primary-cron.log 2>&1
```
#### Vault Backup Setup (Critical Data)
1. **Create secondary kopia config directory**
```bash
sudo mkdir -p /root/.config/kopia-vault
```
2. **Connect to vault repository**
```bash
sudo kopia --config-file=/root/.config/kopia-vault/repository.config \
repository connect server \
--url=https://192.168.5.10:51516 \
--override-username=admin@docker2-vault \
--server-cert-fingerprint=<VAULT_SERVER_CERT_FINGERPRINT>
```
**Note:** Replace `<VAULT_SERVER_CERT_FINGERPRINT>` with the actual fingerprint from the vault server (see setup section).
3. **Create vault backup script**
```bash
sudo nano /usr/local/bin/kopia-vault-backup.sh
```
Add this content:
```bash
#!/bin/bash
# Kopia Vault Backup Script
# Backs up critical data to vault repository for offsite replication
KOPIA_CONFIG="/root/.config/kopia-vault/repository.config"
LOG_FILE="/var/log/kopia-vault-cron.log"
# Add your critical directories here
VAULT_DIRS=(
"/DockerVol/critical-app1"
"/DockerVol/critical-app2"
"/home/admin/documents"
)
echo "=== Vault backup started at $(date) ===" >> "$LOG_FILE"
for dir in "${VAULT_DIRS[@]}"; do
if [ -d "$dir" ]; then
echo "Backing up: $dir" >> "$LOG_FILE"
/usr/bin/kopia --config-file="$KOPIA_CONFIG" snapshot create "$dir" >> "$LOG_FILE" 2>&1
else
echo "Directory not found: $dir" >> "$LOG_FILE"
fi
done
echo "=== Vault backup completed at $(date) ===" >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
```
4. **Make script executable**
```bash
sudo chmod +x /usr/local/bin/kopia-vault-backup.sh
```
5. **Set up cron job for vault backups**
```bash
sudo crontab -e
# Add this line (runs daily at 3 AM)
0 3 * * * /usr/local/bin/kopia-vault-backup.sh
```
---
### Windows Client (Cindy's Desktop)
#### Primary Backup Setup
1. **Install Kopia**
```powershell
# Using winget
winget install kopia
```
2. **Connect to primary repository**
```powershell
kopia repository connect server `
--url=https://192.168.5.10:51515 `
--override-username=cindy@DESKTOP-QLSVD8P `
--server-cert-fingerprint=696a4999f594b5273a174fd7cab677d8dd1628f9b9d27e557daa87103ee064b2
```
3. **Create initial snapshot**
```powershell
kopia snapshot create C:\Users\cindy
```
4. **Set exclusion policy**
```powershell
kopia policy set `
--global `
--add-ignore "**\AppData\Local\Temp\**" `
--add-ignore "**\AppData\Local\Packages\**"
```
5. **Create primary backup script**
```powershell
# Create scripts folder
New-Item -ItemType Directory -Force -Path C:\Scripts
# Create backup script
New-Item -ItemType File -Path C:\Scripts\kopia-primary-nightly.ps1
```
Add this content to `C:\Scripts\kopia-primary-nightly.ps1`:
```powershell
# Kopia Primary Backup Script
# Repository password
$env:KOPIA_PASSWORD = "LucyDog123"
# Run backup with logging
kopia snapshot create C:\Users\cindy `
--progress `
| Tee-Object -FilePath C:\Logs\kopia-primary.log -Append
# Log completion
Add-Content -Path C:\Logs\kopia-primary.log -Value "Backup completed at $(Get-Date)"
Add-Content -Path C:\Logs\kopia-primary.log -Value "---"
```
6. **Secure the script**
- Right-click `C:\Scripts\kopia-primary-nightly.ps1` → Properties → Security
- Ensure only Cindy's user account has read access
7. **Create scheduled task for primary backup**
- Press `Win + R` → type `taskschd.msc`
- Click "Create Task" (not "Basic Task")
**General tab:**
- Name: `Kopia Primary Nightly Backup`
- ✔ Run whether user is logged on or not
- ✔ Run with highest privileges
- Configure for: Windows 10/11
**Triggers tab:**
- New → Daily at 2:00 AM
- ✔ Enabled
**Actions tab:**
- Program: `powershell.exe`
- Arguments: `-ExecutionPolicy Bypass -File C:\Scripts\kopia-primary-nightly.ps1`
- Start in: `C:\Scripts`
**Conditions tab:**
- ✔ Wake the computer to run this task
- ✔ Start only if on AC power (recommended for laptops)
**Settings tab:**
- ✔ Allow task to be run on demand
- ✔ Run task as soon as possible after scheduled start is missed
- ❌ Stop the task if it runs longer than...
**Note:** When creating the task, use PIN (not Windows password) when prompted. For scheduled task credential: use password Harvey123= (MS account password)
#### Vault Backup Setup (Critical Data)
1. **Create vault config directory**
```powershell
New-Item -ItemType Directory -Force -Path C:\Users\cindy\.config\kopia-vault
```
2. **Connect to vault repository**
```powershell
kopia --config-file="C:\Users\cindy\.config\kopia-vault\repository.config" `
repository connect server `
--url=https://192.168.5.10:51516 `
--override-username=cindy@DESKTOP-QLSVD8P-vault `
--server-cert-fingerprint=<VAULT_SERVER_CERT_FINGERPRINT>
```
**Note:** Replace `<VAULT_SERVER_CERT_FINGERPRINT>` with the actual fingerprint from the vault server.
3. **Create vault backup script**
```powershell
New-Item -ItemType File -Path C:\Scripts\kopia-vault-nightly.ps1
```
Add this content to `C:\Scripts\kopia-vault-nightly.ps1`:
```powershell
# Kopia Vault Backup Script
# Backs up critical data to vault repository for offsite replication
$env:KOPIA_PASSWORD = "LucyDog123"
$KOPIA_CONFIG = "C:\Users\cindy\.config\kopia-vault\repository.config"
# Define critical directories to back up
$VaultDirs = @(
"C:\Users\cindy\Documents",
"C:\Users\cindy\Pictures",
"C:\Users\cindy\Desktop\Important"
)
# Log header
Add-Content -Path C:\Logs\kopia-vault.log -Value "=== Vault backup started at $(Get-Date) ==="
# Backup each directory
foreach ($dir in $VaultDirs) {
if (Test-Path $dir) {
Add-Content -Path C:\Logs\kopia-vault.log -Value "Backing up: $dir"
kopia --config-file="$KOPIA_CONFIG" snapshot create $dir `
| Tee-Object -FilePath C:\Logs\kopia-vault.log -Append
} else {
Add-Content -Path C:\Logs\kopia-vault.log -Value "Directory not found: $dir"
}
}
# Log completion
Add-Content -Path C:\Logs\kopia-vault.log -Value "=== Vault backup completed at $(Get-Date) ==="
Add-Content -Path C:\Logs\kopia-vault.log -Value ""
```
4. **Create log directory**
```powershell
New-Item -ItemType Directory -Force -Path C:\Logs
```
5. **Create scheduled task for vault backup**
- Press `Win + R` → type `taskschd.msc`
- Click "Create Task"
**General tab:**
- Name: `Kopia Vault Nightly Backup`
- ✔ Run whether user is logged on or not
- ✔ Run with highest privileges
**Triggers tab:**
- New → Daily at 3:00 AM (after primary backup)
- ✔ Enabled
**Actions tab:**
- Program: `powershell.exe`
- Arguments: `-ExecutionPolicy Bypass -File C:\Scripts\kopia-vault-nightly.ps1`
- Start in: `C:\Scripts`
**Conditions/Settings:** Same as primary backup task
---
## ZFS Replication to Offsite Pi Vaults
### Setup on ZNAS (Source)
1. **Create snapshot script**
```bash
sudo nano /usr/local/bin/vault-snapshot.sh
```
Add this content:
```bash
#!/bin/bash
# Create ZFS snapshot of vault dataset for replication
DATASET="zpool/vault/backup"
SNAPSHOT_NAME="vault-$(date +%Y%m%d-%H%M%S)"
# Create snapshot
zfs snapshot "${DATASET}@${SNAPSHOT_NAME}"
# Keep only last 7 days of snapshots on source
zfs list -t snapshot -o name -s creation | grep "^${DATASET}@vault-" | head -n -7 | xargs -r -n 1 zfs destroy
echo "Created snapshot: ${DATASET}@${SNAPSHOT_NAME}"
```
2. **Make executable**
```bash
sudo chmod +x /usr/local/bin/vault-snapshot.sh
```
3. **Schedule snapshot creation**
```bash
sudo crontab -e
# Add this line (create snapshot daily at 4 AM, after vault backups complete)
0 4 * * * /usr/local/bin/vault-snapshot.sh >> /var/log/vault-snapshot.log 2>&1
```
4. **Create replication script**
```bash
sudo nano /usr/local/bin/vault-replicate.sh
```
Add this content:
```bash
#!/bin/bash
# Replicate vault dataset to offsite Pi systems
DATASET="zpool/vault/backup"
PI1_HOST="pi-vault-1.local" # Update with actual hostname/IP
PI2_HOST="pi-vault-2.local" # Update with actual hostname/IP
PI_USER="admin"
REMOTE_DATASET="tank/vault-backup" # Update with actual dataset on Pi
# Get the latest snapshot
LATEST_SNAP=$(zfs list -t snapshot -o name -s creation | grep "^${DATASET}@vault-" | tail -n 1)
if [ -z "$LATEST_SNAP" ]; then
echo "No snapshots found for replication"
exit 1
fi
echo "Replicating snapshot: $LATEST_SNAP"
# Function to replicate to a target
replicate_to_target() {
local TARGET_HOST=$1
echo "=== Replicating to $TARGET_HOST ==="
# Get the last snapshot on remote (if any)
LAST_REMOTE=$(ssh ${PI_USER}@${TARGET_HOST} "zfs list -t snapshot -o name -s creation 2>/dev/null | grep '^${REMOTE_DATASET}@vault-' | tail -n 1" || echo "")
if [ -z "$LAST_REMOTE" ]; then
# Initial replication (full send)
echo "Performing initial full replication to $TARGET_HOST"
zfs send -c $LATEST_SNAP | ssh ${PI_USER}@${TARGET_HOST} "zfs receive -F ${REMOTE_DATASET}"
else
# Incremental replication
echo "Performing incremental replication to $TARGET_HOST"
LAST_SNAP_NAME=$(echo $LAST_REMOTE | cut -d'@' -f2)
zfs send -c -i ${DATASET}@${LAST_SNAP_NAME} $LATEST_SNAP | ssh ${PI_USER}@${TARGET_HOST} "zfs receive -F ${REMOTE_DATASET}"
fi
# Clean up old snapshots on remote (keep last 30 days)
ssh ${PI_USER}@${TARGET_HOST} "zfs list -t snapshot -o name -s creation | grep '^${REMOTE_DATASET}@vault-' | head -n -30 | xargs -r -n 1 zfs destroy"
echo "Replication to $TARGET_HOST completed"
}
# Replicate to both Pi systems
replicate_to_target $PI1_HOST
replicate_to_target $PI2_HOST
echo "All replications completed at $(date)"
```
5. **Make executable**
```bash
sudo chmod +x /usr/local/bin/vault-replicate.sh
```
6. **Set up SSH keys for passwordless replication**
```bash
# Generate SSH key if needed
ssh-keygen -t ed25519 -C "znas-replication"
# Copy to both Pi systems
ssh-copy-id admin@pi-vault-1.local
ssh-copy-id admin@pi-vault-2.local
```
7. **Schedule replication**
```bash
sudo crontab -e
# Add this line (replicate daily at 5 AM, after snapshot creation)
0 5 * * * /usr/local/bin/vault-replicate.sh >> /var/log/vault-replicate.log 2>&1
```
### Setup on Pi Vault Systems (Targets)
Repeat these steps on both Pi Vault 1 and Pi Vault 2:
1. **Create ZFS pool on SSD** (if not already done)
```bash
# Assuming SSD is /dev/sda
sudo zpool create tank /dev/sda
```
2. **Create dataset for receiving backups**
```bash
sudo zfs create tank/vault-backup
```
3. **Set appropriate permissions**
```bash
# Allow the replication user to receive snapshots
sudo zfs allow admin receive,create,mount,destroy tank/vault-backup
```
4. **Verify replication** (after first run)
```bash
zfs list -t snapshot | grep vault-
```
---
## Maintenance and Monitoring
### Regular Health Checks
**On Clients:**
```bash
# Linux
sudo kopia snapshot list
sudo kopia snapshot verify --file-parallelism=8
sudo kopia repository status
# Windows (PowerShell)
kopia snapshot list
kopia snapshot verify --file-parallelism=8
kopia repository status
```
**On ZNAS:**
```bash
# Check ZFS health
zpool status
# Check both Kopia servers are running
docker ps | grep kopia
# Check vault snapshots
zfs list -t snapshot | grep "vault/backup"
# Check replication logs
tail -f /var/log/vault-replicate.log
# View server statuses
docker exec kopia-server-primary kopia server status
docker exec kopia-server-vault kopia server status
```
**On Pi Vaults:**
```bash
# Check received snapshots
zfs list -t snapshot | grep vault-backup
# Check available space
zfs list tank/vault-backup
```
### Monthly Maintenance Tasks
1. **Verify vault backups are replicating**
```bash
# On ZNAS
cat /var/log/vault-replicate.log | grep "completed"
# On Pi systems
zfs list -t snapshot -o name,creation | grep vault-backup | tail
```
2. **Test restore from vault repository**
```bash
# Connect to vault repo and verify a random snapshot
kopia --config-file=/path/to/vault/config repository connect server --url=...
kopia snapshot list
kopia snapshot verify --file-parallelism=8
```
3. **Check disk space on all systems**
4. **Review backup logs for errors**
### Backup Policy Recommendations
**Primary Repository:**
- Retention: 7 daily, 4 weekly, 6 monthly
- Compression: enabled
- All data from clients
**Vault Repository:**
- Retention: 14 daily, 8 weekly, 12 monthly, 3 yearly
- Compression: enabled
- Only critical data for offsite protection
**ZFS Snapshots:**
- Keep 7 days on ZNAS (source)
- Keep 30 days on Pi vaults (targets)
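A sketch of how the Kopia retention numbers above translate into policy commands. This assumes you run them from a connected client (the vault example reuses the alternate config file from the docker2 setup); verify flag names against your Kopia version:
```bash
# Primary repository (run from a client connected to the primary repo)
kopia policy set --global \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
  --compression zstd
# Vault repository (uses the separate vault config file)
kopia --config-file=/root/.config/kopia-vault/repository.config policy set --global \
  --keep-daily 14 --keep-weekly 8 --keep-monthly 12 --keep-annual 3 \
  --compression zstd
```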
---
## Disaster Recovery Procedures
### Scenario 1: Restore from Primary Repository
```bash
# Linux
sudo kopia snapshot list
sudo kopia snapshot restore <snapshot-id> /restore/location
# Windows
kopia snapshot list
kopia snapshot restore <snapshot-id> C:\restore\location
```
### Scenario 2: Restore from Vault Repository (Offsite)
If ZNAS is unavailable, restore directly from Pi vault:
1. **On Pi vault:**
```bash
# Mount the latest snapshot
LATEST=$(zfs list -t snapshot -o name | grep vault-backup | tail -n 1)
zfs clone $LATEST tank/vault-backup-restore
```
2. **Access Kopia repository directly:**
```bash
kopia repository connect filesystem --path=/tank/vault-backup-restore
kopia snapshot list
kopia snapshot restore <snapshot-id> /restore/location
```
3. **Clean up after restore:**
```bash
zfs destroy tank/vault-backup-restore
```
### Scenario 3: Complete System Rebuild
1. Rebuild ZNAS and restore the vault dataset from a Pi vault (sketched below)
2. Reinstall Kopia server in Docker
3. Point server to restored vault repository
4. Reconnect clients to primary and vault repositories
5. Resume scheduled backups
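Step 1 sketched with the hostnames and dataset names used elsewhere in this document (pick the newest snapshot on the Pi; adjust names to your environment):
```bash
# On the rebuilt ZNAS: recreate the parent dataset (as in Initial Setup),
# then pull the vault repository back from a Pi vault
zfs create -o mountpoint=/srv/vault zpool/vault
LATEST=$(ssh admin@pi-vault-1.local \
  "zfs list -t snapshot -o name -s creation | grep '^tank/vault-backup@vault-' | tail -n 1")
ssh admin@pi-vault-1.local "zfs send -c ${LATEST}" | zfs receive -F zpool/vault/backup
```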
---
## Troubleshooting
### Client can't connect to repository
```bash
# Check both servers are running
docker ps | grep kopia
# Should see both kopia-server-primary and kopia-server-vault
# Check firewall
sudo ufw status | grep 51515
sudo ufw status | grep 51516
# Verify certificate fingerprints
docker exec kopia-server-primary kopia server status
docker exec kopia-server-vault kopia server status
# Check server logs
docker logs kopia-server-primary
docker logs kopia-server-vault
```
### Vault replication failing
```bash
# Check SSH connectivity
ssh admin@pi-vault-1.local "echo Connected"
# Check ZFS pool health
zpool status
# Check remote dataset exists
ssh admin@pi-vault-1.local "zfs list tank/vault-backup"
# Manual test send (dry run shows the estimated stream size; substitute a real snapshot name)
zfs send -n -v zpool/vault/backup@vault-YYYYMMDD-HHMMSS
```
### Windows scheduled task not running
- Check Task Scheduler → Task History
- Verify PIN/password authentication (use password Harvey123= for task credential)
- Check that computer is awake at scheduled time
- Review power settings (prevent sleep, wake for tasks)
- Check log files: `C:\Logs\kopia-primary.log` and `C:\Logs\kopia-vault.log`
### Snapshot cleanup not working
```bash
# Manually clean old snapshots
zfs list -t snapshot -o name,used,creation | grep vault-backup
# Remove specific snapshot
zfs destroy zpool/vault/backup@vault-YYYYMMDD-HHMMSS
```
---
## Security Notes
1. **Passwords in scripts:** The current implementation stores passwords in plaintext in scripts (a minimal mitigation sketch follows this list). For production, consider:
- Windows Credential Manager
- Linux keyring or encrypted credential storage
- Environment variables set at system level
2. **SSH keys:** Replication uses SSH keys. Keep private keys secure and use passphrase protection where possible.
3. **Network security:** Kopia server uses HTTPS with certificate validation. Ensure certificate fingerprint is verified on first connection.
4. **Physical security:** Offsite Pi vaults should be stored in secure locations with different risk profiles (fire, flood, theft).
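As a minimal improvement on the Linux side (a sketch, not the only option): keep the repository password in a root-only file and export it through the `KOPIA_PASSWORD` environment variable at runtime instead of embedding it in a script. The same idea applies on Windows via Credential Manager.
```bash
# One-time: store the password in a root-only file
install -m 600 /dev/null /root/.kopia-vault-pass
echo 'your-repository-password' > /root/.kopia-vault-pass
# In backup scripts, read it at runtime instead of hardcoding:
export KOPIA_PASSWORD="$(cat /root/.kopia-vault-pass)"
```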
---
## Quick Reference Commands
### Kopia Client Commands
```bash
# List snapshots
kopia snapshot list
# Create snapshot
kopia snapshot create /path/to/backup
# Verify integrity
kopia snapshot verify --file-parallelism=8
# Check repository status
kopia repository status
# View policies
kopia policy list
# Mount snapshot (Linux)
kopia mount <snapshot-id> /mnt/snapshot
# Use alternate config (for vault repository)
kopia --config-file=/path/to/vault/repository.config snapshot list
```
### ZFS Commands
```bash
# List snapshots
zfs list -t snapshot
# Create manual snapshot
zfs snapshot zpool/vault/backup@manual-$(date +%Y%m%d)
# Send full snapshot
zfs send zpool/vault/backup@snapshot | ssh user@host zfs receive tank/backup
# Send incremental
zfs send -i @old @new zpool/vault/backup | ssh user@host zfs receive tank/backup
# List replication progress
zpool status -v
# Check dataset size
zfs list -o space zpool/vault/backup
```
---
## Appendix: System Specifications
**ZNAS:**
- ZFS fileserver
- Docker running **two** Kopia servers:
- **kopia-server-primary** on port 51515
- **kopia-server-vault** on port 51516
- IP: 192.168.5.10
- Datasets:
- `/srv/vault/kopia_repository` (zpool/vault/kopia_repository) - Primary repository
- `/srv/vault/backup` (zpool/vault/backup) - Vault repository (replicated)
**Clients:**
- **docker2** (Linux) - Backs up /DockerVol/
- Primary: Every 3 hours → port 51515
- Vault: Daily at 3 AM (critical directories only) → port 51516
- **DESKTOP-QLSVD8P** (Windows - Cindy's desktop) - Backs up C:\Users\cindy
- Primary: Daily at 2 AM → port 51515
- Vault: Daily at 3 AM (Documents, Pictures, Important files) → port 51516
- Kopia password: LucyDog123
- Task Scheduler credential: Harvey123=
**Offsite Vaults:**
- **Pi Vault 1** - Raspberry Pi with SSD (tank/vault-backup)
- **Pi Vault 2** - Raspberry Pi with SSD (tank/vault-backup)
**Server Certificates:**
- Primary server SHA256: `696a4999f594b5273a174fd7cab677d8dd1628f9b9d27e557daa87103ee064b2`
- Vault server SHA256: *(get from `docker exec kopia-server-vault kopia server status`)*
---
## Workflow Summary
### Daily Backup Flow
**2:00 AM** - Cindy's desktop primary backup runs
**3:00 AM** - docker2 vault backup runs
**3:00 AM** - Cindy's desktop vault backup runs
**4:00 AM** - ZNAS creates ZFS snapshot of vault dataset
**5:00 AM** - ZNAS replicates vault snapshot to both Pi systems
**Every 3 hours** - docker2 primary backup runs
### What Gets Backed Up Where
**Primary Repository (Full Backups):**
- docker2: /DockerVol/ (all Docker volumes)
- Cindy: C:\Users\cindy (entire user profile, minus temp files)
**Vault Repository (Critical Data for Offsite):**
- docker2: Selected critical Docker volumes
- Cindy: Documents, Pictures, Important desktop files
**Offsite (Via ZFS Send):**
- Entire vault repository (all clients' critical data)
- Replicated to 2 separate Pi systems
---
## Future Enhancements
Consider adding:
- Email notifications on backup failures
- Monitoring dashboard (Grafana/Prometheus)
- Backup validation automation
- Additional retention policies per client
- Encrypted credentials storage
- Remote monitoring of Pi vault systems
- Automated restore testing
- Bandwidth throttling for replication
- Multiple ZFS snapshot retention policies
---
## Change Log
- **2025-02-11** - Initial comprehensive documentation created
- Added two-tier backup strategy (primary + vault)
- Added ZFS replication procedures for offsite backup
- Added Pi vault setup instructions
- Added disaster recovery procedures
- Consolidated all client configurations
- Added workflow diagrams and timing
---
## Support and Feedback
For issues or improvements to this documentation, contact the system administrator.
**Useful Resources:**
- Kopia Documentation: https://kopia.io/docs/
- ZFS Administration Guide: https://openzfs.github.io/openzfs-docs/
- Kopia GitHub: https://github.com/kopia/kopia

# kopia
## Overview
The kopia stack is a Docker Swarm configuration for the Kopia backup service in NetGrimoire. It provides snapshot backups and deduplication capabilities.
---
## Architecture
- **Host:** docker4
- **Network:** netgrimoire
- **Exposed via:** kopia.netgrimoire.com, 51515 (via Caddy reverse proxy)
- **Homepage group:** Backup
---
## Build & Configuration
### Prerequisites
None specified.
### Volume Setup
```bash
mkdir -p /DockerVol/kopia/config
mkdir -p /DockerVol/kopia/cache
mkdir -p /DockerVol/kopia/cert
```
### Environment Variables
```bash
# generate: openssl rand -hex 32 for secrets
PUID=1964
PGID=1964
KOPIA_PASSWORD=F@lcon13
KOPIA_SERVER_USERNAME=admin
KOPIA_SERVER_PASSWORD=F@lcon13
TZ=America/Chicago
```
### Deploy
```bash
cd services/swarm/stack/kopia
set -a && source .env && set +a
docker stack config --compose-file kopia-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml kopia
rm resolved.yml
docker stack services kopia
```
### First Run
After deployment, check the status of the Kopia service and verify that backups are being created.
---
## User Guide
### Accessing kopia
| Service | URL | Purpose |
|---------|-----|---------|
| kopia | https://kopia.netgrimoire.com | Kopia web UI (via Caddy reverse proxy) |
### Primary Use Cases
To use Kopia in NetGrimoire, create a new backup set and configure the service to run as desired.
### NetGrimoire Integrations
This service integrates with Uptime Kuma for monitoring and other services through environment variables and labels.
---
## Operations
### Monitoring
```bash
docker stack services kopia
docker service logs -f kopia
```
### Backups
Critical state lives in `/DockerVol/kopia/config` (server configuration and repository connection details). `/DockerVol/kopia/cache` is reconstructable; Kopia rebuilds it on demand, so it does not need to be backed up.
### Restore
To redeploy the kopia service itself, rerun the stack deployment:
```bash
./deploy.sh
```
Backed-up data is restored through Kopia itself (web UI or `kopia snapshot restore`), not by redeploying the stack.
---
## Common Failures
| Symptom | Cause | Fix |
|---------|-------|-----|
| Backups not being created | Insufficient storage or network issues | Check storage and network conditions. |
| Service not starting | Incorrect environment variables or Docker configuration | Review `.env` file and `docker-compose.yml`. |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | d3206f11 | Initial documentation for kopia stack. |
| 2026-02-11 | aa13ac64 | Minor adjustments to environment variables and volume setup. |
| 2026-01-30 | 15f5f655 | Initial commit with basic configuration and service setup. |
---
## Notes
- Generated by Gremlin on 2026-04-07T19:20:00.179Z
- Source: swarm/kopia.yaml
- Review User Guide and Changelog sections

---
title: Offsite Vault Architecture
description: Two Pi vault nodes — ZFS raw send, syncoid, Pocket Grimoire
published: true
date: 2026-04-12T00:00:00.000Z
tags: vault, offsite, zfs, kopia
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Offsite Vault Architecture
## Overview
Two offsite nodes receive ZFS replication from `znas`:
| Node | Location | Role |
|------|----------|------|
| Vault Pi (dedicated) | Offsite / home shelf | Kopia offsite server, ZFS vault pool |
| Pocket Grimoire | Travel / portable | Portable vault + media, also a vault node |
## Replication Method
ZFS raw send via `syncoid` with `-w` flag (raw/encrypted mode):
```bash
# Dedicated vault Pi
syncoid -w znas:vault/data vault-pi:vault/data
# Pocket Grimoire pre-travel
syncoid znas:vault/Green/Pocket pocket:/srv/greenpg/Green
```
The `-w` flag sends encrypted ZFS streams. The receiving node stores data in its encrypted form — no decryption keys are needed on the vault nodes. Keys stay exclusively on `znas`.
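A quick way to confirm this on the receiving side (the dataset name follows the example above):
```bash
# On the vault node after a raw receive
zfs get encryption,keystatus,keylocation vault/data
# keystatus should report "unavailable" because no key was ever loaded here
```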
## Kopia Offsite Server
The vault container (`vault.yaml`) runs a Kopia server on port 51516 that serves as the remote endpoint for the dedicated Pi vault. Accessible at `vault.netgrimoire.com`.
## Pocket Grimoire as Vault Node
Pocket Grimoire's ZFS pool (`pocket-green` at `/srv/greenpg/`) receives a `syncoid` push from `znas` before each trip. This makes Pocket Grimoire an offsite backup node whenever it leaves the house.
See [Pocket Grimoire Sync](/Pocket-Grimoire/Sync/Pre-Travel-Sync) for the pre-travel checklist.

View file

@ -0,0 +1,60 @@
---
title: Vault Grimoire
description: Storage and backup — the dragon guards the data hoard
published: true
date: 2026-04-12T00:00:00.000Z
tags: vault, storage, backup
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Vault Grimoire
![vault-badge](/images/vault-badge.png)
The Vault Grimoire covers all storage and backup infrastructure. Data starts at `znas`, is deduplicated and encrypted by Kopia, and replicates offsite to two Pi vault nodes — one dedicated vault Pi and one inside Pocket Grimoire.
---
## Sections
| Section | Contents |
|---------|----------|
| [ZFS](/Vault-Grimoire/ZFS/Storage-Layout) | ZFS pools, datasets, NFS exports, commands reference |
| [Kopia](/Vault-Grimoire/Kopia/Kopia-Overview) | Backup repos, retention, restore, two-repo architecture |
| [Backups](/Vault-Grimoire/Backups/Services-Backup) | Per-service backup runbooks (Immich, MailCow, Nextcloud, Wiki, services) |
| [Offsite](/Vault-Grimoire/Offsite/Vault-Architecture) | Pi vault nodes, ZFS raw send, syncoid workflow |
---
## Offsite Vault Architecture
```
znas (primary)
└── ZFS pool → Kopia dedup → encrypted repo
├── syncoid -w → Pi Vault (dedicated offsite)
└── syncoid → Pocket Grimoire (portable vault node)
```
Both offsite nodes receive ZFS raw send with the `-w` flag. Encryption keys stay on `znas`. The vault nodes store encrypted data only — no keys needed there.
---
## Two-Repo Architecture
Kopia uses two separate containers on different ports:
| Container | Repo | URL | Purpose |
|-----------|------|-----|---------|
| kopia | Primary vault | `kopia.netgrimoire.com` | Main backup, dedup, retention |
| vault | Offsite server | `vault.netgrimoire.com` (port 51516) | Replication target for Pi vaults |
One Kopia server instance per repository. They cannot share.
---
## Key Rules
- ZFS encryption cannot be done in-place. Migration requires `rsync` to a new encrypted dataset, then ZFS raw send with `-w` to vaults (no key exposure on vault side). A sketch of this flow follows the list.
- ZFS must fully mount before NFS starts on znas. Systemd override required: `After=zfs-import.target zfs-mount.service`.
- Loopback NFS mount needs `x-systemd.after=nfs-server.service` in fstab.
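A minimal sketch of the encryption-migration rule above, with placeholder dataset names (not the actual NetGrimoire datasets):
```bash
# 1. Create the new encrypted dataset (prompts for a passphrase)
zfs create -o encryption=on -o keyformat=passphrase -o mountpoint=/export/NewEnc vault/NewEnc
# 2. Copy data across, preserving attributes
rsync -aHAX --info=progress2 /export/Old/ /export/NewEnc/
# 3. Snapshot and replicate raw to a vault node (keys never leave znas)
zfs snapshot vault/NewEnc@migrated
syncoid -w vault/NewEnc vault-pi:vault/NewEnc
```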

View file

@ -0,0 +1,393 @@
---
title: ZFS-NFS-Exports
description: Exporting NFS shares from ZFS datasets
published: true
date: 2026-02-23T21:58:20.626Z
tags:
editor: markdown
dateCreated: 2026-02-01T20:45:40.210Z
---
# NFS Configuration
## Overview
ZNAS exports storage via NFSv4. All exports are ZFS datasets mounted directly to `/export/*` — no bind mounts. NFS is configured to wait for ZFS at boot via a systemd override.
ZNAS also mounts its own NFS exports back to itself at `/data/nfs/znas`. This is intentional: Docker Swarm containers scheduled to ZNAS need to access NAS storage at the same paths as containers running on other swarm members. The loopback mount provides a consistent NFS-backed path regardless of which node a container lands on.
All other clients are Linux systems using autofs.
---
## Server Configuration
### ZFS Mountpoints
ZFS datasets mount directly to `/export/*`. No bind mounts are used.
```
vault → /export
vault/Common → /export/Common
vault/Data → /export/Data
vault/Data/media_books → /export/Data/media/books
vault/Data/media_comics → /export/Data/media/comics
vault/Docker → /export/Docker
vault/Green → /export/Green
vault/Green/Pocket → /export/Green/Pocket
vault/Photos → /export/Photos
```
Verify at any time:
```bash
mount | grep export
```
### /etc/exports
```
# NFSv4 - pseudo filesystem root
/export *(ro,fsid=0,no_root_squash,no_subtree_check,crossmnt)
# Shares beneath the NFSv4 root
/export/Common *(fsid=4,rw,no_subtree_check,insecure)
/export/Data *(fsid=5,rw,no_subtree_check,insecure,crossmnt)
/export/Data/media/books *(fsid=51,rw,no_subtree_check,insecure,nohide)
/export/Data/media/comics *(fsid=52,rw,no_subtree_check,insecure,nohide)
/export/Docker *(fsid=29,rw,no_root_squash,sync,no_subtree_check,insecure)
/export/Green *(fsid=30,rw,no_root_squash,no_subtree_check,insecure)
/export/photos *(fsid=31,rw,no_root_squash,no_subtree_check,insecure)
```
**Key options:**
- `fsid=0` on `/export` — required for NFSv4 pseudo-root. Clients enumerate all exports from here.
- `crossmnt` — allows NFS to cross ZFS dataset boundaries when traversing the tree.
- `nohide` — required on `media/books` and `media/comics` because they are separate ZFS datasets mounted beneath the `vault/Data` export path. Without it clients see empty directories.
- `no_root_squash` — Docker and Green exports allow root writes. Required for container volume mounts.
- `insecure` — permits connections from unprivileged ports (>1024). Required for some Linux NFS clients and all macOS clients.
- `sync` on Docker — forces synchronous writes for container volume safety.
### systemd Boot Order Override
NFS is configured to wait for ZFS to fully mount before starting.
`/etc/systemd/system/nfs-server.service.d/override.conf`:
```ini
[Unit]
After=zfs-import.target zfs-mount.service local-fs.target
Requires=zfs-import.target zfs-mount.service
```
Apply after any changes:
```bash
sudo systemctl daemon-reload
sudo systemctl restart nfs-server
```
### Autofs Disabled on Server
Autofs is disabled on ZNAS itself. It must only run on NFS clients. Running autofs on the server creates recursive mount loops.
```bash
sudo systemctl stop autofs
sudo systemctl disable autofs
```
---
## Loopback Mount (Docker Swarm)
ZNAS mounts its own NFS exports back to itself at `/data/nfs/znas`. This ensures containers scheduled to ZNAS by Docker Swarm access storage at the same NFS-backed paths as containers running on any other swarm member — consistent regardless of which node a service lands on.
Swarm container volume mounts reference paths under `/data/nfs/znas/` rather than `/export/` directly.
### The Timing Problem
Getting this mount to survive reboots reliably was non-trivial. The loopback has a chicken-and-egg dependency chain:
1. ZFS must import and mount pools before NFS server can export anything
2. NFS server must be fully started before the loopback mount can succeed
3. The loopback mount must be established before Docker Swarm containers start
A plain `_netdev` fstab entry is not sufficient — `_netdev` only guarantees the network is up, not that the NFS server is ready. The mount would race against NFS startup and fail silently or hang.
### Solution — fstab with x-systemd.after
The loopback is established via `/etc/fstab` using the `x-systemd.after` option to explicitly declare the dependency on `nfs-server.service`:
```
localhost:/ /data/nfs/znas nfs4 defaults,_netdev,x-systemd.after=nfs-server.service 0 0
```
`x-systemd.after=nfs-server.service` causes systemd-fstab-generator to automatically create a mount unit (`data-nfs-znas.mount`) with `After=nfs-server.service` in its `[Unit]` block. This guarantees the full dependency chain:
```
zfs-import.target
→ zfs-mount.service
→ nfs-server.service (via nfs-server override.conf)
→ data-nfs-znas.mount (via x-systemd.after in fstab)
→ remote-fs.target
→ Docker Swarm containers
```
The generated unit (created automatically at runtime by systemd-fstab-generator — not a file on disk):
```ini
# /run/systemd/generator/data-nfs-znas.mount
[Unit]
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
SourcePath=/etc/fstab
After=nfs-server.service
Before=remote-fs.target
[Mount]
What=localhost:/
Where=/data/nfs/znas
Type=nfs4
Options=defaults,_netdev,x-systemd.after=nfs-server.service
```
**Do not create a hand-written systemd mount unit for this.** systemd-fstab-generator handles it automatically from the fstab entry. A manual unit would conflict.
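To see exactly what the generator produced, without touching anything on disk:
```bash
systemctl cat data-nfs-znas.mount                        # prints the generated unit from /run/systemd/generator
systemctl list-dependencies --after data-nfs-znas.mount  # shows the ordering chain it sits behind
```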
### Verify Loopback is Active
```bash
mount | grep data/nfs/znas
# Should show: localhost:/ on /data/nfs/znas type nfs4 (...)
systemctl status data-nfs-znas.mount
# Should show: active (mounted)
```
---
## Client Configuration
All non-Swarm clients are Linux systems using autofs.
### Autofs Configuration
`/etc/auto.master` (relevant entry):
```
/data/nfs /etc/auto.nfs
```
`/etc/auto.nfs`:
```
znas -fstype=nfs4 192.168.5.10:/
```
This mounts the full NFSv4 tree from ZNAS at `/data/nfs/znas` on demand — the same path used by the loopback mount on ZNAS itself. All swarm nodes (including ZNAS) access NAS storage via `/data/nfs/znas/`.
**Note:** Autofs must be enabled on clients and disabled on the NFS server. Running autofs on the server creates recursive mount loops.
### Adding a New Client
```bash
# Install autofs if not present
sudo apt install autofs
# Add to /etc/auto.master if not already present
echo "/data/nfs /etc/auto.nfs" | sudo tee -a /etc/auto.master
# Create or update /etc/auto.nfs
echo "znas -fstype=nfs4 192.168.5.10:/" | sudo tee -a /etc/auto.nfs
# Reload autofs
sudo systemctl reload autofs
# Trigger mount by accessing the path
ls /data/nfs/znas/
```
### Manual Mount (testing only)
```bash
# Verify exports are visible from client
showmount -e 192.168.5.10
# Test manual mount
sudo mkdir -p /mnt/znas
sudo mount -t nfs4 192.168.5.10:/ /mnt/znas
# Verify tree is accessible
ls /mnt/znas/Data/media/books/
# Unmount after testing
sudo umount /mnt/znas
```
---
## Adding New Datasets
When creating a new ZFS dataset that needs to be NFS-accessible:
```bash
# Create with the correct mountpoint from the start
sudo zfs create -o mountpoint=/export/Data/new_folder vault/Data/new_folder
```
The dataset will be automatically visible via NFS due to `crossmnt` and `nohide` on the parent — no changes to `/etc/exports` needed unless the new dataset requires different access controls.
If different permissions are required, add an explicit entry to `/etc/exports` and reload:
```bash
sudo exportfs -ra
```
---
## Current Export List
Verified via `showmount -e 127.0.0.1`:
```
/export/photos *
/export/Green *
/export/Docker *
/export/Data/media/comics *
/export/Data/media/books *
/export/Data *
/export/Common *
/export *
```
---
## Known Gotchas
**Loopback mount races NFS at boot** — This was the hardest problem to solve. A plain `_netdev` fstab entry only guarantees the network interface is up, not that the NFS server is ready to accept connections. The loopback mount would attempt before NFS finished starting and fail silently or hang. The fix is `x-systemd.after=nfs-server.service` in the fstab options, which causes systemd-fstab-generator to emit an `After=nfs-server.service` dependency in the generated mount unit. The full required boot chain is: `zfs-import.target``zfs-mount.service``nfs-server.service``data-nfs-znas.mount`. Each link must be explicit.
**Do not hand-write a systemd mount unit for the loopback** — systemd-fstab-generator creates `data-nfs-znas.mount` automatically from the fstab entry at runtime (in `/run/systemd/generator/`, not `/etc/systemd/system/`). Creating a manual unit in `/etc/systemd/system/` will conflict with the generated one.
**Autofs must be disabled on the server** — Running autofs on ZNAS itself creates a recursive mount loop. Autofs belongs on clients only. If autofs is accidentally re-enabled on ZNAS it will fight with the fstab loopback mount.
**NFSv4 pseudo-root is required** — The `/export` entry with `fsid=0` is mandatory for NFSv4 clients. Without it clients cannot enumerate the export tree. Do not remove it even though it looks redundant.
**`nohide` on sub-datasets** — `vault/Data/media_books` and `vault/Data/media_comics` are separate ZFS datasets mounted beneath the `vault/Data` export path. NFS does not cross filesystem boundaries by default. Without `nohide` clients see empty directories at those paths even though the data is present.
**Do not use bind mounts for ZFS datasets** — Configure ZFS mountpoints directly to `/export/*`. Bind mounts in fstab for ZFS datasets cause ordering problems and are unnecessary.
**Always set mountpoints when creating new datasets** — If a dataset is created without an explicit mountpoint it will inherit the parent's path and may not be visible or exportable correctly. Set `mountpoint=` at creation time.
---
## Troubleshooting
### Datasets not visible via NFS
```bash
# Verify dataset is mounted
zfs list | grep dataset_name
# Check NFS can read it
sudo -u nobody ls -la /export/path/to/dataset/
# Reload exports
sudo exportfs -ra
sudo systemctl restart nfs-server
```
### Client shows empty directories
```bash
# Clear NFS cache and remount
sudo umount -f /mnt/znas
sudo mount -t nfs4 192.168.5.10:/ /mnt/znas
# Test without caching to isolate the problem
sudo mount -t nfs4 -o noac,lookupcache=none 192.168.5.10:/ /mnt/znas
```
### After reboot, exports are empty
```bash
# Confirm ZFS mounted before NFS started
systemctl status zfs-mount.service
systemctl status nfs-server.service
# Confirm override is in place
systemctl cat nfs-server.service | grep -A5 "\[Unit\]"
```
### Loopback mount not working for Swarm containers
```bash
# Check mount unit status
systemctl status data-nfs-znas.mount
# Verify full dependency chain is satisfied
systemctl status zfs-mount.service
systemctl status nfs-server.service
systemctl status data-nfs-znas.mount
# Verify loopback is mounted
mount | grep data/nfs/znas
# If missing, mount manually to test
sudo mount -t nfs4 127.0.0.1:/ /data/nfs/znas
# Check container can see the path
docker run --rm -v /data/nfs/znas/Data:/data alpine ls /data
```
If the unit fails at boot, confirm the fstab entry includes `x-systemd.after=nfs-server.service` — without this the mount races against NFS startup and loses. A plain `_netdev` entry is not sufficient.
---
## Configuration Files Reference
### /etc/exports
```
/export *(ro,fsid=0,no_root_squash,no_subtree_check,crossmnt)
/export/Common *(fsid=4,rw,no_subtree_check,insecure)
/export/Data *(fsid=5,rw,no_subtree_check,insecure,crossmnt)
/export/Data/media/books *(fsid=51,rw,no_subtree_check,insecure,nohide)
/export/Data/media/comics *(fsid=52,rw,no_subtree_check,insecure,nohide)
/export/Docker *(fsid=29,rw,no_root_squash,sync,no_subtree_check,insecure)
/export/Green *(fsid=30,rw,no_root_squash,no_subtree_check,insecure)
/export/photos *(fsid=31,rw,no_root_squash,no_subtree_check,insecure)
```
### /etc/systemd/system/nfs-server.service.d/override.conf
```ini
[Unit]
After=zfs-import.target zfs-mount.service local-fs.target
Requires=zfs-import.target zfs-mount.service
```
### /etc/fstab (ZNAS system mounts only)
ZFS datasets are not listed here — ZFS handles its own mounting. Only system partitions appear:
```
# / - btrfs on nvme0n1p2
/dev/disk/by-uuid/40c60952-0340-4a78-81f9-5b2193da26c6 / btrfs defaults 0 1
# /boot - ext4 on nvme0n1p3
/dev/disk/by-uuid/4abb4efa-0b2b-4e4a-bcaf-78227db4628f /boot ext4 defaults 0 1
# swap
/dev/disk/by-uuid/d07437a0-3d0e-417a-a88e-438c603c2237 none swap sw 0 0
# /srv - btrfs on nvme0n1p5
/dev/disk/by-uuid/c66e81ff-436e-4d6f-980b-6f4875ea7c8e /srv btrfs defaults 0 1
```
---
## Command Reference
- Show active exports: `sudo exportfs -v`
- Reload exports: `sudo exportfs -ra`
- Show available exports (from any host): `showmount -e 192.168.5.10`
- Restart NFS: `sudo systemctl restart nfs-server`
- Check NFS status: `systemctl status nfs-server`
- Verify ZFS mounts: `mount | grep export`
- Verify loopback: `mount | grep data/nfs`

View file

@ -0,0 +1,239 @@
---
title: Netgrimoire Storage
description: Where is it at
published: true
date: 2026-02-23T18:38:27.621Z
tags:
editor: markdown
dateCreated: 2026-01-22T21:10:37.035Z
---
# NAS Storage Layout
## Overview
ZNAS is the primary NAS for Netgrimoire. It runs Ubuntu with OpenZFS and serves as the source of truth for all storage, including datasets that replicate out to the Pocket Grimoire portable system.
The system mounts everything under `/export/` for NFS sharing, with select datasets mounted under `/srv/` for local service consumption (Immich, NextCloud-AIO, Kopia, backup).
## ZFS Pools
- `vault` — primary NAS storage, RAIDZ1×2, 8 drives
- `greenpg` — Pocket Grimoire GREEN SSD (Kanguru UltraLock), docked for sync when present
## Zpool Architecture
```
pool: vault
state: ONLINE
scan: scrub repaired 0B in 2 days 10:24:08 with 0 errors on Tue Feb 10 10:48:10 2026
config:
NAME STATE READ WRITE CKSUM
vault ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ata-ST24000DM001-3Y7103_ZXA06K45 ONLINE 0 0 0
ata-ST24000DM001-3Y7103_ZXA08CVY ONLINE 0 0 0
ata-ST24000DM001-3Y7103_ZXA0FP10 ONLINE 0 0 0
raidz1-1 ONLINE 0 0 0
ata-ST16000NE000-2RW103_ZL2Q3275 ONLINE 0 0 0
ata-ST16000NM001G-2KK103_ZL26R5XW ONLINE 0 0 0
ata-ST16000NT001-3LV101_ZRS0KVQW ONLINE 0 0 0
ata-WDC_WD140EDFZ-11A0VA0_9MG81N0J ONLINE 0 0 0
ata-WDC_WD140EDFZ-11A0VA0_Y5J35Z6C ONLINE 0 0 0
errors: No known data errors
```
`raidz1-0` is 3× Seagate 24TB (~48TB usable). `raidz1-1` is 3× Seagate 16TB + 2× WD 14TB (~56TB usable — the 14TB drives are the limiting factor per stripe, leaving ~2TB/drive unused on the 16TB drives). Total pool: ~94T usable as reported by `zfs list`, with 39T currently available.
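For reference, the capacity arithmetic behind those figures (RAIDZ1 usable space per VDEV is roughly (drives - 1) x smallest drive; the ~94T figure is the TiB capacity `zfs list` reports):
```
raidz1-0: (3 - 1) x 24 TB ≈ 48 TB usable
raidz1-1: (5 - 1) x 14 TB ≈ 56 TB usable   (16 TB drives limited to 14 TB per stripe)
total:    ≈ 104 TB ≈ 94 TiB, matching zfs list (55.3T used + 39.0T avail)
```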
```
pool: greenpg
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
greenpg ONLINE 0 0 0
scsi-1Kanguru_UltraLock_DB090722NC10001 ONLINE 0 0 0
errors: No known data errors
```
`greenpg` is a portable pool. Export it before physically moving to Pocket Grimoire.
## ZFS Datasets
| Dataset | Mountpoint | Used | Avail | Refer | Quota | Compression | Purpose |
|---------|-----------|------|-------|-------|-------|-------------|---------|
| `vault` | `/export` | 55.3T | 39.0T | 771G | none | 1.00x | Pool root / NFSv4 pseudo-root |
| `vault/Common` | `/export/Common` | 214G | 39.0T | 214G | none | 1.06x | General shared storage |
| `vault/Data` | `/export/Data` | 38.4T | 39.0T | 36.4T | none | 1.00x | Primary data — 36.4T lives directly in dataset root |
| `vault/Data/media_books` | `/export/Data/media/books` | 925G | 39.0T | 925G | none | 1.03x | Book library |
| `vault/Data/media_comics` | `/export/Data/media/comics` | 1.15T | 39.0T | 1.15T | none | 1.00x | Comic library |
| `vault/Green` | `/export/Green` | 14.7T | 5.31T | 9.66T | 20T | 1.00x | Personal media — 9.66T direct, 5.02T in Pocket child |
| `vault/Green/Pocket` | `/export/Green/Pocket` | 5.02T | 2.48T | 5.02T | 7.5T | 1.00x | Pocket Grimoire replication source |
| `vault/Kopia` | `/srv/vault/kopia_repository` | 349G | 39.0T | 349G | none | 1.02x | Kopia backup repository |
| `vault/NextCloud-AIO` | `/srv/NextCloud-AIO` | 341G | 39.0T | 341G | none | 1.01x | NextCloud data |
| `vault/Photos` | `/export/Photos` | 135K | 39.0T | 135K | none | 1.00x | Photos (sparse — see notes) |
| `vault/backup` | `/srv/vault/backup` | 442G | 582G | 442G | 1T | 1.00x | Local system backups |
| `vault/docker` | `/export/Docker` | 22.2G | 39.0T | 22.2G | none | 1.13x | Docker volumes |
| `vault/immich` | `/srv/immich` | 117G | 39.0T | 117G | none | 1.03x | Immich photo service data |
| `greenpg` | `/greenpg` | 2.94T | 4.20T | 96K | — | 1.00x | GREEN SSD pool root (portable) |
| `greenpg/Pocket` | `/greenpg/Pocket` | 2.94T | 4.20T | 2.94T | — | 1.00x | Personal media + Stash data |
**Notes on specific datasets:**
`vault/Data` — 36.4T lives directly in the dataset root at `/export/Data/`. `media_books` and `media_comics` are the only child datasets and account for ~2T combined. The remaining ~36T is general data stored directly under the parent.
`vault/Green` — 9.66T lives directly in `/export/Green/` with the remaining 5.02T in the `Pocket` child dataset. The 20T quota caps total Green growth. `vault/Green/Pocket` has its own 7.5T sub-quota.
`vault/Photos` — nearly empty (135K). Photos are primarily managed through Immich at `vault/immich`. This dataset may be vestigial or reserved for future use.
`vault/backup` — has a hard 1T quota. Unlike other vault datasets which draw from the full 39T pool availability, this dataset is capped. Current usage is 442G with 582G remaining.
Compression ratios are near 1.00x across most datasets because content is already compressed (media files, binary data). `vault/docker` (1.13x) and `vault/Common` (1.06x) see modest gains from compressible config and text data.
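To spot-check these ratios at any time:
```bash
zfs list -r -o name,used,logicalused,compressratio vault
```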
## NFS Exports
All exports use NFSv4 with `/export` as the pseudo-filesystem root (`fsid=0`).
| Export | fsid | Options | Notes |
|--------|------|---------|-------|
| `/export` | 0 | `ro, no_root_squash, no_subtree_check, crossmnt` | NFSv4 pseudo-root — required for v4 clients |
| `/export/Common` | 4 | `rw, no_subtree_check, insecure` | General access |
| `/export/Data` | 5 | `rw, no_subtree_check, insecure, crossmnt` | Data root |
| `/export/Data/media/books` | 51 | `rw, no_subtree_check, insecure, nohide` | Separate ZFS dataset — needs `nohide` |
| `/export/Data/media/comics` | 52 | `rw, no_subtree_check, insecure, nohide` | Separate ZFS dataset — needs `nohide` |
| `/export/Docker` | 29 | `rw, no_root_squash, sync, no_subtree_check, insecure` | Container volumes |
| `/export/Green` | 30 | `rw, no_root_squash, no_subtree_check, insecure` | Personal media + Pocket Grimoire source |
| `/export/photos` | 31 | `rw, no_root_squash, no_subtree_check, insecure` | Photos |
Current `/etc/exports`:
```
/export *(ro,fsid=0,no_root_squash,no_subtree_check,crossmnt)
/export/Common *(fsid=4,rw,no_subtree_check,insecure)
/export/Data *(fsid=5,rw,no_subtree_check,insecure,crossmnt)
/export/Data/media/books *(fsid=51,rw,no_subtree_check,insecure,nohide)
/export/Data/media/comics *(fsid=52,rw,no_subtree_check,insecure,nohide)
/export/Docker *(fsid=29,rw,no_root_squash,sync,no_subtree_check,insecure)
/export/Green *(fsid=30,rw,no_root_squash,no_subtree_check,insecure)
/export/photos *(fsid=31,rw,no_root_squash,no_subtree_check,insecure)
```
There is also an active loopback NFSv4 mount on the system itself:
```
localhost:/ → /data/nfs/znas (NFSv4.2, rsize/wsize=1M)
```
## SMB Shares
*(To be documented.)*
## Standard Paths
- `/export/` — NFS root (vault pool root)
- `/export/Data/` — primary data
- `/export/Data/media/books/` — book library
- `/export/Data/media/comics/` — comic library
- `/export/Green/` — personal media
- `/export/Green/Pocket/` — Pocket Grimoire replication source
- `/export/Docker/` — container volumes
- `/export/Photos/` — photos
- `/srv/immich/` — Immich service data
- `/srv/NextCloud-AIO/` — NextCloud data
- `/srv/vault/kopia_repository/` — Kopia backup repo
- `/srv/vault/backup/` — local system backups
- `/greenpg/Pocket/` — GREEN SSD when docked for sync
## Permissions & UID/GID Model
*(To be documented — dockhand UID 1964, container access rules.)*
## Services Using Local Mounts
These datasets are consumed directly by services on ZNAS and are not NFS-exported:
| Service | Dataset | Mountpoint |
|---------|---------|-----------|
| Immich | `vault/immich` | `/srv/immich` |
| NextCloud-AIO | `vault/NextCloud-AIO` | `/srv/NextCloud-AIO` |
| Kopia | `vault/Kopia` | `/srv/vault/kopia_repository` |
| Local backup | `vault/backup` | `/srv/vault/backup` |
## Pocket Grimoire Integration
`vault/Green/Pocket` is the replication source for the Pocket Grimoire GREEN SSD (`greenpg`). It contains personal media and Stash application data (database, previews, blobs). See the Pocket Grimoire deployment guide for full procedures.
**Fast resync when GREEN SSD is physically docked on ZNAS:**
```bash
# Check pool name (retains whatever name it had when last exported)
zpool list | grep greenpg
# Import if needed
sudo zpool import greenpg
sudo zfs load-key greenpg
sudo zfs mount -a
# Sync
sudo syncoid vault/Green/Pocket greenpg/Pocket
# Export before physically disconnecting — always do this
sudo zfs unmount greenpg/Pocket
sudo zfs unmount greenpg
sudo zpool export greenpg
```
**Network sync** runs automatically on Pocket Grimoire via a 6-hour syncoid systemd timer when connected over the network.
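For reference, a sketch of what such a timer pair could look like (unit names, paths, and the exact syncoid invocation are assumptions; the real units live on Pocket Grimoire):
```ini
# /etc/systemd/system/syncoid-pocket.service (illustrative)
[Unit]
Description=Pull vault/Green/Pocket from znas via syncoid

[Service]
Type=oneshot
ExecStart=/usr/sbin/syncoid znas:vault/Green/Pocket greenpg/Pocket

# /etc/systemd/system/syncoid-pocket.timer (illustrative)
[Unit]
Description=Run syncoid pull every 6 hours

[Timer]
OnCalendar=*-*-* 00/6:00:00
Persistent=true

[Install]
WantedBy=timers.target
```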
## Backup & Snapshot Strategy
**Snapshots:**
```bash
# Manual pre-change snapshot
zfs snapshot vault/Docker@before-upgrade
# List all snapshots
zfs list -t snapshot
# List snapshots for a specific dataset
zfs list -t snapshot -r vault/Green
```
**Kopia:** Repository at `vault/Kopia``/srv/vault/kopia_repository`. *(Document snapshot policy and sources.)*
**Replication:** `vault/Green/Pocket``greenpg/Pocket` via syncoid. See Pocket Grimoire Integration above.
## Known Gotchas
**NFSv4 pseudo-root** — The `/export` entry with `fsid=0` is required for NFSv4 clients to enumerate subdirectories. Do not remove it even if it appears redundant.
**`nohide` on sub-datasets** — `vault/Data/media_books` and `vault/Data/media_comics` are separate ZFS datasets mounted beneath the `vault/Data` export path. NFS does not cross filesystem boundaries by default. Without `nohide` clients see empty directories at those paths.
**`vault/backup` quota** — This dataset has a hard 1T quota and does not share the general pool availability. Current headroom is ~582G. Monitor before large backup operations.
**`vault/Green` quota** — Capped at 20T total with a 7.5T sub-quota on `vault/Green/Pocket`. The GREEN SSD itself is ~7TB, so the sub-quota is the effective ceiling for the Pocket sync.
**raidz1-1 mixed drive sizes** — The three 16TB drives in raidz1-1 have ~2TB/drive going unused because RAIDZ1 stripes are limited by the smallest drive in the VDEV (14TB WDs). This capacity is permanently unavailable unless the VDEV is rebuilt.
**Kanguru UltraLock hardware encryption** — The GREEN SSD has hardware-level PIN protection in addition to ZFS encryption. The drive must be hardware-unlocked before `zpool import` will see it.
**Always export `greenpg` before disconnecting** — Export flushes writes and marks the pool clean. Pulling the drive without exporting risks a dirty import on next use.
**`vault/Data` root usage** — 36.4T lives directly in `/export/Data/` rather than in child datasets. This is normal for this setup but means `zfs list` on the parent alone shows the full usage without a breakdown.
## Command Reference
- Health: `zpool status`
- Space available to pool: `zpool list`
- Space available to datasets: `zfs list`
- Dataset configuration: `zfs get -r compression,dedup,recordsize,atime,quota,reservation vault`
- Create a snapshot: `zfs snapshot vault/Docker@before-upgrade`
- List snapshots: `zfs list -t snapshot`
- Reload NFS exports: `sudo exportfs -ra`
- Show active NFS exports: `sudo exportfs -v`
- Run a scrub: `sudo zpool scrub vault`
- Sync GREEN SSD: `sudo syncoid vault/Green/Pocket greenpg/Pocket`

View file

@ -0,0 +1,168 @@
---
title: ZFS Common Commands
description: ZFS Commands
published: true
date: 2026-02-20T04:26:23.798Z
tags: zfs commands
editor: markdown
dateCreated: 2026-01-31T15:23:07.585Z
---
# ZFS Essential Commands Cheat Sheet
---
## Pool Health & Status
```bash
zpool status
zpool status -v
zpool list
```
## Dataset Space & Usage
```bash
zfs list
zfs list -r vault
zfs list -o name,used,avail,refer,logicalused,compressratio
zfs list -r -o name,used,avail,refer,quota,reservation vault
```
## Dataset Properties & Settings
```bash
zfs get all vault/dataset
zfs get -r compression,dedup,recordsize,atime,quota,reservation vault
zfs get -r compression,dedup,recordsize,encryption,keylocation,keyformat,snapdir vault
zfs get -s local -r all vault
zfs get quota,refquota,reservation,refreservation -r vault
```
## Mount Encrypted Dataset
```bash
zfs load-key vault/Green/Pocket
zfs mount vault/Green/Pocket
```
## Pool I/O & Performance Monitoring
```bash
zpool iostat -v 1
arcstat 1
cat /proc/spl/kstat/zfs/arcstats
```
## Scrubs & Data Integrity
```bash
zpool scrub vault
zpool scrub -s vault
zpool status
```
## Snapshots
```bash
zfs snapshot vault/dataset@snapname
zfs list -t snapshot
zfs rollback vault/dataset@snapname
zfs clone vault/dataset@snapname vault/dataset-clone
```
## Replication (Send / Receive)
```bash
zfs send vault/dataset@snap1 | zfs receive backup/dataset
zfs send -i snap1 vault/dataset@snap2 | zfs receive backup/dataset
zfs send -nv vault/dataset@snap1
```
## Dataset Tuning (Live-Safe Changes)
```bash
zfs set compression=lz4 vault/dataset
zfs set recordsize=1M vault/dataset
zfs set atime=off vault/dataset
zfs set dedup=on vault/dataset
```
## Encryption Management
```bash
zfs get encryption,keylocation,keystatus vault/dataset
zfs unload-key vault/dataset
zfs load-key vault/dataset
```
## Disk Preparation & Cleanup
```bash
wipefs /dev/sdX
wipefs -a /dev/sdX
zpool labelclear -f /dev/sdX
sgdisk --zap-all /dev/sdX
lsblk -f /dev/sdX
```
## Pool Expansion (Add VDEV)
```bash
zpool add vault raidz2 \
  /dev/disk/by-id/disk1 \
  /dev/disk/by-id/disk2 \
  /dev/disk/by-id/disk3 \
  /dev/disk/by-id/disk4 \
  /dev/disk/by-id/disk5
```
## Pool Import / Recovery
```bash
zpool import
zpool import vault
zpool import -f vault
zpool import -o readonly=on vault
```
## Locks, Holds & History
```bash
zfs holds -r vault
zpool history
zfs diff vault/dataset@snap1 vault/dataset@snap2
```
## Deduplication & Compression Stats
```bash
zpool list -v
zdb -DD vault
```
## Inventory / Documentation Dumps
```bash
zpool status > zpool-status.txt
zfs list -r > zfs-layout.txt
zfs get -r all vault > zfs-settings.txt
```
## Top 10 Must-Know Commands
```bash
zpool status
zpool list
zpool iostat -v 1
zpool scrub vault
zfs list
zfs get all vault/dataset
zfs snapshot vault/dataset@snap
zfs rollback vault/dataset@snap
zfs send | zfs receive
arcstat 1
```

View file

@ -0,0 +1,39 @@
---
title: Authentication Overview
description: SSO, LDAP, and access control in Netgrimoire
published: true
date: 2026-04-12T00:00:00.000Z
tags: ward, auth, sso
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Authentication Overview
## SSO Providers
| Provider | Scope | URL |
|----------|-------|-----|
| Authentik | `*.netgrimoire.com` | Protected via `caddy.import_1: authentik` label |
| Authelia | `*.wasted-bandwidth.net` | Green Grimoire + Shadow Grimoire services |
Both providers use LLDAP as their LDAP backend.
## LLDAP
Lightweight LDAP directory at `ldap.netgrimoire.com`. Postgres backend. Provides the user directory for both Authentik and Authelia.
See [LDAP Client Setup](/Ward-Grimoire/Access/LDAP-Client-Setup) for configuring hosts to authenticate via LLDAP.
## Vaultwarden
Password manager at `pass.netgrimoire.com`. Protected by Authentik.
## WireGuard
5 VPN peers on 192.168.32.0/24. Managed in OPNsense. See [Host Inventory](/Keystone-Grimoire/Hosts/Host-Inventory) for peer assignments.
## YubiKey (Planned)
- PIV SSH authentication on all hosts — highest-impact pending integration
- Challenge-response for LUKS / Kopia key derivation on znas

View file

@ -0,0 +1,218 @@
---
title: LDAP Client Setup
description:
published: true
date: 2026-02-20T04:33:31.862Z
tags:
editor: markdown
dateCreated: 2026-01-21T13:21:40.588Z
---
# ✅ LLDAP + SSSD Node Join Checklist (FINAL)
## Assumptions
- LLDAP server: docker4
- LDAP URI: `ldap://docker4:3890`
- Base DN: `dc=netgrimoire,dc=com`
- Users/groups use lowercase attributes (`uidnumber`, `gidnumber`, `homedirectory`, `unixshell`, `uniquemember`)
- No TLS (lab only)
- Docker group GID = 1964 in LDAP
- This node is Ubuntu/Debian-based
## 0⃣ Safety first (do this every time)
- Open two SSH sessions to the node
- Confirm you can sudo
- Do not edit `nsswitch.conf` until SSSD is confirmed working
## 1⃣ Install required packages
```bash
sudo apt update
sudo apt install -y sssd sssd-ldap sssd-tools libpam-sss libnss-sss libsss-sudo ldap-utils oddjob oddjob-mkhomedir
```
Ensure legacy LDAP NSS is NOT installed:
```bash
sudo apt purge -y libnss-ldap libpam-ldap nslcd libnss-ldapd libpam-ldapd || true
sudo apt autoremove -y
```
## 2⃣ Verify LDAP connectivity (must pass)
```bash
getent hosts docker4
nc -vz docker4 3890
ldapwhoami -x -H ldap://docker4:3890 \
  -D 'uid=admin,ou=people,dc=netgrimoire,dc=com' -w 'F@lcon13'
```
❌ If any fail → stop and fix networking/DNS/firewall.
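Optionally, an `ldapsearch` against a known account confirms the lowercase attribute names the SSSD config below relies on (`graymutt` is the example user used later in this checklist):
```bash
ldapsearch -x -H ldap://docker4:3890 \
  -D 'uid=admin,ou=people,dc=netgrimoire,dc=com' -w 'F@lcon13' \
  -b 'ou=people,dc=netgrimoire,dc=com' '(uid=graymutt)' \
  uidnumber gidnumber homedirectory unixshell
```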
## 3⃣ Create /etc/sssd/sssd.conf (single file, no includes)
```bash
sudo vi /etc/sssd/sssd.conf
```
Paste exactly:
```ini
[sssd]
services = nss, pam, ssh
config_file_version = 2
domains = netgrimoire.com

[nss]
filter_users = root
filter_groups = root

[pam]
offline_failed_login_attempts = 3
offline_failed_login_delay = 5

[ssh]

[domain/netgrimoire.com]
id_provider = ldap
auth_provider = ldap
chpass_provider = ldap
access_provider = permit
enumerate = false
cache_credentials = true

ldap_uri = ldap://docker4:3890
ldap_schema = rfc2307bis
ldap_search_base = dc=netgrimoire,dc=com
ldap_auth_disable_tls_never_use_in_production = true
ldap_id_use_start_tls = false
ldap_tls_reqcert = never

ldap_default_bind_dn = uid=admin,ou=people,dc=netgrimoire,dc=com
ldap_default_authtok = F@lcon13

# USERS (lowercase attributes)
ldap_user_search_base = ou=people,dc=netgrimoire,dc=com
ldap_user_object_class = posixAccount
ldap_user_name = uid
ldap_user_gecos = cn
ldap_user_uid_number = uidnumber
ldap_user_gid_number = gidnumber
ldap_user_home_directory = homedirectory
ldap_user_shell = unixshell

# GROUPS (lowercase attributes)
ldap_group_search_base = ou=groups,dc=netgrimoire,dc=com
ldap_group_object_class = groupOfUniqueNames
ldap_group_name = cn
ldap_group_gid_number = gidnumber
ldap_group_member = uniquemember
```
## 4⃣ Fix permissions (SSSD will NOT start without this)
```bash
sudo chown root:root /etc/sssd/sssd.conf
sudo chmod 600 /etc/sssd/sssd.conf
sudo chmod 700 /etc/sssd
```
Validate:
```bash
sudo sssctl config-check
```
## 5⃣ Start SSSD cleanly
```bash
sudo systemctl enable sssd
sudo systemctl stop sssd
sudo rm -f /var/lib/sss/db/* /var/lib/sss/mc/*
sudo systemctl start sssd
```
Verify:
```bash
sudo systemctl status sssd --no-pager -l
sudo sssctl domain-status netgrimoire.com
```
Expected:
```
Online status: Online
LDAP: docker4
```
## 6⃣ Enable NSS lookups via SSSD (LDAP-first)
Edit `/etc/nsswitch.conf`:
```
passwd: sss files systemd
group: sss files systemd
shadow: sss files
```
Test:
```bash
getent passwd graymutt
getent group docker
id graymutt
```
## 7⃣ 🔑 RE-INITIALIZE PAM (THIS IS THE STEP YOU REMEMBERED)
This step is mandatory on Debian/Ubuntu.
```bash
sudo pam-auth-update
```
In the menu, ENABLE:
- ✅ Unix authentication
- ✅ SSSD
- ✅ Create home directory on login

DISABLE:
- ❌ LDAP Authentication (legacy)
- ❌ Kerberos (unless you explicitly use it)

Press OK.
## 8⃣ Verify PAM wiring
```bash
grep pam_sss.so /etc/pam.d/common-*
grep pam_mkhomedir /etc/pam.d/common-session
```
You should see:
```
session required pam_mkhomedir.so skel=/etc/skel umask=0022
```
## 9⃣ Final login test (definitive)
```bash
ssh graymutt@localhost
```
Expected:
- Login succeeds
- `/home/graymutt` is auto-created
- Correct LDAP groups present
## 🔟 (Optional but recommended) Remove local docker group
If the node has a local docker group (gid 998):
```bash
sudo groupdel docker
```
Verify:
```bash
getent group docker
```
Expected:
```
docker:x:1964:graymutt,dockhand
```
## 🧪 Fast troubleshooting commands
```bash
sudo sssctl domain-status netgrimoire.com
sudo tail -n 200 /var/log/sssd/sssd_netgrimoire.com.log
sudo systemctl status sssd --no-pager -l
```

View file

@ -0,0 +1,239 @@
---
title: Opnsense - Additional Blocklists
description: Blocklists
published: true
date: 2026-02-23T21:54:13.019Z
tags:
editor: markdown
dateCreated: 2026-02-23T21:46:39.562Z
---
# OPNsense Additional Blocklists
**Service:** Firewall Aliases — URL Table blocklists
**Host:** OPNsense firewall
**Applies To:** WAN and ATT interfaces
**Update Frequency:** Daily (automatic)
---
## Overview
Your firewall already uses Spamhaus DROP and EDROP as IP blocklists. These three additional lists fill specific gaps that Spamhaus does not cover:
| List | What It Blocks | Why It's Needed |
|---|---|---|
| Feodo Tracker | Botnet command & control IPs | Stops malware on your network phoning home |
| Abuse.ch SSLBL | IPs with malicious SSL certificates | Catches malware that uses HTTPS to hide C2 traffic |
| Emerging Threats | Confirmed active attack IPs | Broad coverage of IPs currently conducting scans and exploits |
These work at the **firewall alias level** — the same mechanism as your existing Spamhaus lists. Traffic from/to these IPs is blocked before it reaches any service.
> ✓ These lists are also used by Suricata internally. Adding them as firewall aliases provides a second, independent enforcement point at the packet filter level — meaning blocks happen even if Suricata is restarted or temporarily inactive.
---
## Current Blocklist State
From your configuration, these lists are already present and working:
| Alias | List | Status |
|---|---|---|
| SpamHaus_Drop | Spamhaus DROP | ⚠ Alias active, **rule disabled** |
| Spamhaus_edrop | Spamhaus EDROP | ⚠ Alias active, **rule disabled** |
| crowdsec_blacklists | CrowdSec IPv4 | ✓ Active |
| crowdsec6_blacklists | CrowdSec IPv6 | ✓ Active |
> ⚠ **First priority:** Before adding new blocklists, re-enable the existing Spamhaus block rules. See the Re-enable Existing Rules section at the bottom of this document.
---
## Step 1 — Add Feodo Tracker Alias
Navigate to **Firewall → Aliases → Add**
| Field | Value |
|---|---|
| Name | `Feodo_Tracker` |
| Type | `URL Table (IPs)` |
| Description | `Abuse.ch Feodo Tracker — Botnet C2 IPs` |
| URL | `https://feodotracker.abuse.ch/downloads/ipblocklist.txt` |
| Refresh Frequency | `1` day |
| Enabled | ✓ |
Click **Save**, then **Apply Changes**.
**Verify the list loaded:**
Go to **Firewall → Diagnostics → Aliases**, select `Feodo_Tracker` — you should see a list of IP addresses populated.
---
## Step 2 — Add Abuse.ch SSLBL Alias
Navigate to **Firewall → Aliases → Add**
| Field | Value |
|---|---|
| Name | `AbuseCH_SSLBL` |
| Type | `URL Table (IPs)` |
| Description | `Abuse.ch SSL Blacklist — Malicious SSL certificate IPs` |
| URL | `https://sslbl.abuse.ch/blacklist/sslipblacklist.txt` |
| Refresh Frequency | `1` day |
| Enabled | ✓ |
Click **Save**, then **Apply Changes**.
> ✓ The SSL Blacklist specifically targets IPs that have been observed using SSL/TLS certificates associated with malware botnets. It catches C2 traffic that would otherwise be hidden inside HTTPS.
---
## Step 3 — Add Emerging Threats Alias
Navigate to **Firewall → Aliases → Add**
| Field | Value |
|---|---|
| Name | `ET_Block_IPs` |
| Type | `URL Table (IPs)` |
| Description | `Emerging Threats — Active attack and scanning IPs` |
| URL | `https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt` |
| Refresh Frequency | `1` day |
| Enabled | ✓ |
Click **Save**, then **Apply Changes**.
---
## Step 4 — Create Firewall Block Rules
One block rule per alias, applied to both WAN and ATT interfaces. Add these rules **above** your existing PASS rules on each interface.
Navigate to **Firewall → Rules → WAN**
### Rule 1 — Block Feodo Tracker (WAN)
Click **Add** (add to top of ruleset):
| Field | Value |
|---|---|
| Action | Block |
| Interface | WAN |
| Direction | in |
| Protocol | any |
| Source | `Feodo_Tracker` (single host or alias) |
| Destination | any |
| Description | `Block Feodo Tracker botnet C2` |
| Log | ✓ Enable logging |
Click **Save**.
### Rule 2 — Block Abuse.ch SSLBL (WAN)
| Field | Value |
|---|---|
| Action | Block |
| Interface | WAN |
| Direction | in |
| Protocol | any |
| Source | `AbuseCH_SSLBL` |
| Destination | any |
| Description | `Block Abuse.ch SSL Blacklist` |
| Log | ✓ Enable logging |
Click **Save**.
### Rule 3 — Block Emerging Threats (WAN)
| Field | Value |
|---|---|
| Action | Block |
| Interface | WAN |
| Direction | in |
| Protocol | any |
| Source | `ET_Block_IPs` |
| Destination | any |
| Description | `Block Emerging Threats IPs` |
| Log | ✓ Enable logging |
Click **Save**.
Click **Apply Changes** on the WAN rules page.
### Repeat for ATT Interface
Navigate to **Firewall → Rules → ATT** and add the same three rules with `Interface: ATT`. This ensures blocking applies to both WANs during the transition period, and only ATT after WAN is retired.
---
## Step 5 — Also Block Outbound (Optional but Recommended)
Adding outbound blocks catches the case where an internal device is already compromised and attempting to contact C2 infrastructure. Apply to the LAN interface, direction **out**:
Navigate to **Firewall → Rules → LAN**, add rules with:
- Direction: `out`
- Source: `any`
- Destination: the respective alias (`Feodo_Tracker`, `AbuseCH_SSLBL`, `ET_Block_IPs`)
- Action: `Block`
This means even if malware bypasses inbound filtering, outbound connections to known C2 IPs are still blocked.
---
## Re-enable Existing Spamhaus Rules
While you are in the firewall rules, re-enable the three currently disabled rules:
Navigate to **Firewall → Rules → WAN**
Find these three rules (they appear greyed out):
1. `Block DROP` — source: SpamHaus_Drop
2. `Block EDROP` — source: Spamhaus_edrop
3. GeoIP country block — source: Blocked_Countries
Click the **enable toggle** (grey circle icon) on each rule to enable them. Click **Apply Changes**.
> ✓ These aliases are already populated and refreshing automatically. The only reason they were not blocking is because the rules were disabled. Enabling them requires no other changes.
---
## Verifying Blocklists Are Working
### Check Alias Contents
**Firewall → Diagnostics → Aliases** — select each alias to see the current list of blocked IPs and confirm they are populated.
### Check Firewall Logs
**Firewall → Log Files → Live View** — filter by the rule description (e.g., `Feodo Tracker`) to see blocks in real time.
### Check Update Schedule
Aliases refresh on the schedule set during creation. To force an immediate refresh:
**Firewall → Diagnostics → Aliases → select alias → Flush + Force Update**
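The same check can be done from the OPNsense shell; URL Table aliases are loaded as pf tables named after the alias (alias names as created above, test IP is a documentation address):
```sh
# List the loaded entries for an alias-backed table
pfctl -t Feodo_Tracker -T show | head

# Test whether a specific IP would match the table
pfctl -t Feodo_Tracker -T test 203.0.113.10
```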
---
## Complete Blocklist Summary
After implementing all of the above, your firewall enforces the following IP blocklists:
| Alias | List | Covers | Update |
|---|---|---|---|
| SpamHaus_Drop | Spamhaus DROP | Hijacked/compromised netblocks | Daily |
| Spamhaus_edrop | Spamhaus EDROP | Extended DROP — additional compromised/sub-allocated netblocks | Daily |
| Feodo_Tracker | Feodo Tracker | Botnet C2 IPs | Daily |
| AbuseCH_SSLBL | Abuse.ch SSLBL | Malicious SSL certificate IPs | Daily |
| ET_Block_IPs | Emerging Threats | Active scanners & attack IPs | Daily |
| crowdsec_blacklists | CrowdSec | Community-reported bad IPs (IPv4) | Real-time |
| crowdsec6_blacklists | CrowdSec | Community-reported bad IPs (IPv6) | Real-time |
| Blocked_Countries | MaxMind GeoIP | 70 blocked countries | Weekly |
Combined with Suricata (content inspection) and CrowdSec (IP reputation), this gives you a comprehensive multi-layer perimeter.
---
## Related Documentation
- [OPNsense Firewall](./opnsense-firewall) — parent firewall documentation, full alias list
- [Suricata IDS/IPS](./suricata-ids-ips) — content inspection layer, also uses these feed sources
- [CrowdSec](./crowdsec) — real-time IP reputation blocking

View file

@ -0,0 +1,182 @@
---
title: OpnSense - GIT Integration
description: Git Integration
published: true
date: 2026-02-23T21:53:24.522Z
tags:
editor: markdown
dateCreated: 2026-02-23T21:48:01.779Z
---
# OPNsense Git Backup (os-git-backup)
**Service:** os-git-backup
**Plugin:** os-git-backup
**Host:** OPNsense firewall
**Remote:** Forgejo on Netgrimoire
**Trigger:** Automatic on every config change
---
## Overview
Every change made to OPNsense — adding a firewall rule, updating an alias, changing a VPN config — modifies the underlying XML configuration file. By default there is no history of these changes. If a misconfiguration causes an outage, or if you need to audit what changed after a security incident, you have no record to work from.
os-git-backup solves this by committing the OPNsense configuration to a Git repository automatically every time a change is saved. Each commit records exactly what changed, when, and (if configured) which user made the change.
**Benefits:**
- Full audit trail of every configuration change
- One-command rollback to any previous state
- Offsite backup of firewall config via Forgejo → Kopia chain
- Diff view to understand exactly what a change did
---
## Pre-requisite: Create Forgejo Repository
Before installing the plugin, create a dedicated repository in Forgejo to receive the OPNsense config backups.
1. Log into your Forgejo instance on Netgrimoire
2. Create a new repository: `opnsense-config`
3. Set visibility to **Private** — firewall configs contain sensitive network topology
4. Do not initialize with a README (the plugin will push the first commit)
5. Note the SSH clone URL: `git@git.netgrimoire.com:youruser/opnsense-config.git`
---
## Installation
### Step 1 — Install the Plugin
1. Go to **System → Firmware → Plugins**
2. Search for `os-git-backup`
3. Click the **+** install button
4. Wait for installation to complete
5. Navigate to **System → Configuration → Backups** — a **Git** tab will appear
---
## Configuration
### Step 2 — Generate SSH Deploy Key
The OPNsense firewall needs an SSH key to authenticate to Forgejo without a password.
Navigate to **System → Configuration → Backups → Git**
1. Click **Generate SSH Key**
2. Copy the displayed **public key** — you will add this to Forgejo next
### Step 3 — Add Deploy Key to Forgejo
1. In Forgejo, go to your `opnsense-config` repository
2. Navigate to **Settings → Deploy Keys**
3. Click **Add Deploy Key**
4. Title: `OPNsense Firewall`
5. Key: paste the public key from Step 2
6. Enable **Allow Write Access** — the firewall needs to push commits
7. Click **Add Key**
### Step 4 — Configure the Plugin
Navigate to **System → Configuration → Backups → Git**
| Setting | Value | Notes |
|---|---|---|
| Enabled | ✓ | |
| URL | `git@git.netgrimoire.com:youruser/opnsense-config.git` | SSH URL from your Forgejo repo |
| Branch | `main` | |
| Name | `OPNsense Firewall` | Author name shown in commits |
| Email | `opnsense@netgrimoire.com` | Author email shown in commits |
| SSH Private Key | (auto-populated from Step 2) | |
| Backup Interval | On change | Commits every time config is saved |
Click **Save**.
### Step 5 — Test the Connection
Click **Backup Now** to trigger a manual backup. Then check your Forgejo repository — you should see an initial commit containing the OPNsense configuration XML.
If the push fails, check:
1. The deploy key has write access in Forgejo
2. The SSH URL is correct (use SSH, not HTTPS)
3. Forgejo is reachable from the firewall — test from OPNsense shell:
```bash
ssh -T git@git.netgrimoire.com
# Expected: Hi youruser! You've successfully authenticated...
```
---
## What Gets Backed Up
The plugin commits the OPNsense configuration file:
`/conf/config.xml`
This single file contains **everything** — interfaces, firewall rules, NAT, VPN configs, aliases, users, certificates, DHCP, DNS settings, and all plugin configurations. A restore from this file fully recreates the firewall state.
> ⚠ The config.xml contains **hashed passwords**, **VPN private keys**, and **API credentials**. The Forgejo repository must remain private. Ensure your Forgejo instance is not publicly accessible or that this repository is explicitly private.
---
## Using the Backup
### Viewing History
In Forgejo, navigate to the `opnsense-config` repository. Each commit represents one configuration save, with:
- Timestamp of the change
- Diff showing exactly what XML changed
- Author (OPNsense Firewall)
### Rolling Back a Change
If a configuration change causes problems:
**Option 1 — Restore via OPNsense UI:**
1. In Forgejo, find the commit you want to restore
2. Download the `config.xml` from that commit
3. In OPNsense: **System → Configuration → Backups → Restore**
4. Upload the config.xml and restore
**Option 2 — Restore via shell (if UI is unreachable):**
```bash
# SSH into OPNsense
ssh root@192.168.3.4
# The git repo is cloned locally — find it
find /conf -name ".git" -type d
# Check out the previous config
cd /conf/backup # or wherever the repo is cloned
git log --oneline -10
git checkout <commit-hash> -- config.xml
# Apply the restored config
/usr/local/sbin/opnsense-importer config.xml
```
### Diffing Changes
To see exactly what a specific change did:
```bash
# In Forgejo: click any commit → view the diff
# Alternatively, from the OPNsense shell:
cd <git repo path>
git diff HEAD~1 HEAD -- config.xml
```
---
## Integration with Kopia Backups
Since the git repository lives in Forgejo on Netgrimoire, it is automatically included in the Netgrimoire Kopia backup chain — no additional configuration needed. The OPNsense config history is backed up offsite along with everything else.
---
## Related Documentation
- [OPNsense Firewall](./opnsense-firewall) — parent firewall documentation
- [Forgejo](./forgejo) — Git repository host on Netgrimoire
- [Kopia Backups](./kopia) — offsite backup chain

View file

@ -0,0 +1,508 @@
---
title: OpnSense
description: Grimoire Firewall Configuration
published: true
date: 2026-02-23T21:31:26.008Z
tags:
editor: markdown
dateCreated: 2026-02-23T21:31:15.244Z
---
# OPNsense Firewall
**Host:** OPNsense.localdomain
**Timezone:** America/Chicago
**Documented:** February 23, 2026
**Status:** Active — AT&T migration in progress
---
## Overview
The network perimeter is protected by an OPNsense firewall running on dedicated hardware with four physical Intel i226-V NICs (igc0-igc3). The firewall operates in a dual-WAN configuration during the transition from the legacy ISP to AT&T fiber, with AT&T becoming the permanent primary WAN. CrowdSec threat intelligence, GeoIP blocking, and Spamhaus DROP/EDROP lists provide layered perimeter security.
---
## Hardware & System
| Parameter | Value |
|---|---|
| Hostname | OPNsense |
| Domain | localdomain |
| Timezone | America/Chicago |
| Language | en_US |
| NAT Outbound Mode | Hybrid |
| System DNS | 8.8.8.8 (Google) — see DNS notes |
| DNS Allow Override | Enabled |
| SSH | Enabled (port 22) |
| Console Menu | Disabled (hardened) |
> ⚠ **DNS Note:** The system upstream DNS is set to 8.8.8.8. If dnscrypt-proxy or Unbound is configured, this should be updated to point to localhost or the internal DNS resolver (192.168.5.7). Review before enabling encrypted DNS.
---
## Network Interfaces
| Interface | Label | Physical NIC | IP Address | Role |
|---|---|---|---|---|
| wan | WAN | igc0 | 24.249.193.114/28 | Legacy primary WAN — being retired |
| opt1 | ATT | igc1 | 107.133.34.145/28 | New primary WAN — AT&T fiber |
| lan | LAN | igc3 | 192.168.3.4/29 | Internal LAN management segment |
| opt3 | OPT3 | igc2 | DHCP | Unassigned — spare interface |
| opt2 / wg1 | WG1 | wg1 (virtual) | WireGuard tunnel | WireGuard VPN interface |
| openvpn | OpenVPN | virtual | Tunnel only | OpenVPN (server + client configured) |
| lo0 | Loopback | lo0 | 127.0.0.1/8 | System loopback |
> ⚠ **OPT3 (igc2)** is on DHCP and currently unassigned. Disable this interface or assign it a role to reduce unnecessary attack surface.
---
## Gateways & Routing
### Active Gateways
| Gateway Name | Interface | IP | Role |
|---|---|---|---|
| WAN_DefRoute | wan (igc0) | 24.249.193.114 | Legacy default route — being retired |
| ATT | opt1 (igc1) | 107.133.34.145 | AT&T — becoming primary |
| LAN_GWv4 | lan (igc3) | 192.168.3.4 | LAN gateway |
### NAT Outbound Rules
Outbound NAT runs in **Hybrid** mode — automatic rules supplemented by manual overrides below.
| Interface | Source | NAT Target | Purpose |
|---|---|---|---|
| opt1 (ATT) | ATT_Out_1 group | opt1ip | Dad's Laptop + 192.168.5.128/25 out ATT |
| wan | MailCow_Ngnx (192.168.5.16) | 24.249.193.115 | Mail server — dedicated WAN IP |
| wan | PNCHarris_Internal | wanip | Internal subnets egress |
| wan | WireGuard (opt2) | — | WireGuard outbound NAT |
> ✓ The mail server already has a dedicated outbound IP (24.249.193.115) on WAN. This pattern should be replicated on ATT using a dedicated virtual IP from the static block.
---
## Firewall Aliases
### Host Aliases
| Alias | IP Address | Used For |
|---|---|---|
| caddy | 192.168.5.10 | Caddy reverse proxy |
| MailCow_Ngnx | 192.168.5.16 | MailCow nginx container |
| JellyFin_Host | 192.168.5.18 | Jellyfin media server |
| ISPConfig_Host | 192.168.4.11 | ISPConfig control panel |
| Dads_Laptop | 192.168.5.176 | Routed out ATT interface |
### Network Aliases
| Alias | Value | Used For |
|---|---|---|
| PNCHarris_Internal | 192.168.5.0/25, 192.168.3.0/24 | Primary internal subnets |
| Subnet_5_128_Mask_25 | 192.168.5.128/25 | Upper half of 192.168.5.x |
| ATT_Out_1 | Dads_Laptop + Subnet_5_128_Mask_25 | Traffic routed out ATT interface |
| Family_Subnet | (empty) | Defined but unpopulated |
### Port Aliases
| Alias | Ports | Used For |
|---|---|---|
| Web_Services | 80, 443 | HTTP/HTTPS |
| MailCow | 25, 110, 143, 465, 587, 993, 995, 4190 | Full MailCow mail protocol suite |
| ISPConfig | 25, 53, 143, 465, 587, 993, 995, 8080 | ISPConfig mail + DNS + admin |
| JellyFin_Port | 8096, 7096 | Jellyfin HTTP + HTTPS |
| Plex_Port_2 | (empty) | Defined but unpopulated |
### Security & Threat Intelligence Aliases
| Alias | Type | Source | Status |
|---|---|---|---|
| SpamHaus_Drop | URL Table | https://www.spamhaus.org/drop/drop.txt | ⚠ Rule DISABLED |
| Spamhaus_edrop | URL Table | https://www.spamhaus.org/drop/edrop.txt | ⚠ Rule DISABLED |
| Blocked_Countries | GeoIP | 70 countries — see GeoIP section | ⚠ Rule DISABLED |
| crowdsec_blacklists | External | CrowdSec IPv4 decisions | ✓ Active |
| crowdsec6_blacklists | External | CrowdSec IPv6 decisions | ✓ Active |
| crowdsec_blocklists | External | CrowdSec IPv4 (duplicate) | ✓ Active |
| crowdsec6_blocklists | External | CrowdSec IPv6 decisions (duplicate) | ✓ Active |
> ⚠ **Critical:** Spamhaus DROP, Spamhaus EDROP, and GeoIP country blocking are all defined and populated but their firewall rules are **disabled**. These are not currently being enforced. Re-enable these rules as an immediate priority.
> ⚠ There are duplicate CrowdSec alias pairs (`crowdsec_blacklists` and `crowdsec_blocklists` both handle IPv4). Review and consolidate to avoid confusion.
---
## Firewall Rules
### WAN Rules
| Action | Protocol | Source | Destination | Port(s) | Enabled | Description |
|---|---|---|---|---|---|---|
| BLOCK | Any | SpamHaus_Drop | Any | Any | ❌ No | Block Spamhaus DROP list |
| BLOCK | Any | Spamhaus_edrop | Any | Any | ❌ No | Block Spamhaus EDROP list |
| BLOCK | Any | Blocked_Countries | Any | Any | ❌ No | GeoIP country block |
| PASS | TCP | Any | MailCow_Ngnx | MailCow ports | ✓ Yes | Inbound mail |
| PASS | TCP | Any | JellyFin_Host | 8096, 7096 | ✓ Yes | Jellyfin access |
| PASS | UDP | Any | WAN IP | 51820 | ✓ Yes | WireGuard VPN ingress |
| PASS | TCP | Any | MailCow_Ngnx | 80, 443 | ✓ Yes | MailCow webmail |
| PASS | TCP | Any | caddy (192.168.5.10) | 80, 443 | ✓ Yes | Caddy reverse proxy |
> ⚠ All three block rules at the top of the WAN ruleset are disabled. The firewall is currently not enforcing Spamhaus or GeoIP blocking despite the aliases being populated.
### LAN Rules
| Action | Protocol | Source | Destination | Description |
|---|---|---|---|---|
| PASS | Any | ATT_Out_1 group | Any | Dad's Laptop + upper subnet out ATT |
| PASS | Any | LAN subnet | Any | Default allow LAN to any |
| PASS | Any | PNCHarris_Internal | Any | Internal subnets to any |
| PASS | Any | LAN subnet | Any | Default allow LAN IPv6 to any |
| PASS | TCP | PNCHarris_Internal | ISPConfig_Host:ISPConfig | LAN → ISPConfig redirect |
| PASS | TCP | PNCHarris_Internal | ISPConfig_Host:80/443 | LAN → ISPConfig web redirect |
| PASS | TCP | PNCHarris_Internal | caddy:80/443 | LAN → Caddy redirect |
| PASS | TCP | PNCHarris_Internal | MailCow_Ngnx:MailCow | LAN → MailCow redirect |
### WireGuard Interface Rules
| Action | Protocol | Source | Destination | Description |
|---|---|---|---|---|
| PASS | Any | Any | Any | Allow all from WireGuard peers — unrestricted |
> ⚠ The WireGuard interface allows all traffic from all peers with no restrictions. Consider scoping rules per peer as needs are better understood — some remote sites may only need access to specific services.
---
## NAT Port Forwards
### WAN Inbound
| Protocol | Public Port(s) | Internal Target | Internal Port(s) | Service |
|---|---|---|---|---|
| TCP | MailCow ports | 192.168.5.16 (MailCow_Ngnx) | MailCow ports | Mail (SMTP/IMAP/POP3/Sieve) |
| TCP | 80, 443 | 192.168.5.16 (MailCow_Ngnx) | 80, 443 | MailCow webmail |
| TCP | 8096, 7096 | 192.168.5.18 (JellyFin_Host) | 8096, 7096 | Jellyfin |
| TCP | 80, 443 | 192.168.5.10 (caddy) | 80, 443 | Caddy (all web services) |
### LAN Hairpin (Internal Redirect)
| Protocol | Port(s) | Internal Target | Description |
|---|---|---|---|
| TCP | MailCow ports | 192.168.5.16 | Internal mail access |
| TCP | 80, 443 | 192.168.5.10 (caddy) | Internal web via Caddy |
| TCP | ISPConfig ports | 192.168.4.11 | Internal ISPConfig access |
| TCP | 80, 443 | 192.168.4.11 | Internal ISPConfig web |
---
## VPN
### WireGuard
**Server: pncharris**
| Parameter | Value |
|---|---|
| Tunnel Address | 192.168.32.1/24 |
| Listen Port | 51820 (UDP) |
| DNS for Peers | 192.168.5.7 (internal DNS) |
| Interface | wg1 (OPT2) |
| Status | Enabled |
**Peers**
| Peer | Tunnel IP | Status | Notes |
|---|---|---|---|
| Obie | 192.168.32.2/32 | ✓ Enabled | |
| pncfishandmore | 192.168.32.3/32 | ✓ Enabled | Business location |
| GLNet (1) | 192.168.32.4/32 | ✓ Enabled | GL.iNet travel router |
| PortaPotty | 192.168.32.5/32 | ✓ Enabled | Remote site |
| GLNet (2) | 192.168.32.6/32 | ✓ Enabled | Second GL.iNet device |
> ✓ WireGuard peers use the internal DNS server (192.168.5.7) — internal hostnames resolve correctly over VPN.
### OpenVPN
An OpenVPN server and client are configured but details were not populated in the backup. Verify status in **VPN → OpenVPN** in the OPNsense UI.
---
## Security Features
### CrowdSec
CrowdSec is installed and fully operational at the firewall level.
| Parameter | Value |
|---|---|
| Agent | Enabled |
| Local API (LAPI) | Enabled — 127.0.0.1:8080 |
| Firewall Bouncer | Enabled |
| Rules | Enabled with logging |
| Firewall Bouncer Verbose | Disabled |
| Manual LAPI Config | Disabled (auto) |
CrowdSec decisions are fed into two alias pairs used in firewall rules:
- `crowdsec_blacklists` / `crowdsec6_blacklists` — IPv4 and IPv6 block lists
- `crowdsec_blocklists` / `crowdsec6_blocklists` — duplicate set (consolidate)
### GeoIP Blocking
GeoIP uses the MaxMind GeoLite2 database with a configured license key. **The blocking rule is currently disabled** — the alias is populated but not enforced.
**70 countries are blocked across four regions:**
| Region | Countries |
|---|---|
| Africa (49) | AO, BF, BI, BJ, BW, CD, CF, CG, CI, CM, DJ, DZ, EG, EH, ER, ET, GA, GH, GM, GN, GQ, GW, KE, LR, LS, LY, MA, ML, MR, MW, MZ, NA, NE, NG, RW, SD, SL, SN, SO, SS, ST, SZ, TD, TG, TN, TZ, UG, ZA, ZM, ZW |
| Middle East / Asia (12) | AF, BN, BT, CN, IQ, IR, KG, KP, KW, PH, QA, SA |
| Eastern Europe (4) | BG, RS, RU, RO |
| Latin America (4) | BR, EC, GT, HN |
### Spamhaus Blocklists
Both lists are configured as URL table aliases that auto-refresh, but **both blocking rules are currently disabled.**
| List | URL | Update |
|---|---|---|
| Spamhaus DROP | https://www.spamhaus.org/drop/drop.txt | Auto (URL table) |
| Spamhaus EDROP | https://www.spamhaus.org/drop/edrop.txt | Auto (URL table) |
---
## Internal Network Layout
### Known Subnets
| Subnet | Alias | Purpose |
|---|---|---|
| 192.168.3.0/24 | PNCHarris_Internal | LAN management segment |
| 192.168.5.0/25 | PNCHarris_Internal | Primary server subnet |
| 192.168.5.128/25 | Subnet_5_128_Mask_25 | Secondary server subnet / ATT routing |
| 192.168.32.0/24 | — | WireGuard tunnel network |
### Key Internal Hosts
| Hostname / Alias | IP | Role |
|---|---|---|
| caddy | 192.168.5.10 | Caddy reverse proxy (all web services) |
| MailCow_Ngnx | 192.168.5.16 | MailCow nginx container |
| JellyFin_Host | 192.168.5.18 | Jellyfin media server |
| ISPConfig_Host | 192.168.4.11 | ISPConfig control panel |
| Dads_Laptop | 192.168.5.176 | Routed via ATT interface |
| Internal DNS | 192.168.5.7 | DNS server (served to WireGuard peers) |
### DHCP
DHCP on the LAN interface (192.168.3.0/24) is currently **disabled**. No KEA or ISC DHCP ranges are active on the firewall. Devices likely use static IPs or a separate DHCP server downstream.
---
## Installed Plugins & Services
The following OPNsense components are present in the configuration:
| Plugin / Service | Status |
|---|---|
| WireGuard | ✓ Active — 1 server, 5 peers |
| CrowdSec | ✓ Active — agent + bouncer + LAPI |
| OpenVPN | Configured — verify in UI |
| IPsec / Swanctl | Present — verify in UI |
| Unbound Plus | Present — verify DNS configuration |
| Kea DHCP | Present — not active on LAN |
| DHCP Relay | Present |
| Netflow | Present |
| IDS/IPS (Suricata) | ❌ Not configured — see hardening plan |
| Proxy | Present — not actively used |
| Traffic Shaper | Present |
| Monit | Present |
| SNMP | Present |
| Syslog | Not configured — see hardening plan |
| Git Backup | Not installed — see hardening plan |
---
## AT&T Migration & Static IP Plan
### Current AT&T Interface
**Interface:** opt1 (igc1)
**Current IP:** 107.133.34.145/28
**Block:** /28 — up to 14 usable addresses, 5 static IPs allocated for use
### Recommended Static IP Allocation
| IP Slot | Dedicated To | Justification |
|---|---|---|
| IP 1 | **Mail (MailCow)** | Dedicated mail IP protects sender reputation. Never share with web services. Only ports 25/465/587/993/995/4190 NAT to 192.168.5.16. |
| IP 2 | **Web / Caddy** | All reverse-proxied services via Caddy. Keeps web and mail reputation independent. Replace current WAN NAT for ports 80/443 → 192.168.5.10. |
| IP 3 | **WireGuard VPN** | Dedicated IP for UDP/51820 only. Cleaner peer configs, stable endpoint, easy to firewall tightly — that IP accepts nothing else. |
| IP 4 | **Spare / Jellyfin** | Hold in reserve. Best candidate: dedicated Jellyfin IP (currently on WAN with ports 8096/7096). Media servers benefit from a clean IP separate from your main web presence. |
| IP 5 | **Admin / Out-of-band** | A locked-down IP for emergency remote OPNsense access. Firewall tightly — accept only from WireGuard peers or specific trusted source IPs. Never advertise publicly. |
### Implementation Steps
**Step 1 — Add Virtual IPs**
In OPNsense: **Firewall → Virtual IPs → Add**
For each additional static IP (IPs 1-5, excluding the one already assigned to the interface):
- Type: `IP Alias`
- Interface: `ATT (opt1)`
- Address: `<static IP>/28`
- Description: e.g. `ATT_Mail`, `ATT_Web`, `ATT_WireGuard`
**Step 2 — Create NAT Rules Per Virtual IP**
In **Firewall → NAT → Port Forward**, create new rules on the ATT interface using the virtual IPs as the destination. Example for mail:
```
Interface: ATT (opt1)
Protocol: TCP
Destination: ATT_Mail virtual IP
Destination Port: MailCow alias
Redirect Target: 192.168.5.16 (MailCow_Ngnx)
Redirect Port: MailCow alias
```
Repeat for web (→ caddy 192.168.5.10) and WireGuard (UDP/51820).
**Step 3 — Update Outbound NAT**
Add manual outbound NAT rules so that each internal service exits through its dedicated virtual IP:
```
Interface: ATT (opt1)
Source: 192.168.5.16 (MailCow_Ngnx)
Target: ATT_Mail virtual IP
Interface: ATT (opt1)
Source: 192.168.5.10 (caddy)
Target: ATT_Web virtual IP
```
**Step 4 — Migrate WireGuard Endpoint**
Update peer configs to point to the ATT_WireGuard virtual IP on port 51820. Move the WAN WireGuard rule to ATT interface. Update DNS records if you have a hostname for the WireGuard endpoint.
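On each remote peer this is a one-line change in the `[Peer]` section of its WireGuard config. A minimal sketch — the placeholder below stands for whichever static IP you assign to ATT_WireGuard:
```
[Peer]
# ... existing PublicKey / AllowedIPs lines stay the same ...
Endpoint = <ATT_WireGuard_IP>:51820
```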
**Step 5 — Update Firewall Block Rules**
Re-enable the Spamhaus and GeoIP block rules and apply them to the ATT interface, mirroring the rules that currently sit disabled on WAN.
**Step 6 — DNS Updates**
Update all public DNS records to point to the new ATT static IPs:
- `mail.*` domains → ATT_Mail IP
- `*.netgrimoire.com`, `*.wasted-bandwidth.net`, etc. → ATT_Web IP
- WireGuard endpoint hostname → ATT_WireGuard IP
**Step 7 — Retire WAN (igc0)**
Once all services are verified on ATT, disable WAN NAT rules, remove port forward rules on WAN, and eventually disable the interface.
---
## Hardening Plan
The following items are recommended improvements, ordered by priority.
### Priority 1 — Re-enable Disabled Security Rules (Immediate)
All three security block rules on the WAN interface are currently disabled. These should be re-enabled immediately as they represent threat intelligence you have already configured but are not using.
1. Navigate to **Firewall → Rules → WAN**
2. Find rules: `Block DROP`, `Block EDROP`, and the GeoIP block rule
3. Click the enable toggle on each rule
4. Click **Apply Changes**
Repeat on the ATT interface once migrated.
### Priority 2 — Suricata IDS/IPS
Suricata is built into OPNsense but not yet configured. This is the most significant security gap — without it, there is no deep packet inspection or content-based threat detection.
**Setup steps:**
1. Go to **Services → Intrusion Detection → Administration**
2. Enable IDS/IPS, set interface to **ATT** (and WAN while active)
3. Set mode to **IPS** (inline blocking, not just alerting)
4. Under **Download**, enable the following rulesets:
- `ET Open` — Proofpoint Emerging Threats (free, comprehensive)
- `Abuse.ch SSL Blacklist` — malicious SSL certificate detection
- `Feodo Tracker` — botnet C2 blocking
5. Under **Policies**, set default action to `drop` for high-severity rules
6. Click **Download & Update Rules**, then **Apply**
> ✓ Suricata complements CrowdSec well. CrowdSec handles IP reputation; Suricata handles traffic content inspection. They do not overlap.
### Priority 3 — Additional Blocklists
Add these URL table aliases to supplement Spamhaus DROP/EDROP:
| List | URL | Purpose |
|---|---|---|
| Feodo Tracker | https://feodotracker.abuse.ch/downloads/ipblocklist.txt | Botnet C2 IPs |
| Abuse.ch SSLBL | https://sslbl.abuse.ch/blacklist/sslipblacklist.txt | Malicious SSL IPs |
| Emerging Threats | https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt | ET block list |
For each: **Firewall → Aliases → Add**, type `URL Table`, set refresh to 1 day. Then add a WAN block rule using each alias as the source.
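Once the aliases exist and have refreshed, you can confirm from the OPNsense shell that each URL table actually loaded entries. A minimal check, assuming the pf table name matches the alias name (OPNsense normally creates them that way; the alias names below are placeholders for whatever you choose):
```bash
# Count entries loaded into each URL table alias
pfctl -t FeodoTracker -T show | wc -l
pfctl -t AbuseChSSLBL -T show | wc -l
pfctl -t EmergingThreatsBlock -T show | wc -l
```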
### Priority 4 — dnscrypt-proxy (Encrypted DNS)
Encrypts DNS queries leaving the firewall and adds DNS-level malware/tracking blocklists.
1. Go to **System → Firmware → Plugins**, install `os-dnscrypt-proxy`
2. Navigate to **Services → DNSCrypt-Proxy**
3. Enable, set listen port to `5353`
4. Select resolvers: `cloudflare`, `quad9-dnscrypt-ip4-nofilter-pri` (or similar)
5. Enable DNSSEC validation
6. Update **System → Settings → General** — set DNS server to `127.0.0.1:5353`
7. Disable `DNS Allow Override` so the ISP cannot push DNS changes
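After switching the system resolver, a quick sanity check from the OPNsense shell confirms queries resolve through dnscrypt-proxy on the new port (a sketch assuming `drill`, which ships with OPNsense/FreeBSD, and the listen port chosen above):
```bash
# Query dnscrypt-proxy directly on its listen port to confirm resolution works
drill -p 5353 example.com @127.0.0.1
```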
### Priority 5 — os-git-backup
Automatically commits every OPNsense config change to a Git repository. Invaluable for auditing changes after an incident and for rapid recovery.
1. Go to **System → Firmware → Plugins**, install `os-git-backup`
2. Navigate to **System → Configuration → Git Backup**
3. Configure a Forgejo repository on Netgrimoire as the remote
4. Set SSH key for authentication
5. Enable automatic backup on config change
### Priority 6 — Syslog to Graylog
Syslog is not currently configured. Sending firewall logs to Graylog (already running at `http://graylog:9000`) enables centralized log analysis and alerting.
1. Go to **System → Settings → Logging → Remote**
2. Add a syslog destination: `graylog:514` (UDP) or use GELF input on Graylog
3. Enable logging for: Firewall, DHCP, VPN, Authentication, CrowdSec
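Once remote logging is saved, a one-off test message from the OPNsense shell is an easy way to confirm the path to Graylog before waiting on real firewall events (a sketch assuming FreeBSD's `logger` and a UDP syslog input listening on port 514):
```bash
# Send a test syslog message to the Graylog input
logger -h graylog -P 514 "OPNsense remote syslog test"
```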
---
## Known Issues & Action Items
| Item | Priority | Notes |
|---|---|---|
| Spamhaus DROP rule disabled | 🔴 High | Re-enable in Firewall → Rules → WAN |
| Spamhaus EDROP rule disabled | 🔴 High | Re-enable in Firewall → Rules → WAN |
| GeoIP block rule disabled | 🔴 High | Re-enable in Firewall → Rules → WAN |
| Suricata not configured | 🔴 High | Most significant security gap — configure with ET Open rules |
| Duplicate CrowdSec aliases | 🟡 Medium | crowdsec_blacklists and crowdsec_blocklists both do IPv4 — consolidate |
| WireGuard rule too permissive | 🟡 Medium | Allow-all from peers — scope per peer when needs are known |
| OPT3 interface unassigned | 🟡 Medium | Disable or assign a role |
| System DNS points to Google | 🟡 Medium | Should point to internal resolver or localhost after dnscrypt-proxy setup |
| No syslog configured | 🟡 Medium | Forward to Graylog for centralized logging |
| os-git-backup not installed | 🟡 Medium | Install for config change auditing |
| OpenVPN config unpopulated | 🟢 Low | Verify status — backup shows server+client but no details |
| ATT migration incomplete | 🟢 Low | In progress — see migration plan above |
| Family_Subnet alias empty | 🟢 Low | Populate or remove |
| Plex_Port_2 alias empty | 🟢 Low | Populate or remove |
| DHCP disabled on LAN | 🟢 Info | Intentional if using static IPs — verify |
---
## Related Documentation
- [Caddy Reverse Proxy](./caddy-reverse-proxy) — services exposed through the firewall
- [MailCow Mail Server](./mailcow) — mail server behind the firewall, dedicated WAN IP
- [WireGuard VPN](./wireguard) — peer configuration and access
- [Graylog](./graylog) — target for firewall syslog
- [CrowdSec](./crowdsec) — threat intelligence integration

View file

@ -0,0 +1,212 @@
---
title: OpnSense-IDS/IPS
description: IDS
published: true
date: 2026-02-23T21:51:49.920Z
tags:
editor: markdown
dateCreated: 2026-02-23T21:49:16.861Z
---
# Suricata IDS/IPS
**Service:** Suricata Intrusion Detection & Prevention System
**Host:** OPNsense firewall
**Interfaces:** ATT (opt1) — add WAN (igc0) while still active
**Mode:** IPS (inline blocking)
**Rulesets:** ET Open, Feodo Tracker, Abuse.ch SSL
---
## Overview
Suricata is OPNsense's built-in deep packet inspection engine. Unlike CrowdSec (which blocks based on IP reputation) and GeoIP (which blocks by country), Suricata inspects the **content** of traffic — detecting exploit patterns, malware C2 communication, vulnerability scans, and known CVE exploitation attempts in real time.
The two systems complement each other and do not overlap:
| Layer | Tool | What It Stops |
|---|---|---|
| IP reputation | CrowdSec | Known bad IPs from community threat intel |
| Geography | GeoIP | Traffic from blocked countries |
| Content inspection | Suricata | Malicious payloads, exploit patterns, C2 traffic |
Suricata uses **Netmap** for high-performance inline packet processing with minimal CPU overhead.
> ⚠ **Before enabling IPS mode:** Disable hardware offloading on your interfaces or Netmap will not function correctly. This is done in **Interfaces → Settings**.
---
## Pre-requisite: Disable Hardware Offloading
1. Go to **Interfaces → Settings**
2. Disable the following options:
- Hardware CRC
- Hardware TSO
- Hardware LRO
- VLAN Hardware Filtering
3. Click **Save**
4. Reboot the firewall
> ✓ This is a one-time change. It has no meaningful impact on performance for home/small business use and is required for Suricata IPS mode to function.
---
## Installation
Suricata is built into OPNsense — no plugin install required. Navigate directly to:
**Services → Intrusion Detection → Administration**
---
## Configuration
### Step 1 — General Settings
Navigate to **Services → Intrusion Detection → Administration**
| Setting | Value | Notes |
|---|---|---|
| Enabled | ✓ | Turns on the IDS/IPS engine |
| IPS Mode | ✓ | Enables inline blocking (not just alerting) |
| Promiscuous Mode | Leave default | Only needed for mirrored traffic setups |
| Default Packet Size | Leave default | Auto-detected |
| Interfaces | ATT, WAN | Add both while dual-WAN is active; remove WAN after migration |
| Home Networks | 192.168.3.0/24, 192.168.5.0/24, 192.168.32.0/24 | Your internal subnets — critical for rule accuracy |
| Log Level | Info | |
| Log Retention | 7 days | Adjust based on disk space |
> ⚠ **Home Networks is critical.** Suricata rules use `$HOME_NET` and `$EXTERNAL_NET` to determine direction. If your internal subnets are not listed here, many rules will fail to trigger correctly or will produce false positives.
Click **Apply** after setting these values.
### Step 2 — Download Rulesets
Navigate to **Services → Intrusion Detection → Download**
Enable the following rulesets:
| Ruleset | Provider | Priority | Notes |
|---|---|---|---|
| ET Open | Proofpoint Emerging Threats | 🔴 Essential | Comprehensive free ruleset — 40,000+ rules covering exploits, malware, scanning, C2 |
| Abuse.ch SSL Blacklist | Abuse.ch | 🔴 Essential | Blocks connections to malicious SSL certificates used by malware |
| Feodo Tracker Botnet | Abuse.ch | 🔴 Essential | Blocks botnet C2 IP communication |
| OSIF | OPNsense | 🟡 Recommended | OPNsense internal feed |
| PT Research | Positive Technologies | 🟡 Recommended | Additional threat intelligence |
To enable each ruleset:
1. Find it in the list
2. Toggle the **Enabled** switch
3. Click **Download & Update Rules** at the top of the page
> ✓ ET Open is the most important ruleset. It is maintained by Proofpoint, updated daily, and covers the vast majority of common attack patterns you will encounter.
### Step 3 — Configure Policies
Policies control what Suricata does when a rule matches — alert only, or drop the packet.
Navigate to **Services → Intrusion Detection → Policy**
**Recommended policy setup:**
Add the following policies in order:
**Policy 1 — Drop high-severity ET threats**
| Field | Value |
|---|---|
| Description | Drop ET High Severity |
| Priority | 1 |
| Rulesets | ET Open |
| Action | Drop |
| Severity | ≥ High |
**Policy 2 — Alert on medium-severity (tuning period)**
| Field | Value |
|---|---|
| Description | Alert ET Medium |
| Priority | 2 |
| Rulesets | ET Open |
| Action | Alert |
| Severity | Medium |
**Policy 3 — Drop all Feodo/Abuse.ch matches**
| Field | Value |
|---|---|
| Description | Drop Botnet C2 and SSL Blacklist |
| Priority | 1 |
| Rulesets | Feodo Tracker, Abuse.ch SSL |
| Action | Drop |
| Severity | Any |
> ✓ Start with medium-severity rules in **alert** mode for the first 1-2 weeks. Review alerts in the log for false positives before switching to drop. High-severity rules and the abuse.ch lists are safe to drop immediately.
### Step 4 — Apply and Verify
1. Click **Apply** on the Administration tab
2. Navigate to **Services → Intrusion Detection → Alerts**
3. Wait a few minutes — alerts should begin populating
4. Check **Services → Intrusion Detection → Stats** to confirm traffic is being processed
---
## Tuning & False Positives
After running in alert mode for a week, review the Alerts tab. Common false positives from home lab environments include:
- **Nextcloud sync traffic** — may trigger file transfer rules
- **Torrents/P2P** — will trigger multiple ET rules by design
- **Internal port scanning tools** — Nmap from internal hosts triggers scan rules
To suppress a false positive rule without disabling it entirely:
1. Note the rule SID from the alert
2. Go to **Services → Intrusion Detection → Rules**
3. Search for the SID
4. Change the rule action to **Alert** (instead of Drop) for that specific rule
Alternatively, add a suppression in **Services → Intrusion Detection → Suppressions**:
- Enter the SID
- Set the direction (source or destination)
- Enter the IP to suppress for that rule
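For reference, the Suppressions UI writes standard Suricata suppress entries underneath. A hand-written equivalent looks like this (the SID and IP are illustrative examples, not values from this deployment):
```
# Suppress SID 2210051 only for traffic sourced from the Jellyfin host
suppress gen_id 1, sig_id 2210051, track by_src, ip 192.168.5.18
```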
---
## Monitoring
### Alert Dashboard
**Services → Intrusion Detection → Alerts** — real-time view of matched rules.
Useful filters:
- Filter by `severity: high` to see the most critical events
- Filter by `action: drop` to see what is being actively blocked
- Filter by source IP to investigate a specific host
### Graylog Integration
Forward Suricata alerts to Graylog for centralized analysis:
1. Suricata logs to `/var/log/suricata/eve.json` in EVE JSON format
2. In Graylog, add a **Beats input** or **Syslog UDP input**
3. In OPNsense **System → Settings → Logging → Remote**, add Graylog as syslog target
4. Create a Graylog stream filtering on `application_name: suricata`
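Before wiring up Graylog, you can watch the same EVE stream locally to confirm alerts are being produced (a sketch assuming `jq` is installed from packages; the severity filter matches the Monit test described elsewhere in this wiki):
```bash
# Follow high-severity alert events from the EVE JSON log
tail -f /var/log/suricata/eve.json | jq 'select(.event_type == "alert" and .alert.severity == 1)'
```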
---
## Key Files & Paths
| Path | Purpose |
|---|---|
| `/var/log/suricata/eve.json` | EVE JSON alert log — used by Graylog |
| `/var/log/suricata/stats.log` | Performance statistics |
| `/usr/local/etc/suricata/suricata.yaml` | Main config (managed by OPNsense UI) |
| `/usr/local/share/suricata/rules/` | Downloaded rulesets |
---
## Related Documentation
- [OPNsense Firewall](./opnsense-firewall) — parent firewall documentation
- [CrowdSec](./crowdsec) — complementary IP reputation layer
- [Additional Blocklists](./opnsense-blocklists) — Feodo, Abuse.ch, ET IP blocklists at firewall level
- [Graylog](./graylog) — centralized log target for Suricata alerts

View file

@ -0,0 +1,159 @@
---
title: OpnSense - App Protection
description: App Inspection
published: true
date: 2026-02-23T21:52:43.630Z
tags:
editor: markdown
dateCreated: 2026-02-23T21:50:37.324Z
---
# Zenarmor (NGFW)
**Service:** Zenarmor Next-Generation Firewall
**Plugin:** os-sunnyvalley
**Tier:** Free Edition
**Host:** OPNsense firewall
---
## Overview
Zenarmor adds application-layer awareness and web filtering to OPNsense that the base firewall does not provide. Where Suricata inspects packet content for known threat signatures, Zenarmor identifies **what application or service** is generating traffic and can block or allow based on that — regardless of port.
| Feature | Free Tier | Paid Tier |
|---|---|---|
| Layer-7 app identification | ✓ | ✓ |
| Web category filtering | Default policy only | Custom policies |
| Malware/phishing blocking | ✓ | ✓ |
| Real-time network analytics | ✓ | ✓ |
| Device tracking & alerts | ✗ | ✓ |
| Multiple policies | ✗ | ✓ |
| TLS inspection | ✗ | ✓ |
The free tier is useful primarily for **visibility** (seeing what applications are running on your network) and **basic threat blocking** (malware, phishing, PUP domains). The analytics dashboard alone makes it worthwhile.
> ✓ Zenarmor and Suricata can run simultaneously. They operate at different layers and do not conflict. Zenarmor handles application identity; Suricata handles content signatures.
> ⚠ **MongoDB deprecation note:** As of September 2025, MongoDB is being deprecated as the Zenarmor database backend. Use **SQLite** when prompted during setup — it is the supported path going forward.
---
## Installation
### Step 1 — Install the Plugin
1. Go to **System → Firmware → Plugins**
2. Search for `os-sunnyvalley`
3. Click the **+** install button
4. Wait for installation to complete
5. **Refresh the browser** — a new **Zenarmor** menu item will appear in the sidebar
### Step 2 — Initial Setup Wizard
Navigate to **Zenarmor → Dashboard** — this launches the setup wizard on first run.
**Deployment Mode:** Select **Routed Mode (L3)** for standard OPNsense setups. This is correct for your configuration.
**Database:** Select **SQLite** — do not select MongoDB (deprecated September 2025).
**Interface:** Select **ATT (opt1)** as the primary interface. Add **WAN (igc0)** while dual-WAN is still active.
> ⚠ Zenarmor should be applied to the **LAN-facing side** of the firewall for internal traffic inspection, or the **WAN-facing side** for inbound threat blocking. For your setup, applying it to both ATT and LAN gives the most coverage.
**Cloud Connectivity:** Leave enabled — Zenarmor uses cloud-based category lookups for web filtering. If you want fully offline operation, this can be disabled but web filtering accuracy degrades significantly.
Click **Complete** to finish the wizard.
---
## Configuration
### Step 3 — Security Policy
Navigate to **Zenarmor → Security**
Enable the following threat categories in the default policy:
| Category | Action | Notes |
|---|---|---|
| Malware | Block | Domains known to serve malware |
| Phishing | Block | Credential harvesting sites |
| Botnet | Block | C2 communication |
| PUP/Adware | Block | Potentially unwanted programs |
| SPAM Sources | Block | Known spam infrastructure |
| Parked Domains | Block | Often used for malicious redirects |
Leave the following as **Alert** initially (review before blocking):
- Anonymizers / Proxies — may block legitimate VPN services
- Peer-to-peer — may affect legitimate use cases
### Step 4 — Application Control
Navigate to **Zenarmor → Policies → Application Control**
The free tier allows one default policy. Useful applications to consider blocking or monitoring:
| Application Category | Recommendation | Reason |
|---|---|---|
| Cryptocurrency mining | Block | Resource theft if unauthorized |
| Remote access tools (unknown) | Alert | Unexpected remote tools are a red flag |
| Tor | Alert | Monitor — may be legitimate or evasion |
| Anonymous proxies | Block | Bypass attempts |
### Step 5 — Web Filtering
Navigate to **Zenarmor → Policies → Web Controls**
In the free tier, the default policy controls all web filtering. Recommended categories to block:
| Category | Action |
|---|---|
| Malware sites | Block |
| Phishing | Block |
| Hacking / exploit sites | Block |
| Illegal content | Block |
Enable **Safe Search enforcement** if desired — forces Google, Bing, and YouTube into safe search mode network-wide.
---
## Dashboard & Analytics
Navigate to **Zenarmor → Dashboard**
The dashboard provides real-time visibility into:
- **Top talkers** — which internal hosts generate the most traffic
- **Top applications** — what services are being used
- **Blocked threats** — real-time feed of blocked requests
- **Bandwidth usage** — per-host and per-application
This is the primary value of the free tier — even without advanced policy control, the visibility into what is running on your network is significant.
Navigate to **Zenarmor → Reports** for historical analysis and trend data.
---
## Performance Notes
Zenarmor uses deep packet inspection which adds some CPU overhead. On modern hardware (anything with i226-V NICs) this is negligible at home lab traffic volumes. Monitor CPU usage in **Zenarmor → Dashboard → System** after enabling.
If performance degrades, you can limit Zenarmor to specific interfaces rather than all interfaces.
---
## Known Limitations (Free Tier)
- Only one web filtering policy — all devices get the same rules
- No per-device or per-group policies
- No TLS/SSL inspection — encrypted traffic is identified by SNI only
- No device inventory or unknown device alerts
- Web category database is cloud-dependent
---
## Related Documentation
- [OPNsense Firewall](./opnsense-firewall) — parent firewall documentation
- [Suricata IDS/IPS](./suricata-ids-ips) — complementary content inspection layer
- [CrowdSec](./crowdsec) — IP reputation layer

View file

@ -0,0 +1,31 @@
---
title: Alert Routing
description: How security alerts flow through Netgrimoire
published: true
date: 2026-04-12T00:00:00.000Z
tags: ward, alerts, ntfy
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Alert Routing
All Netgrimoire alerts route through self-hosted ntfy at `ntfy.netgrimoire.com`.
## ntfy Topics
| Topic | Source | Purpose |
|-------|--------|---------|
| `netgrimoire-diun` | DIUN | Docker image update notifications |
| `netgrimoire-media` | Sonarr, Radarr, SABnzbd | Download and media events |
| `netgrimoire-backup` | Kopia | Backup completion and errors |
| `gremlin-alerts` | n8n Kuma triage workflow | AI-analyzed service DOWN alerts |
| `gremlin-audits` | n8n Forgejo audit workflow | Weekly YAML audit summaries |
## Alert Sources
**OPNsense → ntfy:** CrowdSec HTTP plugin (`/usr/local/etc/crowdsec/notifications/ntfy.yaml`) + Monit script (`/usr/local/bin/ntfy-alert.sh`). See [OPNsense Alerts](/Ward-Grimoire/Notifications/OPNsense-Alerts).
**Uptime Kuma → Gremlin → ntfy:** Kuma webhook fires on DOWN/RECOVERED → n8n triage workflow → Ollama analysis (DOWN path only) → ntfy `gremlin-alerts`. See [Gremlin Kuma Triage](/Gremlin-Grimoire/Workflows/Kuma-Triage).
**DIUN → ntfy:** Docker image update watcher. Schedule: every 6 hours. Priority must be an integer (1-5), not the string `"default"`.
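A quick way to confirm a topic is reachable end to end is to publish a test message with curl and watch for it on a subscribed device (topic name taken from the table above; add an `Authorization` header if the server has auth enabled):
```bash
# Publish a test notification to the DIUN topic
curl -H "Title: Alert routing test" -H "Priority: 3" -d "test message" \
  https://ntfy.netgrimoire.com/netgrimoire-diun
```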

View file

@ -0,0 +1,463 @@
---
title: OpnSense - NTFY Integration
description: Security Notifications
published: true
date: 2026-02-23T22:00:46.462Z
tags:
editor: markdown
dateCreated: 2026-02-23T22:00:37.268Z
---
# OPNsense ntfy Alerts
**Service:** ntfy push notifications from OPNsense
**Host:** OPNsense firewall
**ntfy Server:** Your self-hosted ntfy instance on Netgrimoire
**Methods:** CrowdSec HTTP plugin · Monit custom script · Suricata EVE watcher
---
## Overview
OPNsense does not have a built-in ntfy notification channel, but there are three distinct integration points that together provide complete coverage:
| Method | What It Alerts On | Priority |
|---|---|---|
| **CrowdSec HTTP plugin** | Every IP ban decision CrowdSec makes | 🔴 Best for threat intel alerts |
| **Monit + curl script** | System health, service failures, Suricata EVE matches, login failures | 🔴 Best for operational alerts |
| **Suricata EVE watcher** | Suricata high-severity IDS hits (via Monit watching eve.json) | 🟡 Covered via Monit |
All three use your self-hosted ntfy instance. None require external services.
---
## Prerequisites
Before starting, confirm:
- ntfy is running and reachable at `https://ntfy.netgrimoire.com` (or your internal URL)
- ntfy topic created: e.g. `opnsense-alerts`
- If ntfy has auth enabled, have a token ready
- SSH access to OPNsense as root
---
## Method 1 — CrowdSec HTTP Notification Plugin
This is the cleanest integration for security alerts. CrowdSec has a built-in HTTP notification plugin. Every time it makes a ban decision — whether from community intel, a Suricata match passed through CrowdSec, or a brute-force detection — it POSTs to ntfy.
### Step 1 — Create the HTTP notification config
SSH into OPNsense and create the ntfy config file:
```bash
ssh root@192.168.3.4
```
```bash
cat > /usr/local/etc/crowdsec/notifications/ntfy.yaml << 'EOF'
# ntfy notification plugin for CrowdSec
# CrowdSec uses its built-in HTTP plugin pointed at ntfy
type: http
name: ntfy_default
log_level: info
# ntfy accepts plain POST body as the notification message
# format is a Go template — .[]Alert is the list of alerts
format: |
{{range .}}
🚨 CrowdSec Decision
Scenario: {{.Scenario}}
Attacker IP: {{.Source.IP}}
Country: {{.Source.Cn}}
Action: {{.Decisions | len}} x {{(index .Decisions 0).Type}}
Duration: {{(index .Decisions 0).Duration}}
{{end}}
url: https://ntfy.netgrimoire.com/opnsense-alerts
method: POST
headers:
Title: "CrowdSec Ban — OPNsense"
Priority: "high"
Tags: "rotating_light,shield"
# Uncomment and set token if ntfy auth is enabled:
# Authorization: "Bearer YOUR_NTFY_TOKEN"
# skip_tls_verify: false
EOF
```
> ⚠ Replace `https://ntfy.netgrimoire.com/opnsense-alerts` with your actual ntfy URL and topic. If ntfy is internal-only and OPNsense can reach it by hostname, the internal URL works fine.
### Step 2 — Register the plugin in profiles.yaml
Edit the CrowdSec profiles file to dispatch decisions to the ntfy plugin:
```bash
vi /usr/local/etc/crowdsec/profiles.yaml
```
Find the `notifications:` section of the default profile and add `ntfy_default`:
```yaml
name: default_ip_remediation
filters:
- Alert.Remediation == true && Alert.GetScope() == "Ip"
decisions:
- type: ban
duration: 4h
notifications:
- ntfy_default # ← add this line
on_success: break
```
> ✓ The `ntfy_default` name must match the `name:` field in the yaml file you created above exactly.
### Step 3 — Set correct file ownership
CrowdSec refuses to load a notification plugin whose configuration file is not owned by root (group `wheel` on FreeBSD/OPNsense). Set the ownership and tighten permissions:
```bash
chown root:wheel /usr/local/etc/crowdsec/notifications/ntfy.yaml
chmod 600 /usr/local/etc/crowdsec/notifications/ntfy.yaml
```
### Step 4 — Restart CrowdSec and test
```bash
# Restart via OPNsense service manager (do NOT use systemctl/service directly)
# Go to: Services → CrowdSec → Settings → Apply
# Or from shell:
pluginctl -s crowdsec restart
```
Test by sending a manual notification:
```bash
cscli notifications test ntfy_default
```
You should receive a test push on your device within a few seconds.
Then trigger a real decision to verify the full pipeline:
```bash
# Ban your own IP for 2 minutes as a test (replace with your IP)
cscli decisions add -t ban -d 2m -i 1.2.3.4
# Watch for ntfy notification
# Remove the test ban:
cscli decisions delete -i 1.2.3.4
```
---
## Method 2 — Monit + curl Script
Monit is OPNsense's built-in service monitor. It can watch processes, files, system resources, and log patterns — and call a custom shell script when a condition is met. The script fires a curl POST to ntfy.
This covers things CrowdSec doesn't — service failures, high CPU, gateway down events, SSH login failures, disk usage, and Suricata EVE alerts.
### Step 2.1 — Create the ntfy alert script
```bash
cat > /usr/local/bin/ntfy-alert.sh << 'EOF'
#!/usr/local/bin/bash
# ntfy-alert.sh — called by Monit to send ntfy push notifications
# Monit provides variables: $MONIT_HOST, $MONIT_SERVICE,
# $MONIT_DESCRIPTION, $MONIT_EVENT
NTFY_URL="https://ntfy.netgrimoire.com/opnsense-alerts"
# NTFY_TOKEN="Bearer YOUR_NTFY_TOKEN" # uncomment if ntfy auth enabled
TITLE="${MONIT_HOST}: ${MONIT_SERVICE}"
MESSAGE="${MONIT_EVENT} — ${MONIT_DESCRIPTION}"
# Map Monit event types to ntfy priorities
case "$MONIT_EVENT" in
*"does not exist"*|*"failed"*|*"error"*)
PRIORITY="urgent"
TAGS="rotating_light,red_circle"
;;
*"changed"*|*"match"*)
PRIORITY="high"
TAGS="warning,yellow_circle"
;;
*"recovered"*|*"succeeded"*)
PRIORITY="default"
TAGS="white_check_mark,green_circle"
;;
*)
PRIORITY="default"
TAGS="bell"
;;
esac
curl -s \
-H "Title: ${TITLE}" \
-H "Priority: ${PRIORITY}" \
-H "Tags: ${TAGS}" \
-d "${MESSAGE}" \
"${NTFY_URL}"
# Uncomment for auth:
# curl -s \
# -H "Authorization: ${NTFY_TOKEN}" \
# -H "Title: ${TITLE}" \
# -H "Priority: ${PRIORITY}" \
# -H "Tags: ${TAGS}" \
# -d "${MESSAGE}" \
# "${NTFY_URL}"
EOF
chmod +x /usr/local/bin/ntfy-alert.sh
```
### Step 2.2 — Enable Monit
Navigate to **Services → Monit → Settings → General Settings**
| Setting | Value |
|---|---|
| Enabled | ✓ |
| Polling Interval | 30 seconds |
| Start Delay | 120 seconds |
| Mail Server | Leave blank (using script instead) |
Click **Save**.
### Step 2.3 — Add Service Tests
Navigate to **Services → Monit → Service Tests Settings** and add the following tests:
**Test 1 — Custom Alert via Script**
| Field | Value |
|---|---|
| Name | `ntfy_alert` |
| Condition | `failed` |
| Action | Execute |
| Path | `/usr/local/bin/ntfy-alert.sh` |
This is the reusable action that all other tests will invoke.
**Test 2 — Suricata EVE High Alert**
| Field | Value |
|---|---|
| Name | `SuricataHighAlert` |
| Condition | `content = "\"severity\":1"` |
| Action | Execute → `/usr/local/bin/ntfy-alert.sh` |
This watches for severity 1 (highest) alerts written to the Suricata EVE JSON log.
**Test 3 — Suricata Process Down**
| Field | Value |
|---|---|
| Name | `SuricataRunning` |
| Condition | `failed` |
| Action | Execute → `/usr/local/bin/ntfy-alert.sh` |
**Test 4 — CrowdSec Process Down**
| Field | Value |
|---|---|
| Name | `CrowdSecRunning` |
| Condition | `failed` |
| Action | Execute → `/usr/local/bin/ntfy-alert.sh` |
**Test 5 — SSH Login Failure**
| Field | Value |
|---|---|
| Name | `SSHFailedLogin` |
| Condition | `content = "Failed password"` |
| Action | Execute → `/usr/local/bin/ntfy-alert.sh` |
**Test 6 — OPNsense Web UI Login Failure**
| Field | Value |
|---|---|
| Name | `WebUILoginFail` |
| Condition | `content = "webgui"` |
| Action | Execute → `/usr/local/bin/ntfy-alert.sh` |
### Step 2.4 — Add Service Monitors
Navigate to **Services → Monit → Service Settings** and add:
**Monitor 1 — Suricata EVE Log (high alerts)**
| Field | Value |
|---|---|
| Name | `SuricataEVE` |
| Type | File |
| Path | `/var/log/suricata/eve.json` |
| Tests | `SuricataHighAlert` |
**Monitor 2 — Suricata Process**
| Field | Value |
|---|---|
| Name | `Suricata` |
| Type | Process |
| PID File | `/var/run/suricata.pid` |
| Tests | `SuricataRunning` |
| Restart Method | /usr/local/etc/rc.d/suricata restart |
**Monitor 3 — CrowdSec Process**
| Field | Value |
|---|---|
| Name | `CrowdSec` |
| Type | Process |
| Match | `crowdsec` |
| Tests | `CrowdSecRunning` |
**Monitor 4 — SSH Auth Log**
| Field | Value |
|---|---|
| Name | `SSHAuth` |
| Type | File |
| Path | `/var/log/auth.log` |
| Tests | `SSHFailedLogin` |
**Monitor 5 — System Resources (optional)**
| Field | Value |
|---|---|
| Name | `System` |
| Type | System |
| Tests | `ntfy_alert` (on resource threshold exceeded) |
Click **Apply** after adding all services.
### Step 2.5 — Test Monit alerts
```bash
# Manually invoke the script to test ntfy connectivity
MONIT_HOST="OPNsense" \
MONIT_SERVICE="Test" \
MONIT_EVENT="Test alert" \
MONIT_DESCRIPTION="Testing ntfy integration from Monit" \
/usr/local/bin/ntfy-alert.sh
```
You should receive a push notification immediately.
---
## Alert Topics & Priority Mapping
Consider using separate ntfy topics to filter notifications by type on your device:
| Topic | Used For | Suggested ntfy Priority |
|---|---|---|
| `opnsense-alerts` | CrowdSec bans, Suricata high hits | high / urgent |
| `opnsense-health` | Monit service failures, process restarts | high |
| `opnsense-info` | Service recoveries, status changes | default / low |
To use separate topics, change the `NTFY_URL` in the Monit script and the `url:` in the CrowdSec config accordingly.
---
## ntfy Priority Reference
ntfy supports five priority levels that map to different notification behaviors on Android/iOS:
| ntfy Priority | Numeric | Behavior |
|---|---|---|
| `min` | 1 | No notification, no sound |
| `low` | 2 | Notification, no sound |
| `default` | 3 | Notification with sound |
| `high` | 4 | Notification with sound, bypasses DND |
| `urgent` | 5 | Phone rings through DND, repeated |
For firewall alerts: use `urgent` for process failures and `high` for IDS/ban events. Reserve `urgent` sparingly to avoid alert fatigue.
---
## Keeping Config Persistent Across Upgrades
OPNsense upgrades can overwrite files in certain paths. The safest locations for persistent custom files:
| File | Location | Persistent? |
|---|---|---|
| ntfy-alert.sh | `/usr/local/bin/ntfy-alert.sh` | ✓ Yes — not touched by upgrades |
| CrowdSec ntfy.yaml | `/usr/local/etc/crowdsec/notifications/ntfy.yaml` | ✓ Yes — plugin config directory |
| CrowdSec profiles.yaml | `/usr/local/etc/crowdsec/profiles.yaml` | ⚠ Re-check after CrowdSec updates |
After any OPNsense or CrowdSec update, verify:
```bash
# Check CrowdSec notification config is still present
ls -la /usr/local/etc/crowdsec/notifications/
# Test CrowdSec ntfy still works
cscli notifications test ntfy_default
# Check Monit script is still executable
ls -la /usr/local/bin/ntfy-alert.sh
```
---
## Troubleshooting
**No notification received from CrowdSec test:**
```bash
# Check CrowdSec logs for plugin errors
tail -50 /var/log/crowdsec.log | grep -i ntfy
tail -50 /var/log/crowdsec.log | grep -i notification
# Verify ntfy URL is reachable from OPNsense
curl -v -d "test" https://ntfy.netgrimoire.com/opnsense-alerts
# Check profiles.yaml has ntfy_default in notifications section
grep -A5 "notifications:" /usr/local/etc/crowdsec/profiles.yaml
```
**No notification received from Monit:**
```bash
# Run the script manually with test variables
MONIT_HOST="test" MONIT_SERVICE="test" \
MONIT_EVENT="test" MONIT_DESCRIPTION="test message" \
/usr/local/bin/ntfy-alert.sh
# Check Monit is running
ps aux | grep monit
# Check Monit logs
tail -50 /var/log/monit.log
```
**CrowdSec plugin ownership error:**
```bash
# Fix ownership if CrowdSec refuses to load the plugin
chown root:wheel /usr/local/etc/crowdsec/notifications/ntfy.yaml
ls -la /usr/local/etc/crowdsec/notifications/
```
**ntfy auth failing:**
```bash
# Test with token manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
-H "Title: Test" \
-d "Auth test" \
https://ntfy.netgrimoire.com/opnsense-alerts
```
---
## Related Documentation
- [OPNsense Firewall](./opnsense-firewall) — parent firewall documentation
- [CrowdSec](./crowdsec) — threat intelligence engine sending these alerts
- [Suricata IDS/IPS](./suricata-ids-ips) — source of EVE alerts watched by Monit
- [ntfy](./ntfy) — self-hosted notification server on Netgrimoire

View file

@ -0,0 +1,122 @@
# ntfy
## Overview
The ntfy stack is a Docker Swarm-based service that provides push notifications in NetGrimoire. It consists of the ntfy service itself, which runs the ntfy binary, fronted by the Caddy reverse proxy and monitored via Uptime Kuma.
---
## Architecture
| Service | Image | Port | Role |
|---------|-------|------|------|
| ntfy | binwiederhier/ntfy | 81:80 | Push notifications |
| Caddy (reverse proxy) | ntfy.netgrimoire.com (internal only) | N/A | Reverse proxy |

**Homepage group:** Services
---
## Build & Configuration
### Prerequisites
No specific prerequisites are required for this stack.
### Volume Setup
```bash
mkdir -p /DockerVol/ntfy/cache
mkdir -p /DockerVol/ntfy/etc
chown -R ntfy:ntfy /DockerVol/ntfy
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
```
### Deploy
```bash
cd services/swarm/stack/ntfy
set -a && source .env && set +a
docker stack config --compose-file ntfy.yaml > resolved.yml
docker stack deploy --compose-file resolved.yml ntfy
rm resolved.yml
docker stack services ntfy
```
### First Run
No specific steps are required for the first run.
---
## User Guide
### Accessing ntfy
| Service | URL | Purpose |
|---------|-----|---------|
| ntfy | https://ntfy.netgrimoire.com (internal only) | Push notifications |
### Primary Use Cases
The primary use case is to receive push notifications in NetGrimoire.
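As a minimal sketch of that use case, any service or script can publish by POSTing to a topic URL (the topic name below is an example, not one of the configured topics):
```bash
# Publish a test notification to a topic on the self-hosted server
curl -H "Title: Hello from NetGrimoire" -d "ntfy stack is working" \
  https://ntfy.netgrimoire.com/test-topic
```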
### NetGrimoire Integrations
The ntfy service connects to other services through environment variables and labels.
---
## Operations
### Monitoring
Kuma monitor labels: `kuma.ntfy.http.name: ntfy`, `kuma.ntfy.http.url: https://ntfy.netgrimoire.com`
```bash
docker stack services ntfy
docker service logs -f ntfy | grep "NTFY"
```
### Backups
Critical data is stored in /DockerVol/ntfy/cache.
### Restore
```bash
cd services/swarm/stack/ntfy
./deploy.sh
```
---
## Common Failures
1. **Symptom:** Push notifications are not received.
**Cause:** Missing Caddy configuration or environment variables.
**Fix:** Check Caddy labels and environment variables for correctness.
2. **Symptom:** ntfy service is down.
**Cause:** Insufficient restart policy.
**Fix:** Adjust the restart policy in the deploy section.
3. **Symptom:** Docker stack services are not running.
**Cause:** Missing docker-compose-file.
**Fix:** Check if ntfy-stack.yml exists.
4. **Symptom:** Logs do not show any errors.
**Cause:** Insufficient logging configuration.
**Fix:** Adjust log levels or increase verbosity in logs.
5. **Symptom:** Environment variables are incorrect.
**Cause:** Incorrect source of environment variables.
**Fix:** Verify that .env file is correctly sourced.
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | 5058dbe5 | Initial documentation for ntfy stack. |
| 2026-04-07 | 247956f0 | Fixed minor issues in deploy and user guide sections. |
| 2026-02-01 | 85da4a27 | Changed volume paths to match /DockerVol/. |
| 2026-02-01 | 9da20931 | Adjusted logging configuration for ntfy service. |
| 2026-01-10 | 1a374911 | Added initial documentation. |
---
## Notes
- Generated by Gremlin on 2026-04-07T19:16:54.993Z
- Source: swarm/ntfy.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,54 @@
---
title: Ward Grimoire
description: Security — the gargoyle sentinel watches the gates
published: true
date: 2026-04-12T00:00:00.000Z
tags: ward, security
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Ward Grimoire
![ward-badge](/images/ward-badge.png)
The Ward Grimoire covers all security enforcement, access control, and threat response for Netgrimoire. The gargoyle sees everything that tries to come through.
---
## Sections
| Section | Contents |
|---------|----------|
| [Firewall](/Ward-Grimoire/Firewall/OPNsense) | OPNsense dual-WAN, NAT, static IPs, Suricata IDS, Zenarmor, blocklists, GeoIP |
| [Access](/Ward-Grimoire/Access/Auth-Overview) | Authentik (SSO), Authelia (wasted-bandwidth), LLDAP, Vaultwarden, YubiKey, WireGuard |
| [Notifications](/Ward-Grimoire/Notifications/Alert-Routing) | ntfy, CrowdSec alerts, OPNsense Monit, alert routing |
---
## Security Stack Status
| Component | Status | Notes |
|-----------|--------|-------|
| OPNsense firewall | ✅ Active | Dual-WAN, ATT primary |
| CrowdSec (OPNsense bouncer) | ✅ Active | Perimeter blocking |
| CrowdSec (Caddy bouncer) | 🔧 In progress | Gradual per-service rollout |
| Authentik | ✅ Active | SSO for `*.netgrimoire.com` |
| Authelia | ✅ Active | SSO for `*.wasted-bandwidth.net` |
| LLDAP | ✅ Active | LDAP directory backend |
| Vaultwarden | ✅ Active | `pass.netgrimoire.com` |
| WireGuard | ✅ Active | 5 peers, 192.168.32.0/24 |
| Suricata IDS/IPS | 📋 Pending | OPNsense plugin, config not started |
| Zenarmor | 📋 Pending | Free tier, not installed |
| dnscrypt-proxy | 📋 Pending | Encrypted upstream DNS |
| os-git-backup | 📋 Pending | OPNsense config → Forgejo |
| Spamhaus + GeoIP rules | 🔧 Broken | Currently disabled — needs fixing |
| YubiKey PIV (SSH) | 📋 Planned | High-impact, not started |
---
## Key Principles
- **Fail open** — CrowdSec Caddy bouncer is configured to fail open. If CrowdSec is unreachable, Caddy continues serving. Sites stay up, enforcement suspends temporarily. Do not change to `enable_hard_fails true` in a homelab.
- **Layered defense** — OPNsense blocks at the perimeter, CrowdSec blocks at the HTTP layer, Authentik/Authelia control application access.
- **Never disable Spamhaus permanently** — the GeoIP and Spamhaus rules were disabled during troubleshooting and need to be re-enabled and tested.

View file

@ -0,0 +1,90 @@
---
title: Homepage Dashboard
description: Homepage configuration — tabs, groups, widgets, API keys
published: true
date: 2026-04-12T00:00:00.000Z
tags: watch, homepage, dashboard
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Homepage Dashboard
Homepage runs at `homepage.netgrimoire.com`, port 3056:3000. Config lives at `/DockerVol/homepage/config/`. Images at `/DockerVol/homepage/images/` (mounted as `/app/public/images:ro`).
---
## Tab Structure
| Tab | Grimoire | Groups |
|-----|----------|--------|
| Glance | — | Glance iframe (full-screen) |
| Netgrimoire | Netgrimoire | Applications, Gremlin, Monitoring, Management, Backup, Mail Services, Remote Access, Services |
| Wasted-Bandwidth | Shadow Grimoire | Jolly Roger, Downloaders, VPN Protected Apps, Media Management, Media Search |
| Nucking-Futz | Green Grimoire | Nucking Apps, Entertainment |
| PNCHarris | PNC Harris | PNCHarris Apps |
---
## Branding
All badge images live at `/DockerVol/homepage/images/` and are served at `/images/<filename>`.
| File | Used For |
|------|----------|
| `netgrimoire-badge.png` | Netgrimoire logo widget |
| `gremlin-badge.png` | Gremlin service card |
| `keystone-badge.png` | Keystone Grimoire |
| `vault-badge.png` | Vault Grimoire |
| `ward-badge.png` | Ward Grimoire |
| `watch-badge.png` | Watch Grimoire |
| `shadow-badge.png` | Shadow Grimoire |
| `green-badge.png` | Green Grimoire |
| `pocket-badge.png` | Pocket Grimoire |
| `pncharris-badge.png` | PNC Harris |
| `pncfish-badge.png` | PNC Fish |
After adding images, restart Homepage — Next.js does not pick up new files without restart.
---
## API Keys (Environment Variables)
| Variable | Source | How to Generate |
|----------|--------|----------------|
| `HOMEPAGE_VAR_MAILCOW_KEY` | MailCow | Admin UI → API |
| `HOMEPAGE_VAR_DNS_TOKEN` | Technitium | Administration → API Tokens |
| `HOMEPAGE_VAR_OPNSENSE_USER` | OPNsense | System → Access → Users → API Keys |
| `HOMEPAGE_VAR_OPNSENSE_PASS` | OPNsense | Same as above (one-time download) |
| `HOMEPAGE_VAR_IMMICH_KEY` | Immich | User Settings → API Keys |
API keys go in `environment:` block directly — not `env_file:`. Swarm `env_file` is only read at deploy time, not by the running container.
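A minimal sketch of what that looks like in the stack file (the image tag is illustrative; the variable names come from the table above and are assumed to exist in the sourced `.env`):
```yaml
services:
  homepage:
    image: ghcr.io/gethomepage/homepage:latest
    environment:
      # Expanded at deploy time from the sourced .env and baked into the service spec
      HOMEPAGE_VAR_MAILCOW_KEY: ${HOMEPAGE_VAR_MAILCOW_KEY}
      HOMEPAGE_VAR_OPNSENSE_USER: ${HOMEPAGE_VAR_OPNSENSE_USER}
      HOMEPAGE_VAR_OPNSENSE_PASS: ${HOMEPAGE_VAR_OPNSENSE_PASS}
```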
---
## settings.yaml Rule
Every `homepage.group=Something` Docker label **must** have a matching entry in `settings.yaml` with `style: column`. Groups not listed default to full-width and break the layout.
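A sketch of the corresponding `settings.yaml` entries (group names taken from the tab table above; the rest of the file is omitted):
```yaml
layout:
  Monitoring:
    style: column
  Management:
    style: column
  Applications:
    style: column
```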
---
## Service Widget Notes
| Service | Widget Type | Notes |
|---------|-------------|-------|
| MailCow | `customapi``/api/v1/get/domain/all` | Native mailcow widget broken in 2025+ (endpoint removed) |
| OPNsense | `opnsense``https://192.168.3.4:8443` | Requires dedicated homepage API user with Audit group |
| Technitium | `customapi``:5380/api/dashboard/stats/get` | Returns queries, blocked, successful counts |
| Immich | `immich` | Key via `HOMEPAGE_VAR_IMMICH_KEY` |
---
## Troubleshooting
| Problem | Cause | Fix |
|---------|-------|-----|
| Card stretches full width | Group not in settings.yaml | Add with `style: column` |
| Background image not showing | Missing transparent CSS fix | Add `html, body, body > div { background-color: transparent !important }` |
| Logo not showing | Image not in `/app/public/images` | Copy to `/DockerVol/homepage/images/` and restart |
| New image not loading | Next.js static cache | Restart Homepage container |
| Widget API error | Wrong URL or missing key | Check env vars, use internal container URLs |

View file

@ -0,0 +1,118 @@
---
title: dozzle Stack
description: Docker log viewer for NetGrimoire
published: true
date: 2026-04-05T05:10:20.507Z
tags: docker,swarm,dozzle,netgrimoire
editor: markdown
dateCreated: 2026-04-05T05:10:20.507Z
---
# dozzle
## Overview
The dozzle stack provides a Docker log viewer for NetGrimoire, allowing users to view and manage container logs in one place.
## Architecture
- **Host:** docker4
- **Network:** netgrimoire
- **Exposed via:** caddy.netgrimoire.com
- **Homepage group:** Management
---
## Build & Configuration
### Prerequisites
Ensure Docker is installed and configured on the host machine.
### Volume Setup
```bash
mkdir -p /DockerVol/dozzle
chown dozer:dozer /DockerVol/dozzle
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
DOZZLE_MODE=swarm
```
### Deploy
```bash
cd services/swarm/stack/dozzle
set -a && source .env && set +a
docker stack config --compose-file dozzle-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml dozzle
rm resolved.yml
docker stack services dozzle
```
### First Run
Run the following command to initialize the stack:
```bash
./deploy.sh
```
---
## User Guide
### Accessing dozzle
| Service | URL | Purpose |
|---------|-----|---------|
| Dozzle | https://dozzle.netgrimoire.com | Docker log viewer |
### Primary Use Cases
To view logs for a specific container, use the following command:
```bash
docker logs <container_id> --tail 100
```
### NetGrimoire Integrations
This stack integrates with Uptime Kuma and Caddy to provide monitoring and reverse proxy capabilities.
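A minimal sketch of the caddy-docker-proxy deploy labels that expose the service (the upstream port assumes Dozzle's default of 8080; the authoritative labels live in the stack file):
```yaml
deploy:
  labels:
    caddy: dozzle.netgrimoire.com
    caddy.reverse_proxy: "{{upstreams 8080}}"
```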
---
## Operations
### Monitoring
Monitor service using kuma:
```bash
docker stack services dozzle
docker service logs -f dozzle
```
### Backups
Critical data is stored on the Docker volume at /DockerVol/dozzle.
### Restore
Restore the stack by running the following command:
```bash
./deploy.sh
```
---
## Common Failures
| Failure Mode | Symptom | Cause | Fix |
|--------------|---------|-------|-----|
| Container log not available | Logs are empty or missing. | Incorrect container ID or permissions issue. | Verify the container ID and ensure the necessary permissions. |
| Caddy not started | Caddy is not responding to requests. | Caddy service is not running. | Run `docker stack services dozzle` and verify that Caddy is running. |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-05 | d9099f8f | Initial documentation creation. |
| 2026-04-05 | 91e25326 | Added volume setup and environment variable generation commands. |
| 2026-01-20 | 061ab0c2 | Initial commit for dozzle stack configuration. |
Note: This is the initial documentation for the dozzle stack; no further changes have been made at this time.
---
## Notes
- Generated by Gremlin on 2026-04-05T05:10:20.507Z
- Source: swarm/dozzle.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,129 @@
# diun
## Overview
The diun stack is a Docker Swarm configuration that runs the crazymax/diun:latest image to watch container images and send update notifications for NetGrimoire. The stack consists of one service: diun.
---
## Architecture
| Service | Image | Port | Role |
|---------|-------|------|------|
| diun | crazymax/diun:latest | n/a | Image update watcher and notifier |

**Exposed via:** Caddy (`DiunNotify.com`)
**Homepage group:** none
---
## Build & Configuration
### Prerequisites
To deploy diun, ensure you have the following prerequisites:
- Docker Swarm manager and worker setup
- Uptime Kuma monitoring installed
- Caddy reverse proxy configured with caddy-docker-proxy labels
- Docker Swarm stack configuration file (diun-stack.yml)
### Volume Setup
```bash
mkdir -p /DockerVol/diun
chown -R 1964:1964 /DockerVol/diun
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
DIUN_WATCH_WORKERS=20
DIUN_WATCH_SCHEDULE=0 */6 * * *
DIUN_PROVIDERS_DOCKER=true
DIUN_PROVIDERS_DOCKER_WATCHBYDEFAULT=true
DIUN_NOTIF_NTFY_ENDPOINT=https://ntfy.netgrimoire.com
DIUN_NOTIF_NTFY_TOPIC=netgrimoire-diun
DIUN_NOTIF_NTFY_PRIORITY=3
TZ=America/Chicago
```
### Deploy
```bash
cd services/swarm/stack/diun
set -a && source .env && set +a
docker stack config --compose-file diun-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml diun
rm resolved.yml
docker stack services diun
```
### First Run
The first run will create the necessary configuration for diun. Please wait until the service is ready.
- Wait 5 seconds and then verify diun is running with `docker stack services diun`
- Verify Caddy is configured to serve DiunNotify.com
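A quick log check after the first deploy confirms diun picked up the Docker provider and the ntfy notifier (the service name `diun_diun` assumes the stack name `diun` used in the deploy step):
```bash
# Look for provider, notifier, and schedule registration in the diun logs
docker service logs diun_diun 2>&1 | grep -Ei "provider|notif|cron"
```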
---
## User Guide
### Accessing diun
| Service | URL | Purpose |
|---------|-----|---------|
| diun | `<CADDY_DOMAIN>` | Image update notifications |
### Primary Use Cases
diun watches all running container images on the configured 6-hour schedule and sends a notification whenever an updated image is published; service availability itself is monitored with Uptime Kuma.
### NetGrimoire Integrations
diun posts its notifications to the self-hosted ntfy instance (`ntfy.netgrimoire.com`, topic `netgrimoire-diun`) and is watched by Uptime Kuma for availability.
---
## Operations
### Monitoring
Uptime Kuma monitors are generated from the stack's `kuma.*` labels.
```bash
docker stack services diun
docker service logs diun -f
```
### Backups
Critical data is stored on /DockerVol/diun.
### Restore
```bash
cd services/swarm/stack/diun
./deploy.sh
```
---
## Common Failures
* Symptoms: Diun does not deploy.
* Cause: Docker Swarm manager and worker not configured correctly or failed to deploy diun.
* Fix: Review the Docker Swarm configuration file (diun-stack.yml) and ensure all required settings are correct.
* Symptoms: Caddy fails to connect to DiunNotify.com.
* Cause: Caddy docker-proxy labels do not contain the required caddy domain for DiunNotify.com.
* Fix: Update Caddy docker-proxy labels with the correct CADDY_DOMAIN environment variable value.
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | 247956f0 | Updated Docker Swarm stack configuration for diun. Fixed incorrect service port and updated environment variables. |
| 2026-04-07 | 27c8306d | Updated Caddy docker-proxy labels to use correct DiunNotify.com domain. |
| 2026-04-07 | 4376b722 | Added initial deploy script for diun stack. |
| 2026-02-01 | c4605c36 | Set default environment variables for diun. |
| 2026-01-10 | 1a374911 | Updated Docker Swarm configuration to use correct volumes and environment variables. |
The diun stack was created in response to the migration of Docker Swarm configuration files. The stack now uses a standardized configuration file (diun-stack.yml) and includes environment variables for DiunNotify.com monitoring.
---
## Notes
- Generated by Gremlin on 2026-04-07T19:09:55.694Z
- Source: swarm/diun.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,143 @@
---
title: monitoring Stack
description: NetGrimoire Monitoring Stack Documentation
published: true
date: 2026-04-12T01:10:17.109Z
tags: docker,swarm,monitoring,netgrimoire
editor: markdown
dateCreated: 2026-04-12T01:10:17.109Z
---
# monitoring
## Overview
This stack provides a comprehensive monitoring solution for NetGrimoire. It consists of Prometheus, Grafana, Alertmanager, Blackbox Exporter, and Cadvisor services, which collect metrics, store them in databases, alert on anomalies, perform HTTP/TCP/ICMP probing, and provide host metrics, respectively.
---
## Architecture
| Service | Image | Port | Role |
|---------|-------|-----|------|
| Prometheus | prom/prometheus:latest | 9090 | Metrics collection |
| Grafana | grafana/grafana:latest | 3000 | Dashboards |
| Alertmanager | prom/alertmanager:latest | 9093 | Alert routing |
| Blackbox Exporter | prom/blackbox-exporter:latest | 9115 | HTTP/TCP/ICMP probing |
| cAdvisor | gcr.io/cadvisor/cadvisor:latest | global | Multi-arch host metrics |
Exposed via: `caddy.netgrimoire.com`, Internal only
Homepage group: Monitoring
---
## Build & Configuration
### Prerequisites
Ensure you have Docker Swarm installed and configured on the manager node (`znas`).
### Volume Setup
```bash
mkdir -p /DockerVol/prometheus/data
mkdir -p /DockerVol/grafana/data
mkdir -p /DockerVol/alertmanager/data
mkdir -p /DockerVol/blackbox/config
chown -R 1964:1964 /DockerVol/prometheus/data
chown -R 1964:1964 /DockerVol/grafana/data
chown -R 1964:1964 /DockerVol/alertmanager/data
chown -R 1964:1964 /DockerVol/blackbox/config
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=F@lcon13
GF_SECURITY_ADMIN_USER=admin
GF_USERS_DEFAULT_THEME=dark
GF_SERVER_ROOT_URL=https://grafana.netgrimoire.com
GF_FEATURE_TOGGLES_ENABLE=publicDashboards
```
### Deploy
```bash
cd services/swarm/stack/monitoring
set -a && source .env && set +a
docker stack config --compose-file monitoring-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml monitoring
rm resolved.yml
docker stack services monitoring
```
### First Run
The containers start Prometheus, Grafana, and Alertmanager themselves (Prometheus runs with `--web.enable-lifecycle` so its configuration can be reloaded without a restart), so no manual startup is needed on the host. After deploying, confirm each service answers on its health endpoint:
```bash
# Health checks after deploy (ports assume the published values from the Architecture table)
curl -s http://localhost:9090/-/healthy     # Prometheus
curl -s http://localhost:9093/-/healthy     # Alertmanager
curl -s http://localhost:3000/api/health    # Grafana
```
---
## User Guide
### Accessing monitoring
| Service | URL | Purpose |
|---------|-----|---------|
| Prometheus | http://prometheus.netgrimoire.com:9090 | Metrics and scrape targets |
| Grafana | https://grafana.netgrimoire.com:3000 | Dashboards |
| Alertmanager | https://alertmanager.netgrimoire.com:9093 | Alert routing and silences |
### Primary Use Cases
Collect metrics from NetGrimoire services with Prometheus, visualize them in Grafana, route alerts through Alertmanager, and use Blackbox Exporter for external HTTP/TCP/ICMP checks.
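As a sketch of how external probing is typically wired in, a Blackbox probe job in `prometheus.yml` might look like the following (the `blackbox-exporter` service name, the `http_2xx` module, and the probed URL are illustrative assumptions, not taken from the actual config):
```yaml
scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]                        # module defined in the Blackbox Exporter config
    static_configs:
      - targets:
          - https://grafana.netgrimoire.com     # endpoint to probe (example)
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target            # pass the target as ?target=
      - source_labels: [__param_target]
        target_label: instance                  # keep the probed URL as the instance label
      - target_label: __address__
        replacement: blackbox-exporter:9115     # scrape the exporter itself, not the target
```
Because Prometheus runs with `--web.enable-lifecycle` (see the changelog), an edited `prometheus.yml` can be reloaded live with `curl -X POST http://prometheus.netgrimoire.com:9090/-/reload` instead of restarting the service.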
### NetGrimoire Integrations
Integrate this monitoring stack with other NetGrimoire components through environment variables; for example, `GF_SERVER_ROOT_URL` makes Grafana generate links and redirects using its public `grafana.netgrimoire.com` address behind the reverse proxy.
---
## Operations
### Monitoring
```bash
docker stack services monitoring
# Monitor Prometheus for errors and performance issues
```
### Backups
Critical: back up the Prometheus, Grafana, and Alertmanager data directories (`/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, `/DockerVol/alertmanager/data`). Reconstructable: the Blackbox Exporter config in `/DockerVol/blackbox/config` can be restored, and cAdvisor keeps no state.
### Restore
```bash
cd services/swarm/stack/monitoring
./deploy.sh
```
---
## Common Failures
| Failure | Symptoms | Cause | Fix |
|--------|----------|-------|------|
| Prometheus not collecting metrics | Prometheus UI shows target or storage errors | Insufficient disk space or wrong permissions on `/DockerVol/prometheus/data` | Free or expand disk space and fix ownership (`chown -R 1964:1964`) |
| Grafana not displaying dashboards | Dashboards are missing or fail to load in the Grafana UI | Grafana cannot reach its Prometheus data source, or `GF_SERVER_ROOT_URL` is misconfigured | Verify the Prometheus data source URL in Grafana and check `GF_SERVER_ROOT_URL` |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-11 | ce875510 | Initial documentation for the monitoring stack in NetGrimoire. |
| 2026-04-11 | 3456a528 | Updated Prometheus configuration to use `--web.enable-lifecycle`. |
| 2026-04-09 | 8ca119ab | Added support for Cadvisor services. |
| 2026-04-07 | 9f9ca1ad | Enhanced Alertmanager configuration with additional error logging options. |
| 2026-04-07 | 71e3177f | Updated Grafana to version 10.0.1 for improved performance and stability. |
Since the initial deployment, the stack has gained cAdvisor support for per-node container metrics, additional Alertmanager error-logging options, a Grafana upgrade to 10.0.1 for performance and stability, and a Prometheus configuration that enables `--web.enable-lifecycle` for live config reloads. The documentation itself was first written on 2026-04-11.
---
## Notes
- Generated by Gremlin on 2026-04-12T01:10:17.109Z
- Source: swarm/monitoring.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,216 @@
---
title: Monitors and Alerts
description: DIUN/NTFY on Netgrimoire
published: true
date: 2026-04-10T19:35:18.743Z
tags:
editor: markdown
dateCreated: 2026-04-10T19:35:18.743Z
---
# Notifications — Netgrimoire
## Overview
All Netgrimoire notifications route through a self-hosted ntfy instance at `https://ntfy.netgrimoire.com`. Topics are organized by service category.
## ntfy Topic Structure
| Topic | Services | Purpose |
|-------|----------|---------|
| `netgrimoire-diun` | DIUN | Docker image update notifications |
| `netgrimoire-media` | Sonarr, Radarr, SABnzbd | Download and media management events |
| `netgrimoire-backup` | Kopia | Backup completion and errors |
| `netgrimoire-alerts` | Prometheus/Alertmanager | Infrastructure alerts (future) |
Subscribe to topics at `https://ntfy.netgrimoire.com/<topic>` or via the ntfy mobile app.
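Any service or shell script can also publish to a topic with a plain HTTP POST; the `Title`, `Priority`, and `Tags` headers used throughout this page are standard ntfy headers. A quick test (the topic is one of the real topics above, the message text is arbitrary):
```bash
curl -s \
  -H "Title: ntfy test" \
  -H "Priority: 3" \
  -H "Tags: test_tube" \
  -d "If you can read this in the app, the topic works." \
  https://ntfy.netgrimoire.com/netgrimoire-media
```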
---
## DIUN — Image Update Notifications
DIUN watches all Docker services for image updates and posts to `netgrimoire-diun`.
**Configuration** (`swarm/diun.yaml`):
```yaml
environment:
DIUN_NOTIF_NTFY_ENDPOINT: https://ntfy.netgrimoire.com
DIUN_NOTIF_NTFY_TOPIC: netgrimoire-diun
DIUN_NOTIF_NTFY_PRIORITY: "3"
```
**Notes:**
- `PRIORITY` must be an integer (1–5), not the string `"default"` — passing the string causes a startup crash
- DIUN has no UI — no Caddy, Homepage, or Kuma labels needed
- Runs on manager node only (needs full Swarm API access)
- Watch schedule: every 6 hours (`0 */6 * * *`)
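The 6-hour schedule is set through DIUN's own environment variable; a minimal sketch of the relevant setting, assuming it lives alongside the ntfy variables shown above:
```yaml
environment:
  DIUN_WATCH_SCHEDULE: "0 */6 * * *"   # check for image updates every 6 hours
```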
---
## Sonarr — TV Download Notifications
Sonarr sends notifications via webhook to `netgrimoire-media`.
**Setup** (done via UI — not compose):
1. Settings → Connect → + → **Webhook**
2. Name: `ntfy`
3. URL: `https://ntfy.netgrimoire.com/netgrimoire-media`
4. Method: `POST`
5. Triggers: On Grab, On Download, On Upgrade, On Health Issue
6. Test → Save
---
## Radarr — Movie Download Notifications
Identical setup to Sonarr.
**Setup** (done via UI):
1. Settings → Connect → + → **Webhook**
2. Name: `ntfy`
3. URL: `https://ntfy.netgrimoire.com/netgrimoire-media`
4. Method: `POST`
5. Triggers: On Grab, On Download, On Upgrade, On Health Issue
6. Test → Save
---
## SABnzbd — Usenet Download Notifications
SABnzbd does not have native ntfy support. Notifications are handled via a custom shell script.
### Script Location
```
/data/nfs/znas/Docker/Sabnzbd/scripts/ntfy-notify.sh
```
Mounted into the container at `/config/scripts/ntfy-notify.sh`.
### Script
```bash
#!/bin/bash
# SABnzbd ntfy notification script
# SABnzbd passes: $1=Job name, $2=Final dir, $3=NZB file,
# $4=Category, $5=Group, $6=Status, $7=Fail message
NTFY_URL="https://ntfy.netgrimoire.com/netgrimoire-media"
JOB_NAME="$1"
STATUS_CODE="$6"
FAIL_MSG="$7"
case "$STATUS_CODE" in
0) TITLE="✅ SABnzbd — Download Complete"
MSG="$JOB_NAME"; PRIORITY=3 ;;
1) TITLE="⚠️ SABnzbd — Post-Processing Error"
MSG="$JOB_NAME — $FAIL_MSG"; PRIORITY=4 ;;
2) TITLE="❌ SABnzbd — Download Failed"
MSG="$JOB_NAME — $FAIL_MSG"; PRIORITY=5 ;;
*) TITLE="ℹ️ SABnzbd — Notification"
MSG="$JOB_NAME (status: $STATUS_CODE)"; PRIORITY=3 ;;
esac
curl -s \
-H "Title: $TITLE" \
-H "Priority: $PRIORITY" \
-H "Tags: floppy_disk" \
-d "$MSG" \
"$NTFY_URL"
exit 0
```
### SABnzbd UI Setup
1. Config → Folders → **Post-Processing Scripts Folder** → set to `/config/scripts`
2. Config → Notifications → Notification Script section
3. Check **Enable notification script**
4. Script dropdown → select `ntfy-notify.sh`
5. Check: Job finished, Job failed, Warning, Error, Disk full
6. Test → Save
**Note:** The scripts folder must be configured under Config → Folders first or the script won't appear in the dropdown.
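The script can also be exercised by hand with the same positional arguments SABnzbd passes (run inside the SABnzbd container, or adjust the path on the host; all values below are placeholders):
```bash
# Simulate a successful download (status 0) and a failed one (status 2)
bash /config/scripts/ntfy-notify.sh "Test.Job.1080p" "/downloads/complete" "test.nzb" "tv" "" 0 ""
bash /config/scripts/ntfy-notify.sh "Broken.Job" "/downloads/incomplete" "broken.nzb" "tv" "" 2 "CRC error"
```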
---
## Kopia — Backup Notifications
Kopia has no native webhook support. Notifications are handled via a cron script on znas that uses the Kopia CLI inside the Docker container.
### Script Location
```
/usr/local/bin/kopia-notify.sh
```
### How It Works
- Runs hourly via cron on znas
- Uses `docker exec` to run `kopia snapshot list --json` inside the container
- Parses JSON output with Python to find snapshots completed in the last hour
- Posts success or error notification to `netgrimoire-backup`
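The script itself is not reproduced on this page; a minimal sketch of the approach described above might look like the following (the Kopia JSON field names `endTime`, `source`, and `stats.*` are assumptions about the CLI output, not verified against it):
```bash
#!/bin/bash
# Sketch only: the real /usr/local/bin/kopia-notify.sh is not reproduced here,
# and the JSON field names (endTime, source, stats.*) are assumptions.
set -euo pipefail

# Pull the snapshot list out of the container, as documented under "Kopia API Access"
docker exec "$(docker ps -q -f name=kopia_kopia)" \
  kopia snapshot list --json > /tmp/kopia-snapshots.json

python3 - <<'EOF'
import datetime, json, subprocess

NTFY_URL = "https://ntfy.netgrimoire.com/netgrimoire-backup"
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(hours=1)

with open("/tmp/kopia-snapshots.json") as fh:
    snapshots = json.load(fh)

for snap in snapshots:
    # Keep only snapshots that finished within the last hourly window
    end = datetime.datetime.strptime(snap["endTime"][:19], "%Y-%m-%dT%H:%M:%S")
    end = end.replace(tzinfo=datetime.timezone.utc)
    if end < cutoff:
        continue

    src = snap["source"]
    stats = snap.get("stats", {})
    errors = stats.get("errorCount", 0)
    files = stats.get("fileCount", 0)
    size_gb = stats.get("totalSize", 0) / 1e9

    if errors:
        title = "❌ Kopia — Backup Errors"
        body = f'{src["host"]}:{src["path"]}\n{errors} error(s) • {files} files • {size_gb:.1f} GB'
    else:
        title = "✅ Kopia — Backup Complete"
        body = f'{src["host"]}:{src["path"]}\n{files} files • {size_gb:.1f} GB'

    # Post to the backup topic in the notification format documented below
    subprocess.run(["curl", "-s", "-H", f"Title: {title}", "-d", body, NTFY_URL], check=False)
EOF
```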
### Cron Entry (znas root crontab)
```
0 * * * * /usr/local/bin/kopia-notify.sh
```
### Notification Format
**Success:** `✅ Kopia — Backup Complete`
```
host:path
N files • X.X GB
```
**Error:** `❌ Kopia — Backup Errors`
```
host:path
N error(s) • N files • X.X GB
```
### Kopia API Access
The Kopia API is accessible inside the container only. Direct host access via port 51515 does not work due to network routing. Use `docker exec` instead:
```bash
docker exec $(docker ps -q -f name=kopia_kopia) \
kopia snapshot list --json
```
---
## ntfy Compose Reference
```yaml
# swarm/ntfy.yaml
services:
ntfy:
image: binwiederhier/ntfy
command: serve
user: "1964:1964"
environment:
TZ: America/Chicago
volumes:
- /data/nfs/znas/Docker/ntfy/cache:/var/cache/ntfy
- /data/nfs/znas/Docker/ntfy/etc:/etc/ntfy
ports:
- 81:80
networks:
- netgrimoire
deploy:
labels:
caddy: ntfy.netgrimoire.com
caddy.reverse_proxy: ntfy:80
caddy.import: crowdsec
# Note: no authentik — ntfy must be publicly reachable
# for external services to post notifications
```
**Note:** ntfy intentionally has no `caddy.import_1: authentik` — it must remain publicly accessible so external services (OPNsense CrowdSec plugin, Monit, etc.) can post to it without authentication.

View file

@ -0,0 +1,115 @@
---
title: kuma Stack
description: Kuma Uptime Monitor for NetGrimoire
---
# kuma
## Overview
The kuma stack is a service in NetGrimoire that monitors the status of services running on the swarm. It consists of two main components: kuma and autokuma. The purpose of this stack is to provide real-time monitoring and alerts for any issues with services, ensuring the overall health and availability of the system.
---
## Architecture
- **Host:** docker4
- **Network:** netgrimoire
- **Exposed via:** kuma:3001 (Caddy reverse proxy), internal only
- **Homepage group:** Monitoring
---
## Build & Configuration
### Prerequisites
To deploy this stack, ensure you have Docker Swarm installed and running on your manager node.
### Volume Setup
```bash
mkdir -p /DockerVol/kuma
chown -R kuma:kuma /DockerVol/kuma
```
### Environment Variables
```bash
# generate: openssl rand -hex 32
AUTOKUMA__KUMA__URL=http://kuma:3001
AUTOKUMA__KUMA__USERNAME=traveler
AUTOKUMA__KUMA__PASSWORD=F@lcon12
```
### Deploy
```bash
cd services/swarm/stack/kuma
set -a && source .env && set +a
docker stack config --compose-file kuma-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml kuma
rm resolved.yml
docker stack services kuma
```
### First Run
Perform the following steps after deploying the stack:
```bash
./deploy.sh
```
This will initialize the autokuma service and start monitoring.
---
## User Guide
### Accessing kuma
| Service | URL | Purpose |
|---------|-----|---------|
| kuma | https://kuma.netgrimoire.com | Uptime Kuma dashboard (via Caddy reverse proxy) |
### Primary Use Cases
The primary use case for this stack is to monitor the health and availability of services in NetGrimoire. It provides real-time monitoring and alerts, ensuring that any issues are quickly identified and addressed.
### NetGrimoire Integrations
AutoKuma connects to the Uptime Kuma API at `AUTOKUMA__KUMA__URL` and automatically creates monitors from `kuma.*` Docker labels on other NetGrimoire services, so their health checks show up on the Uptime Kuma dashboard without manual setup.
---
## Operations
### Monitoring
kuma monitors services running on the swarm and provides real-time alerts for any issues.
```bash
docker stack services kuma
docker service logs -f kuma_kuma   # service name is stack-qualified (<stack>_<service>)
```
### Backups
Critical: back up the `/DockerVol/kuma` volume regularly; it holds the Uptime Kuma database and monitor configuration needed to restore the stack after a failure.
### Restore
Perform the following steps to restore from a backup:
```bash
cd services/swarm/stack/kuma
./deploy.sh
```
This will redeploy the kuma stack and initialize autokuma.
---
## Common Failures
| Symptom | Cause | Fix |
|---------|------|-----|
| No monitoring data | Insufficient permissions or incorrect labels | Check labels and permissions, ensure correct configuration |
| Autokuma fails to start | Incorrect environment variables or missing required services | Review configuration, update environment variables as needed |
---
## Changelog
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | 5ea60b18 | Initial deployment of kuma stack |
| 2026-04-07 | d6fffdfb | Fixed autokuma configuration |
| 2026-04-06 | 42982c9a | Updated Docker Swarm version |
| 2026-04-06 | 9d8b36be | Improved security patches |
| 2026-04-06 | 3f791e83 | Updated documentation for autokuma |
---
## Notes
- Generated by Gremlin on 2026-04-07T05:32:30.439Z
- Source: swarm/kuma.yaml
- Review User Guide and Changelog sections

View file

@ -0,0 +1,53 @@
---
title: Watch Grimoire
description: Monitoring — the Oracle sees all
published: true
date: 2026-04-12T00:00:00.000Z
tags: watch, monitoring
editor: markdown
dateCreated: 2026-04-12T00:00:00.000Z
---
# Watch Grimoire
![watch-badge](/images/watch-badge.png)
The Watch Grimoire is the observatory of Netgrimoire. The Oracle sees every heartbeat, every metric, every log line. Nothing goes unnoticed.
---
## Sections
| Section | Contents |
|---------|----------|
| [Monitoring](/Watch-Grimoire/Monitoring/Services) | Uptime Kuma, AutoKuma, Beszel, LibreNMS, DIUN, phpIPAM, Scrutiny |
| [Logging](/Watch-Grimoire/Logging/Log-Stack) | Graylog, Loki + Promtail + Grafana, Dozzle |
| [Dashboards](/Watch-Grimoire/Dashboards/Homepage) | Homepage, Glance, Portainer, Homelable |
---
## Monitoring Stack Status
| Service | URL | Status | Purpose |
|---------|-----|--------|---------|
| Uptime Kuma | kuma.netgrimoire.com | ✅ | Service uptime + Gremlin webhook |
| AutoKuma | — | ✅ | Auto-creates Kuma monitors from labels |
| Beszel | beszel.netgrimoire.com | ✅ | Docker resource monitoring per node |
| DIUN | — | ✅ | Docker image update notifications |
| LibreNMS | nms.netgrimoire.com | ✅ | Network/SNMP monitoring |
| phpIPAM | ipam.netgrimoire.com | ✅ | IP address management |
| Scrutiny | scrutiny.netgrimoire.com | ✅ | Disk S.M.A.R.T. monitoring |
| Graylog | log.netgrimoire.com | ✅ | Log aggregation (docker4, Compose only) |
| Loki + Grafana | — | ✅ | Metrics/log stack |
| Dozzle | dozzle.netgrimoire.com | ✅ | Real-time container logs |
| Homelable | — | 🔧 | Infra visualizer — MCP deferred |
---
## Key Notes
**AutoKuma:** Must be pinned to a Swarm manager node for full Docker API socket access. Set `AUTOKUMA__DOCKER__SOURCE=swarm` in Swarm environments. Label format: `kuma.<unique-id>.<monitor-type>.<field>`.
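As an illustration of that label format, an HTTP monitor for a service behind Caddy could be declared with deploy labels along these lines (the monitor id, name, and URL below are examples, not taken from a real stack file):
```yaml
deploy:
  labels:
    # kuma.<unique-id>.<monitor-type>.<field>
    kuma.grafana.http.name: "Grafana"
    kuma.grafana.http.url: "https://grafana.netgrimoire.com"
```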
**Graylog:** Runs on docker4 via Docker Compose only — do not attempt to run in Swarm. Stack: Graylog 6.0 + MongoDB 5 + DataNode (OpenSearch).
**Homelable:** Frontend + backend deployed via GHCR. MCP image must be built from source — deferred. Two-service stack.