From b157b3d0649b4d5138ea32e7d93844da5956c34a Mon Sep 17 00:00:00 2001
From: traveler
Date: Tue, 7 Apr 2026 20:38:00 -0500
Subject: [PATCH] docs(gremlin): update monitoring

---
 Netgrimoire/Services/monitoring/monitoring.md | 107 ++++++++++--------
 1 file changed, 62 insertions(+), 45 deletions(-)

diff --git a/Netgrimoire/Services/monitoring/monitoring.md b/Netgrimoire/Services/monitoring/monitoring.md
index a8e9a10..0388186 100644
--- a/Netgrimoire/Services/monitoring/monitoring.md
+++ b/Netgrimoire/Services/monitoring/monitoring.md
@@ -1,58 +1,50 @@
-# monitoring
+# monitoring Stack
 
-Overview
----------------
+## Overview
+The monitoring stack in NetGrimoire is a collection of services that provide metrics collection, dashboards, alert routing, and container metrics.
 
-The monitoring stack provides a comprehensive set of services for metrics collection, dashboard management, alert routing, container metrics, and host metrics in NetGrimoire. The stack includes Prometheus for metrics collection, Grafana for dashboards, Alertmanager for alert routing, Cadvisor for container metrics, and Node Exporter for host metrics.
+---
 
-Architecture
--------------
+## Architecture
 
 | Service | Image | Port | Role |
-|---------|-------|-----|------|
-- **Prometheus:** prom/prometheus:latest
-  - exposed via: `grafana.netgrimoire.com`
-  - Homepage group: Monitoring
+|---------|-------|------|------|
+- **Prometheus** | prom/prometheus:latest | 9090 | Metrics Collection |
+- **Grafana** | grafana/grafana:latest | 3000 | Dashboards |
+- **Alertmanager** | prom/alertmanager:latest | 9093 | Alert Routing |
+- **Cadvisor** | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics |
+- **Node Exporter** | prom/node-exporter:latest | Internal only | Host Metrics |
 
-- **Grafana:** grafana/grafana:latest
-  - exposed via: `grafana.netgrimoire.com`
-  - Homepage group: Monitoring
+Exposed via:
+- `prometheus.netgrimoire.com`
+- `grafana.netgrimoire.com`
+- `alertmanager.netgrimoire.com`
 
-- **Alertmanager:** prom/alertmanager:latest
-  - exposed via: `alertmanager.netgrimoire.com`
-  - Homepage group: Monitoring
+Homepage group: Monitoring
 
-- **Cadvisor:** gcr.io/cadvisor/cadvisor:latest
-  - exposed via: `cadvisor.netgrimoire.com`
-  - Homepage group: Monitoring
+---
 
-- **Node Exporter:** prom/node-exporter:latest
-  - exposed via: `node-exporter.netgrimoire.com`
-  - Homepage group: Monitoring
-
-Build & Configuration
----------------------
+## Build & Configuration
 
 ### Prerequisites
-
-- Docker and Docker Swarm installed on docker4
+No specific prerequisites for this stack.
 
 ### Volume Setup
-
 ```bash
 mkdir -p /DockerVol/prometheus/data
 mkdir -p /DockerVol/grafana/data
+mkdir -p /DockerVol/alertmanager/data
 ```
 
 ### Environment Variables
-
 ```bash
 # generate: openssl rand -hex 32
 GF_SECURITY_ADMIN_PASSWORD=F@lcon13
+GF_USERS_DEFAULT_THEME=dark
+GF_FEATURE_TOGGLES_ENABLE.publicDashboards=true
 ```
 
 ### Deploy
-
 ```bash
 cd services/swarm/stack/monitoring
 set -a && source .env && set +a
@@ -63,30 +55,55 @@ docker stack services monitoring
 ```
 
 ### First Run
-
-- Post-deploy steps specific to these services include configuring network, caddy, and uptime kuma.
+Run the following command after deployment: `./deploy.sh`
 
 ---
 
 ## User Guide
 
 ### Accessing Monitoring
-
 | Service | URL | Purpose |
-|---------|-----|---------|
-- **Prometheus:** https://prometheus.netgrimoire.com
-- **Grafana:** https://grafana.netgrimoire.com
-- **Alertmanager:** https://alertmanager.netgrimoire.com
-- **Cadvisor:** `cadvisor.netgrimoire.com` (Container metrics)
-- **Node Exporter:** `node-exporter.netgrimoire.com` (Host metrics)
+- **Prometheus** | http://prometheus.netgrimoire.com | Metrics Collection |
+- **Grafana** | http://grafana.netgrimoire.com | Dashboards |
 
 ### Primary Use Cases
-
-- Monitoring system performance and health.
-- Configuring alerts for critical issues.
-- Visualizing metrics in real-time.
+To access the monitoring dashboard, navigate to `http://grafana.netgrimoire.com` and log in with the admin credentials.
 
 ### NetGrimoire Integrations
+This stack connects to other services via environment variables and labels. Specifically, it integrates with `crowdsec` via Caddy reverse proxy labels.
 
-- Connects to Crowdsec via Caddy reverse proxy.
-- Uptime Kuma monitors services and detects errors.
\ No newline at end of file
+---
+
+## Operations
+
+### Monitoring
+```bash
+docker stack services monitoring
+docker service logs -f monitoring/prometheus
+```
+
+### Backups
+Critical data volumes are stored in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes can be backed up using `docker volume backup`.
+
+### Restore
+Restore the stack by running: `./deploy.sh`
+
+---
+
+## Common Failures
+
+| Failure Mode | Symptom | Cause | Fix |
+|-------------|---------|------|-----|
+| Prometheus | No data in Grafana | No connections between services | Check Caddy reverse proxy labels and ensure proper connections |
+| Grafana | Blank dashboard | Missing configuration file | Check for missing `GF_SERVER_ROOT_URL` environment variable |
+
+---
+
+## Changelog
+
+| Date | Commit | Summary |
+|------|--------|---------|
+| 2026-04-07 | 04863ab6 | Initial documentation creation |
+| 2026-04-07 | 0af60dbe | Updated monitoring services to use latest images and fixed a minor bug |
+
+ 
\ No newline at end of file
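
The Backups paragraph added by this patch points at `docker volume backup`, which, as far as I know, is not a subcommand of the stock Docker CLI. Because the three data paths are created with `mkdir -p` under `/DockerVol`, they look like ordinary host directories, so one option is to archive them with `tar`. The following is a minimal sketch under that assumption, not the stack's actual backup procedure; the function name `backup_monitoring_volumes`, its two arguments, and the default destination directory are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: archive the monitoring stack's data directories with tar.
# Hypothetical names: backup_monitoring_volumes and both of its arguments.
# Assumes the data paths are plain host directories (bind mounts).

backup_monitoring_volumes() {
  local root="${1:-/DockerVol}"               # parent of prometheus/, grafana/, alertmanager/
  local dest="${2:-/tmp/monitoring-backups}"  # placeholder destination directory
  local stamp svc
  stamp="$(date +%Y%m%d-%H%M%S)"

  mkdir -p "$dest" || return 1
  for svc in prometheus grafana alertmanager; do
    if [ ! -d "$root/$svc/data" ]; then
      echo "skip: $root/$svc/data not found" >&2
      continue
    fi
    # -C keeps paths inside the archive relative, e.g. prometheus/data/...
    tar -czf "$dest/$svc-$stamp.tar.gz" -C "$root" "$svc/data" || return 1
    # Print the archive path so callers can collect or verify it.
    echo "$dest/$svc-$stamp.tar.gz"
  done
}
```

Called as `backup_monitoring_volumes /DockerVol /srv/backups` (the destination is a placeholder), it prints one archive path per data directory it finds and skips any that are missing. Archiving the Prometheus directory while the service is still writing may capture an inconsistent state, so stopping or scaling down the service first is probably the safer route.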