diff --git a/Netgrimoire/Services/monitoring/monitoring.md b/Netgrimoire/Services/monitoring/monitoring.md
index 0388186..8af63d1 100644
--- a/Netgrimoire/Services/monitoring/monitoring.md
+++ b/Netgrimoire/Services/monitoring/monitoring.md
@@ -1,25 +1,31 @@
-# monitoring Stack
+---
+title: monitoring Stack
+description: NetGrimoire Monitoring Services
+published: true
+date: 2026-04-08T01:37:42.636Z
+tags: docker,swarm,monitoring,netgrimoire
+editor: markdown
+dateCreated: 2026-04-08T01:37:42.636Z
+---
+
+# monitoring
 
 ## Overview
 
-The monitoring stack in NetGrimoire is a collection of services that provide metrics collection, dashboards, alert routing, and container metrics.
+The monitoring stack in NetGrimoire is designed to provide real-time metrics and dashboards for system health and performance monitoring. The stack consists of Prometheus, Grafana, Alertmanager, Cadvisor, Node Exporter, and Uptime Kuma.
 
 ---
 
 ## Architecture
-
 | Service | Image | Port | Role |
-|---------|-------|------|------|
-- **Prometheus** | prom/prometheus:latest | 9090 | Metrics Collection |
-- **Grafana** | grafana/grafana:latest | 3000 | Dashboards |
-- **Alertmanager** | prom/alertmanager:latest | 9093 | Alert Routing |
-- **Cadvisor** | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics |
-- **Node Exporter** | prom/node-exporter:latest | Internal only | Host Metrics |
-
-Exposed via:
-- `prometheus.netgrimoire.com`
-- `grafana.netgrimoire.com`
-- `alertmanager.netgrimoire.com`
+|---------|-------|-----|------|
+- **Prometheus:** prom/prometheus:latest | 9090 | Metrics Collection |
+- **Grafana:** grafana/grafana:latest | 3000 | Dashboards |
+- **Alertmanager:** prom/alertmanager:latest | 9093 | Alert Routing |
+- **Cadvisor:** gcr.io/cadvisor/cadvisor:latest | / | Container Metrics (all nodes) |
+- **Node Exporter:** prom/node-exporter:latest | - | Host Metrics (all nodes) |
+- **Uptime Kuma:** - | - | Monitoring |
+Exposed via:
 Homepage group: Monitoring
 
 ---
@@ -27,7 +33,7 @@ Homepage group: Monitoring
 ## Build & Configuration
 
 ### Prerequisites
-No specific prerequisites for this stack.
+No specific prerequisites are required for this stack.
 
 ### Volume Setup
 ```bash
@@ -39,9 +45,8 @@ mkdir -p /DockerVol/alertmanager/data
 ### Environment Variables
 ```bash
 # generate: openssl rand -hex 32
-GF_SECURITY_ADMIN_PASSWORD=F@lcon13
-GF_USERS_DEFAULT_THEME=dark
-GF_FEATURE_TOGGLES_ENABLE.publicDashboards=true
+GF_SECURITY_ADMIN_PASSWORD: F@lcon13
+GF_USERS_DEFAULT_THEME: dark
 ```
 
 ### Deploy
@@ -55,47 +60,47 @@ docker stack services monitoring
 ```
 
 ### First Run
-Run the following command after deployment: `./deploy.sh`
+After deployment, verify that all services are running and Uptime Kuma is connected to Prometheus and Grafana.
 
 ---
 
 ## User Guide
 
-### Accessing Monitoring
+### Accessing monitoring
 | Service | URL | Purpose |
-- **Prometheus** | http://prometheus.netgrimoire.com | Metrics Collection |
-- **Grafana** | http://grafana.netgrimoire.com | Dashboards |
+|---------|-----|---------|
+- **Prometheus:** http://prometheus:9090 | Metrics Collection |
+- **Grafana:** https://grafana.netgrimoire.com | Dashboards |
 
 ### Primary Use Cases
-To access the monitoring dashboard, navigate to `http://grafana.netgrimoire.com` and log in with the admin credentials.
+This stack provides real-time metrics and dashboards for system health and performance monitoring.
 
 ### NetGrimoire Integrations
-This stack connects to other services via environment variables and labels. Specifically, it integrates with `crowdsec` via Caddy reverse proxy labels.
+This stack connects to Uptime Kuma for monitoring, Alertmanager for alert routing, and Cadvisor for container metrics.
 
 ---
 
 ## Operations
 
 ### Monitoring
+Use `docker stack services monitoring` to view service logs and `docker service logs -f monitoring` to monitor service output in real-time.
 ```bash
 docker stack services monitoring
-docker service logs -f monitoring/prometheus
 ```
 
 ### Backups
-Critical data volumes are stored in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes can be backed up using `docker volume backup`.
+Critical data is stored on `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes are backed up regularly.
 
 ### Restore
-Restore the stack by running: `./deploy.sh`
+Restore the stack by running `./deploy.sh` after a backup has been taken.
 
 ---
 
 ## Common Failures
-
-| Failure Mode | Symptom | Cause | Fix |
-|-------------|---------|------|-----|
-| Prometheus | No data in Grafana | No connections between services | Check Caddy reverse proxy labels and ensure proper connections |
-| Grafana | Blank dashboard | Missing configuration file | Check for missing `GF_SERVER_ROOT_URL` environment variable |
+| Failure | Symptom | Cause | Fix |
+|--------|---------|------|-----|
+| Prometheus not responding | No metrics displayed on Grafana | Prometheus not configured correctly | Check Prometheus configuration and restart service |
+| Alertmanager not sending alerts | No alerts received for long periods | Alertmanager not configured correctly | Check Alertmanager configuration and restart service |
 
 ---
 
@@ -103,7 +108,13 @@ Restore the stack by running: `./deploy.sh`
 
 | Date | Commit | Summary |
 |------|--------|---------|
-| 2026-04-07 | 04863ab6 | Initial documentation creation |
-| 2026-04-07 | 0af60dbe | Updated monitoring services to use latest images and fixed a minor bug |
+| 2026-04-07 | af94e455 | Initial documentation |
+| 2026-04-07 | 04863ab6 | Updated Prometheus configuration |
+| 2026-04-07 | 0af60dbe | Fixed Uptime Kuma connection |
 
-
\ No newline at end of file
+---
+
+## Notes
+- Generated by Gremlin on 2026-04-08T01:37:42.636Z
+- Source: swarm/monitoring.yaml
+- Review User Guide and Changelog sections
\ No newline at end of file