Frontmatter:
title: monitoring Stack
description: NetGrimoire Monitoring Stack Documentation
published: true
date: 2026-04-12T01:10:17.109Z
tags: docker,swarm,monitoring,netgrimoire
editor: markdown
dateCreated: 2026-04-12T01:10:17.109Z
monitoring
Overview
This stack provides monitoring for NetGrimoire. It consists of five services: Prometheus collects and stores metrics, Grafana displays them in dashboards, Alertmanager routes alerts, Blackbox Exporter probes endpoints over HTTP, TCP, and ICMP, and cAdvisor exposes container and host resource metrics.
Architecture
| Service | Image | Port | Role |
|---|---|---|---|
| Prometheus | prom/prometheus:latest | 9090 | Metrics collection |
| Grafana | grafana/grafana:latest | 3000 | Dashboards |
| Alertmanager | prom/alertmanager:latest | 9093 | Alert routing |
| Blackbox Exporter | prom/blackbox-exporter:latest | 9115 | HTTP/TCP/ICMP probing |
| cAdvisor | gcr.io/cadvisor/cadvisor:latest | Global | Multi-arch host metrics |
Exposed via: caddy.netgrimoire.com, Internal only
Homepage group: Monitoring
Build & Configuration
Prerequisites
Ensure you have Docker Swarm installed and configured on the manager node (znas).
Volume Setup
mkdir -p /DockerVol/prometheus/data
mkdir -p /DockerVol/grafana/data
mkdir -p /DockerVol/alertmanager/data
mkdir -p /DockerVol/blackbox/config
chown -R 1964:1964 /DockerVol/prometheus/data
chown -R 1964:1964 /DockerVol/grafana/data
chown -R 1964:1964 /DockerVol/alertmanager/data
chown -R 1964:1964 /DockerVol/blackbox/config
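The eight commands above follow one pattern, so they can be collapsed into a loop. The sketch below is one possible way to do that. The function name `make_volume_dirs` and its two parameters are hypothetical; the defaults are the base path and owner used above. Ownership is changed only when running as root, since `chown` to another user would otherwise fail.

```shell
# Hypothetical helper: create the stack's bind-mount directories under a base
# path and hand them to the given owner (defaults taken from the commands above).
# $1 = base path, $2 = owner as UID:GID.
make_volume_dirs() {
    base="${1:-/DockerVol}"
    owner="${2:-1964:1964}"
    for sub in prometheus/data grafana/data alertmanager/data blackbox/config; do
        mkdir -p "$base/$sub" || return 1
        # Changing ownership to another user needs root; skip it otherwise.
        if [ "$(id -u)" -eq 0 ]; then
            chown -R "$owner" "$base/$sub" || return 1
        fi
    done
}
```

Calling `make_volume_dirs` with no arguments reproduces the layout above; passing a different base path is useful for trying the layout in a scratch directory first.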
Environment Variables
# generate: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=<generated secret>
GF_SECURITY_ADMIN_USER=admin
GF_USERS_DEFAULT_THEME=dark
GF_SERVER_ROOT_URL=https://grafana.netgrimoire.com
GF_FEATURE_TOGGLES_ENABLE=publicDashboards
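The comment at the top of this block suggests generating the admin password rather than typing one. A minimal sketch of that step follows. Both function names are hypothetical, and the fallback to `/dev/urandom` when `openssl` is not installed is an addition that is not part of the original instructions.

```shell
# Hypothetical helper: print a 64-character hex secret (32 random bytes).
gen_secret() {
    if command -v openssl >/dev/null 2>&1; then
        # The command named in the comment above.
        openssl rand -hex 32
    else
        # Assumed fallback: read 32 bytes from the kernel RNG and hex-encode them.
        od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
        echo
    fi
}

# Hypothetical helper: print a ready-to-use .env line for the Grafana admin password.
grafana_password_line() {
    printf 'GF_SECURITY_ADMIN_PASSWORD=%s\n' "$(gen_secret)"
}
```

One way to use it is `grafana_password_line >> .env`, followed by `chmod 600 .env` so the secret is readable only by its owner.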
Deploy
cd services/swarm/stack/monitoring
set -a && source .env && set +a
docker stack config --compose-file monitoring-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml monitoring
rm resolved.yml
docker stack services monitoring
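If the stack file interpolates variables from `.env`, a missing entry is typically replaced by an empty string rather than stopping the deploy. A guard placed between the `source` line and `docker stack config` can catch that earlier. The sketch below assumes nothing about the stack file itself; `require_vars` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical guard: fail if any named variable is unset or empty.
# Prints each missing name to stderr and returns non-zero if any are missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup; the names are fixed by the caller, not user input.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_vars GF_SECURITY_ADMIN_PASSWORD GF_SECURITY_ADMIN_USER GF_SERVER_ROOT_URL || exit 1` stops the deploy before any file is written.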
First Run
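No manual start-up is needed after deploying the stack: Swarm launches each service inside its own container. The commands below show the flags the Prometheus and Alertmanager containers are expected to run with. They are listed for reference and should not be run on the host. Grafana is configured through the GF_* environment variables above rather than through command-line flags.
prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle
alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/alertmanager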
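Once the services are running, a readiness probe confirms that each one is answering. The sketch below is one way to do that with `curl`. The helper names are hypothetical; the paths are the readiness endpoints exposed by Prometheus and Alertmanager (`/-/ready`) and the health endpoint exposed by Grafana (`/api/health`). The base URL is passed in by the caller, so no hostname is assumed.

```shell
# Hypothetical helper: map a service name to its health-check path.
health_path() {
    case "$1" in
        prometheus|alertmanager) echo "/-/ready" ;;
        grafana)                 echo "/api/health" ;;
        *) echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if the service answers with a successful status.
# $1 = service name, $2 = base URL without a trailing slash.
check_health() {
    path=$(health_path "$1") || return 1
    # --fail turns HTTP errors into a non-zero exit; --max-time bounds the wait.
    curl --fail --silent --show-error --max-time 5 --output /dev/null "$2$path"
}
```

A call such as `check_health prometheus http://prometheus.netgrimoire.com:9090` returns zero when Prometheus reports ready and non-zero otherwise, which makes it easy to chain in a post-deploy script.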
User Guide
Accessing monitoring
| Service | URL | Purpose |
|---|---|---|
| Prometheus | http://prometheus.netgrimoire.com:9090 | Metrics collection |
| Grafana | https://grafana.netgrimoire.com | Dashboards |
| Alertmanager | https://alertmanager.netgrimoire.com:9093 | Alert routing |
Primary Use Cases
Use Prometheus to scrape metrics from NetGrimoire services, Grafana to visualize them, and Alertmanager to route the resulting alerts.
NetGrimoire Integrations
Other NetGrimoire components integrate with this stack through its environment variables. For example, GF_SERVER_ROOT_URL sets the public URL that Grafana uses when it generates links.
Operations
Monitoring
docker stack services monitoring
# Monitor Prometheus for errors and performance issues
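The replica column of `docker stack services` shows running versus desired tasks, and a mismatch is usually the first sign of trouble. The sketch below filters that output down to the services that are not fully running. `degraded_services` is a hypothetical helper; it expects one `NAME RUNNING/DESIRED` pair per line, which is what `--format '{{.Name}} {{.Replicas}}'` produces.

```shell
# Hypothetical filter: read "NAME RUNNING/DESIRED" lines on stdin and print
# the services whose running count differs from the desired count.
# Exits non-zero when at least one service is degraded.
degraded_services() {
    awk '{
        split($2, r, "/")
        if (r[1] != r[2]) { print $1 " (" $2 ")"; bad = 1 }
    } END { exit bad }'
}
```

A typical invocation would be `docker stack services monitoring --format '{{.Name}} {{.Replicas}}' | degraded_services`. Logs for a flagged service can then be read with `docker service logs`, using the service name printed in the first column.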
Backups
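Critical: back up the Prometheus, Grafana, and Alertmanager data directories and the Blackbox Exporter configuration under /DockerVol.
Reconstructable: cAdvisor keeps no persistent data, and the service definitions can be redeployed from the stack file.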
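A backup of the directories created in Volume Setup can be as simple as a timestamped tarball. The sketch below is one way to do that. The function name `backup_monitoring`, the archive naming scheme, and the destination directory are all assumptions. Archiving a live Prometheus data directory may capture an inconsistent state, so scaling the service down first is the safer option.

```shell
# Hypothetical helper: archive the stack's state directories into a
# timestamped tarball. $1 = base path (e.g. /DockerVol), $2 = destination dir.
# Prints the archive path on success.
backup_monitoring() {
    base="$1"
    dest="$2"
    archive="$dest/monitoring-$(date +%Y%m%dT%H%M%S).tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps the paths inside the archive relative to the base directory.
    tar -czf "$archive" -C "$base" \
        prometheus/data grafana/data alertmanager/data blackbox/config || return 1
    echo "$archive"
}
```

Because the archive stores paths relative to the base directory, it can be unpacked with `tar -xzf <archive> -C /DockerVol` before redeploying.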
Restore
cd services/swarm/stack/monitoring
./deploy.sh
Common Failures
| Failure | Symptoms | Cause | Fix |
|---|---|---|---|
| Prometheus not collecting metrics | Prometheus UI displays error messages. | Insufficient disk space, or the data directory is not writable by the Prometheus user. | Free or add disk space and confirm that /DockerVol/prometheus/data is owned by 1964:1964, as set in Volume Setup. |
| Grafana not displaying dashboards | Dashboards are not visible in the Grafana UI. | Grafana cannot reach its data source, or GF_SERVER_ROOT_URL does not match the URL used to reach Grafana. | Verify that the data source connects and that GF_SERVER_ROOT_URL matches the public Grafana URL. |
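The disk space and ownership causes in the table can be checked from the host before digging into container logs. The sketch below shows two small checks. Both function names are hypothetical, and the commands used (`ls -ldn`, `df -Pk`) were chosen for portability rather than taken from the stack's own tooling.

```shell
# Hypothetical check: succeed if the directory is owned by the given numeric UID.
# ls -n prints numeric IDs; the third field is the owner's UID.
owned_by() {
    actual=$(ls -ldn "$1" | awk '{print $3}')
    [ "$actual" = "$2" ]
}

# Hypothetical check: print the free space, in KiB, of the filesystem holding a path.
# df -P uses the fixed POSIX column layout; the fourth column is "Available".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}
```

For instance, `owned_by /DockerVol/prometheus/data 1964 || echo "wrong owner"` flags a directory that needs the `chown` from Volume Setup, and `free_kib /DockerVol/prometheus/data` gives a number that can be compared against a threshold.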
Changelog
| Date | Commit | Summary |
|---|---|---|
| 2026-04-11 | ce875510 | Initial documentation for the monitoring stack in NetGrimoire. |
| 2026-04-11 | 3456a528 | Updated Prometheus configuration to use --web.enable-lifecycle. |
| 2026-04-09 | 8ca119ab | Added support for the cAdvisor service. |
| 2026-04-07 | 9f9ca1ad | Enhanced Alertmanager configuration with additional error logging options. |
| 2026-04-07 | 71e3177f | Updated Grafana to version 10.0.1 for improved performance and stability. |
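This is the initial documentation for the stack. The commits above show it taking shape over five days: on 2026-04-07 Grafana was updated to 10.0.1 and Alertmanager gained additional error logging options, on 2026-04-09 cAdvisor was added, and on 2026-04-11 Prometheus was configured with --web.enable-lifecycle.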
Notes
- Generated by Gremlin on 2026-04-12T01:10:17.109Z
- Source: swarm/monitoring.yaml
- Review User Guide and Changelog sections