Netgrimoire/Watch-Grimoire/Monitoring/Monitoring-Config.md
2026-04-12 09:53:51 -05:00


Frontmatter:

title: monitoring Stack
description: NetGrimoire Monitoring Stack Documentation
published: true
date: 2026-04-12T01:10:17.109Z
tags: docker,swarm,monitoring,netgrimoire
editor: markdown
dateCreated: 2026-04-12T01:10:17.109Z

monitoring

Overview

This stack provides monitoring for NetGrimoire. Prometheus scrapes and stores metrics, Grafana displays them in dashboards, Alertmanager routes alerts, Blackbox Exporter probes endpoints over HTTP, TCP, and ICMP, and cAdvisor reports container resource usage on every host.


Architecture

| Service | Image | Port | Role |
|---|---|---|---|
| Prometheus | prom/prometheus:latest | 9090 | Metrics collection and storage |
| Grafana | grafana/grafana:latest | 3000 | Dashboards |
| Alertmanager | prom/alertmanager:latest | 9093 | Alert routing |
| Blackbox Exporter | prom/blackbox-exporter:latest | 9115 | HTTP/TCP/ICMP probing |
| cAdvisor | gcr.io/cadvisor/cadvisor:latest | Global mode (one task per node) | Per-host container metrics (multi-arch) |

Exposed via: caddy.netgrimoire.com, Internal only

Homepage group: Monitoring


Build & Configuration

Prerequisites

Docker Engine must be running in Swarm mode, and the steps below are run on the manager node (znas).
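A quick pre-flight check can confirm that Swarm mode is active and that the current node is a manager before anything else is run. This is a minimal sketch that relies only on `docker info --format` with the `.Swarm.LocalNodeState` and `.Swarm.ControlAvailable` fields; the function name `require_swarm_manager` is hypothetical.

```shell
# Hypothetical pre-flight check: succeeds only when Swarm mode is active on
# this node and the node is a manager (ControlAvailable is "true" only there).
require_swarm_manager() (
    state=$(docker info --format '{{.Swarm.LocalNodeState}}') || exit 1
    if [ "$state" != "active" ]; then
        echo "swarm mode is not active on this node (state: ${state:-unknown})" >&2
        exit 1
    fi
    manager=$(docker info --format '{{.Swarm.ControlAvailable}}') || exit 1
    if [ "$manager" != "true" ]; then
        echo "this node is not a swarm manager" >&2
        exit 1
    fi
    echo "ok: active swarm manager"
)
```

Usage: `require_swarm_manager && echo "safe to continue"` on znas.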

Volume Setup

mkdir -p /DockerVol/prometheus/data
mkdir -p /DockerVol/grafana/data
mkdir -p /DockerVol/alertmanager/data
mkdir -p /DockerVol/blackbox/config
chown -R 1964:1964 /DockerVol/prometheus/data
chown -R 1964:1964 /DockerVol/grafana/data
chown -R 1964:1964 /DockerVol/alertmanager/data
chown -R 1964:1964 /DockerVol/blackbox/config
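The same setup can be written as a single loop, which keeps the directory list and the owner in one place. A minimal sketch: `setup_monitoring_volumes` is a hypothetical helper, and like the commands above it assumes the containers run as UID/GID 1964 (the stock Prometheus, Alertmanager, and Grafana images use other users unless the stack file sets `user:`).

```shell
# Hypothetical helper: create the bind-mount directories for the monitoring
# stack and give them to the account the containers run as.
# $1 = base directory (default /DockerVol), $2 = owner (default 1964:1964)
setup_monitoring_volumes() (
    base=${1:-/DockerVol}
    owner=${2:-1964:1964}
    for dir in prometheus/data grafana/data alertmanager/data blackbox/config; do
        mkdir -p "$base/$dir" || exit 1
        chown -R "$owner" "$base/$dir" || exit 1
    done
)
```

Run as root on znas: `setup_monitoring_volumes`.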

Environment Variables

# .env, read by the Deploy step; generate the password with: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=replace-with-openssl-output
GF_SECURITY_ADMIN_USER=admin
GF_USERS_DEFAULT_THEME=dark
GF_SERVER_ROOT_URL=https://grafana.netgrimoire.com
GF_FEATURE_TOGGLES_ENABLE=publicDashboards
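One way to produce this file without typing a password by hand is to generate it with the same `openssl rand -hex 32` command the comment mentions. A minimal sketch: `write_monitoring_env` and `gen_secret` are hypothetical helpers; the sketch refuses to overwrite an existing file, falls back to `od` on /dev/urandom when openssl is missing, and creates the file readable only by its owner because it holds a credential.

```shell
# Print a random 64-character hex secret (32 bytes).
gen_secret() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex 32
    else
        # Fallback: read 32 bytes from the kernel RNG and hex-encode them.
        od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
        echo
    fi
}

# Hypothetical helper: write the Grafana variables to $1 (default .env).
write_monitoring_env() (
    file=${1:-.env}
    if [ -e "$file" ]; then
        echo "$file already exists; not overwriting" >&2
        exit 1
    fi
    umask 077   # the file holds the admin password: owner read/write only
    cat > "$file" <<EOF
GF_SECURITY_ADMIN_PASSWORD=$(gen_secret)
GF_SECURITY_ADMIN_USER=admin
GF_USERS_DEFAULT_THEME=dark
GF_SERVER_ROOT_URL=https://grafana.netgrimoire.com
GF_FEATURE_TOGGLES_ENABLE=publicDashboards
EOF
)
```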

Deploy

cd services/swarm/stack/monitoring
set -a && source .env && set +a
docker stack config --compose-file monitoring-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml monitoring
rm resolved.yml
docker stack services monitoring
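The deploy steps can also be wrapped in a function that stops at the first failure and always removes `resolved.yml`, which may contain interpolated secrets from `.env`. A sketch only: `deploy_monitoring` is a hypothetical name, and it assumes the same directory and file names as the commands above.

```shell
# Hypothetical wrapper around the manual deploy steps.
# $1 = stack directory, $2 = stack name (default: monitoring)
deploy_monitoring() (
    cd "${1:-services/swarm/stack/monitoring}" || exit 1
    stack=${2:-monitoring}
    # Export every variable defined in .env so interpolation can see it.
    set -a
    . ./.env || exit 1
    set +a
    docker stack config --compose-file monitoring-stack.yml > resolved.yml \
        || { rm -f resolved.yml; exit 1; }
    docker stack deploy --compose-file resolved.yml "$stack"
    status=$?
    # Remove the resolved file whether or not the deploy succeeded.
    rm -f resolved.yml
    [ "$status" -eq 0 ] || exit 1
    docker stack services "$stack"
)
```

Running everything in a subshell keeps the `cd` and the exported `.env` values from leaking into the calling shell.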

First Run

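The services start inside their containers when the stack is deployed; do not run these binaries on the host. For reference, these are the flags Prometheus and Alertmanager are expected to run with:

# Prometheus: --web.enable-lifecycle enables config reloads via POST /-/reload
prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle
alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/alertmanager
# Grafana takes no flags here: it reads the GF_* variables from the Environment Variables section and serves on port 3000.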
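After a deploy, readiness can be checked over HTTP instead of by starting binaries. A minimal sketch, assuming the ports from the Architecture table are reachable from where it runs (for example `localhost` on znas through the Swarm routing mesh, which this page does not confirm); `wait_for_monitoring` and its arguments are hypothetical. The paths are the standard health endpoints: `/-/ready` for Prometheus, `/api/health` for Grafana, and `/-/healthy` for Alertmanager.

```shell
# Hypothetical readiness check: poll each endpoint until curl gets a
# non-error response or the retry budget runs out.
# $1 = host (default localhost), $2 = attempts (default 30), $3 = delay in s
wait_for_monitoring() (
    host=${1:-localhost}
    attempts=${2:-30}
    delay=${3:-2}
    for url in \
        "http://$host:9090/-/ready" \
        "http://$host:3000/api/health" \
        "http://$host:9093/-/healthy"; do
        n=0
        # curl -f turns HTTP errors (>= 400) into a non-zero exit status.
        until curl -fsS -o /dev/null "$url"; do
            n=$((n + 1))
            if [ "$n" -ge "$attempts" ]; then
                echo "not ready: $url" >&2
                exit 1
            fi
            sleep "$delay"
        done
        echo "ready: $url"
    done
)
```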

User Guide

Accessing monitoring

| Service | URL | Purpose |
|---|---|---|
| Grafana | https://grafana.netgrimoire.com | Dashboards |

Primary Use Cases

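Use Prometheus to collect metrics from NetGrimoire services, Grafana to build dashboards on them, and Alertmanager to route the resulting alerts; Blackbox Exporter adds endpoint probing and cAdvisor adds per-container resource usage.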
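In practice that means listing scrape jobs in `prometheus.yml`. The function below prints a possible starting point; it is a sketch, not this stack's actual configuration. It assumes the compose service keys are `prometheus`, `alertmanager`, `blackbox`, and `cadvisor` on a shared overlay network, that cAdvisor listens on its default port 8080, and that Blackbox Exporter defines an `http_2xx` module; the Grafana URL is the only probe target taken from this page, and `render_prometheus_config` is a hypothetical name.

```shell
# Hypothetical generator for a starting prometheus.yml. Service names
# (prometheus, alertmanager, blackbox, cadvisor) are assumptions about the
# stack file; adjust them to the keys in monitoring-stack.yml.
render_prometheus_config() {
    cat <<'EOF'
global:
  scrape_interval: 30s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # cAdvisor runs as a global service: tasks.<service> resolves to one
  # address per task, so every node is scraped.
  - job_name: cadvisor
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: A
        port: 8080

  # Blackbox probes: the URL becomes ?target=, and the scrape itself is
  # sent to the exporter.
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['https://grafana.netgrimoire.com']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox:9115
EOF
}
```

A generated file can be validated with `promtool check config prometheus.yml` and, because the stack enables `--web.enable-lifecycle`, applied with `curl -X POST http://localhost:9090/-/reload`.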

NetGrimoire Integrations

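Other NetGrimoire components integrate with this stack mainly through its environment variables. For example, GF_SERVER_ROOT_URL must match the public address at which Caddy serves Grafana (https://grafana.netgrimoire.com) so that the links and redirects Grafana generates point there.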


Operations

Monitoring

docker stack services monitoring
# Check Prometheus for errors (assumes the service is named monitoring_prometheus)
docker service logs --tail 100 monitoring_prometheus
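`docker stack services` can also be turned into a pass/fail check for cron or a quick look. A minimal sketch: `check_stack_replicas` is hypothetical; it uses the `--format` Go template with the `.Name` and `.Replicas` fields and reports any service whose running count differs from its desired count. For a global service such as cAdvisor, the desired count is the number of eligible nodes.

```shell
# Hypothetical health check for a Swarm stack: prints every service whose
# running task count differs from the desired count and fails if any do.
# $1 = stack name (default: monitoring)
check_stack_replicas() (
    stack=${1:-monitoring}
    bad=0
    list=$(docker stack services "$stack" --format '{{.Name}} {{.Replicas}}') || exit 1
    # Replicas looks like "running/desired", optionally followed by notes.
    while read -r name replicas _rest; do
        [ -n "$name" ] || continue
        running=${replicas%%/*}
        desired=${replicas#*/}
        if [ "$running" != "$desired" ]; then
            echo "DEGRADED $name $replicas"
            bad=1
        fi
    done <<EOF
$list
EOF
    [ "$bad" -eq 0 ]
)
```

The here-document (rather than a pipe) keeps the `while` loop in the same shell, so `bad` is still set when the loop ends.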

Backups

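Critical: /DockerVol/grafana/data (Grafana's database: dashboards, users, data sources) and /DockerVol/blackbox/config (probe modules). Also back up /DockerVol/prometheus/data (metric history) and /DockerVol/alertmanager/data (silences and notification state) unless losing them is acceptable. Reconstructable: cAdvisor keeps no persistent data; its metrics repopulate on the next scrape.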
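A simple way to capture those directories is one dated tarball. A sketch under assumptions: `backup_monitoring` is hypothetical, the destination is whatever path you pass, and archiving Prometheus while it is writing can catch its TSDB mid-compaction, so scaling it to zero first (`docker service scale monitoring_prometheus=0`, assuming that service name) gives a cleaner copy.

```shell
# Hypothetical backup helper: archive the monitoring bind mounts into
# <dest>/monitoring-YYYYmmdd-HHMMSS.tar.gz and print the archive path.
# $1 = destination directory, $2 = volume root (default /DockerVol)
backup_monitoring() (
    dest=${1:?usage: backup_monitoring DEST [ROOT]}
    root=${2:-/DockerVol}
    mkdir -p "$dest" || exit 1
    archive="$dest/monitoring-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths in the archive relative to the volume root.
    tar -czf "$archive" -C "$root" \
        grafana/data alertmanager/data blackbox/config prometheus/data || exit 1
    echo "$archive"
)
```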

Restore

cd services/swarm/stack/monitoring
./deploy.sh
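If the volumes were archived as a tarball with paths relative to /DockerVol, they can be unpacked before running the deploy script. A hypothetical sketch: `restore_monitoring` and the archive layout are assumptions, and the stack should be removed first (`docker stack rm monitoring`) so nothing writes to the directories during extraction.

```shell
# Hypothetical restore helper: unpack a monitoring archive (paths relative
# to the volume root) and reapply the owner used in Volume Setup.
# $1 = archive, $2 = volume root (default /DockerVol), $3 = owner (default 1964:1964)
restore_monitoring() (
    archive=${1:?usage: restore_monitoring ARCHIVE [ROOT] [OWNER]}
    root=${2:-/DockerVol}
    owner=${3:-1964:1964}
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; exit 1; }
    mkdir -p "$root" || exit 1
    tar -xzf "$archive" -C "$root" || exit 1
    for dir in prometheus/data grafana/data alertmanager/data blackbox/config; do
        if [ -d "$root/$dir" ]; then
            chown -R "$owner" "$root/$dir" || exit 1
        fi
    done
)
```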

Common Failures

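| Failure | Symptoms | Cause | Fix |
|---|---|---|---|
| Prometheus not collecting metrics | Targets show as DOWN under Status > Targets, or the Prometheus UI shows errors. | A scrape target is unreachable, or Prometheus cannot write to its data directory (disk full or wrong ownership). | Check `docker service logs monitoring_prometheus`, free disk space, and confirm /DockerVol/prometheus/data is owned by 1964:1964. |
| Grafana not displaying dashboards | Dashboards are missing from the Grafana UI, or their panels show no data. | The dashboards were never imported or provisioned, or Grafana's Prometheus data source cannot reach Prometheus. | Import or provision the dashboards; in Connections > Data sources, check that the Prometheus URL (for example http://prometheus:9090, assuming that service name) passes Save & test. |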
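For the Prometheus row, the two usual local causes (a full disk or wrong ownership of the data directory) can be checked together. A minimal sketch: `check_prometheus_storage` is hypothetical, expects the owner from Volume Setup (1964:1964) by default, and uses `df -Pk` for the free-space figure and GNU/BusyBox `stat -c` for ownership.

```shell
# Hypothetical check of the Prometheus bind mount: reports free space and
# fails if the directory owner differs from the expected uid:gid.
# $1 = data dir (default /DockerVol/prometheus/data), $2 = expected owner
check_prometheus_storage() (
    dir=${1:-/DockerVol/prometheus/data}
    want=${2:-1964:1964}
    [ -d "$dir" ] || { echo "missing: $dir" >&2; exit 1; }
    # df -Pk prints one data line; column 4 is available space in KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    echo "free: ${avail_kb} KiB on the filesystem holding $dir"
    owner=$(stat -c '%u:%g' "$dir") || exit 1
    if [ "$owner" != "$want" ]; then
        echo "owner mismatch: $owner (expected $want)" >&2
        exit 1
    fi
    echo "owner ok: $owner"
)
```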

Changelog

| Date | Commit | Summary |
|---|---|---|
| 2026-04-11 | ce875510 | Initial documentation for the monitoring stack in NetGrimoire. |
| 2026-04-11 | 3456a528 | Updated Prometheus configuration to use --web.enable-lifecycle. |
| 2026-04-09 | 8ca119ab | Added support for Cadvisor services. |
| 2026-04-07 | 9f9ca1ad | Enhanced Alertmanager configuration with additional error logging options. |
| 2026-04-07 | 71e3177f | Updated Grafana to version 10.0.1 for improved performance and stability. |

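The recorded history runs from April 7 to April 11, 2026. On April 7, Grafana was updated to 10.0.1 and Alertmanager gained additional error-logging options; cAdvisor was added on April 9; and on April 11 Prometheus was switched to --web.enable-lifecycle and this documentation was first written.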


Notes

  • Generated by Gremlin on 2026-04-12T01:10:17.109Z
  • Source: swarm/monitoring.yaml
  • Review User Guide and Changelog sections