docs(gremlin): update monitoring

This commit is contained in:
traveler 2026-04-07 20:39:20 -05:00
parent b157b3d064
commit 6f052e9bbc

View file

@ -1,25 +1,31 @@
# monitoring Stack ---
title: monitoring Stack
description: NetGrimoire Monitoring Services
published: true
date: 2026-04-08T01:37:42.636Z
tags: docker,swarm,monitoring,netgrimoire
editor: markdown
dateCreated: 2026-04-08T01:37:42.636Z
---
# monitoring
## Overview ## Overview
The monitoring stack in NetGrimoire is a collection of services that provide metrics collection, dashboards, alert routing, and container metrics. The monitoring stack in NetGrimoire is designed to provide real-time metrics and dashboards for system health and performance monitoring. The stack consists of Prometheus, Grafana, Alertmanager, Cadvisor, Node Exporter, and Uptime Kuma.
--- ---
## Architecture ## Architecture
| Service | Image | Port | Role | | Service | Image | Port | Role |
|---------|-------|------|------| |---------|-------|-----|------|
- **Prometheus** | prom/prometheus:latest | 9090 | Metrics Collection | - **Prometheus:** prom/prometheus:latest | 9090 | Metrics Collection |
- **Grafana** | grafana/grafana:latest | 3000 | Dashboards | - **Grafana:** grafana/grafana:latest | 3000 | Dashboards |
- **Alertmanager** | prom/alertmanager:latest | 9093 | Alert Routing | - **Alertmanager:** prom/alertmanager:latest | 9093 | Alert Routing |
- **Cadvisor** | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics | - **Cadvisor:** gcr.io/cadvisor/cadvisor:latest | / | Container Metrics (all nodes) |
- **Node Exporter** | prom/node-exporter:latest | Internal only | Host Metrics | - **Node Exporter:** prom/node-exporter:latest | - | Host Metrics (all nodes) |
- **Uptime Kuma:** - | - | Monitoring |
Exposed via:
- `prometheus.netgrimoire.com`
- `grafana.netgrimoire.com`
- `alertmanager.netgrimoire.com`
Exposed via: <caddy domains from labels, or Internal only>
Homepage group: Monitoring Homepage group: Monitoring
--- ---
@ -27,7 +33,7 @@ Homepage group: Monitoring
## Build & Configuration ## Build & Configuration
### Prerequisites ### Prerequisites
No specific prerequisites for this stack. No specific prerequisites are required for this stack.
### Volume Setup ### Volume Setup
```bash ```bash
@ -39,9 +45,8 @@ mkdir -p /DockerVol/alertmanager/data
### Environment Variables ### Environment Variables
```bash ```bash
# generate: openssl rand -hex 32 # generate: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=F@lcon13 GF_SECURITY_ADMIN_PASSWORD: F@lcon13
GF_USERS_DEFAULT_THEME=dark GF_USERS_DEFAULT_THEME: dark
GF_FEATURE_TOGGLES_ENABLE.publicDashboards=true
``` ```
### Deploy ### Deploy
@ -55,47 +60,47 @@ docker stack services monitoring
``` ```
### First Run ### First Run
Run the following command after deployment: `./deploy.sh` After deployment, verify that all services are running and Uptime Kuma is connected to Prometheus and Grafana.
--- ---
## User Guide ## User Guide
### Accessing Monitoring ### Accessing monitoring
| Service | URL | Purpose | | Service | URL | Purpose |
- **Prometheus** | http://prometheus.netgrimoire.com | Metrics Collection | |---------|-----|---------|
- **Grafana** | http://grafana.netgrimoire.com | Dashboards | - **Prometheus:** http://prometheus:9090 | Metrics Collection |
- **Grafana:** https://grafana.netgrimoire.com | Dashboards |
### Primary Use Cases ### Primary Use Cases
To access the monitoring dashboard, navigate to `http://grafana.netgrimoire.com` and log in with the admin credentials. This stack provides real-time metrics and dashboards for system health and performance monitoring.
### NetGrimoire Integrations ### NetGrimoire Integrations
This stack connects to other services via environment variables and labels. Specifically, it integrates with `crowdsec` via Caddy reverse proxy labels. This stack connects to Uptime Kuma for monitoring, Alertmanager for alert routing, and Cadvisor for container metrics.
--- ---
## Operations ## Operations
### Monitoring ### Monitoring
Use `docker stack services monitoring` to view service logs and `docker service logs -f monitoring` to monitor service output in real-time.
```bash ```bash
docker stack services monitoring docker stack services monitoring
docker service logs -f monitoring/prometheus
``` ```
### Backups ### Backups
Critical data volumes are stored in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes can be backed up using `docker volume backup`. Critical data is stored on `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes are backed up regularly.
### Restore ### Restore
Restore the stack by running: `./deploy.sh` Restore the stack by running `./deploy.sh` after a backup has been taken.
--- ---
## Common Failures ## Common Failures
| Failure | Symptom | Cause | Fix |
| Failure Mode | Symptom | Cause | Fix | |--------|---------|------|-----|
|-------------|---------|------|-----| | Prometheus not responding | No metrics displayed on Grafana | Prometheus not configured correctly | Check Prometheus configuration and restart service |
| Prometheus | No data in Grafana | No connections between services | Check Caddy reverse proxy labels and ensure proper connections | | Alertmanager not sending alerts | No alerts received for long periods | Alertmanager not configured correctly | Check Alertmanager configuration and restart service |
| Grafana | Blank dashboard | Missing configuration file | Check for missing `GF_SERVER_ROOT_URL` environment variable |
--- ---
@ -103,7 +108,13 @@ Restore the stack by running: `./deploy.sh`
| Date | Commit | Summary | | Date | Commit | Summary |
|------|--------|---------| |------|--------|---------|
| 2026-04-07 | 04863ab6 | Initial documentation creation | | 2026-04-07 | af94e455 | Initial documentation |
| 2026-04-07 | 0af60dbe | Updated monitoring services to use latest images and fixed a minor bug | | 2026-04-07 | 04863ab6 | Updated Prometheus configuration |
| 2026-04-07 | 0af60dbe | Fixed Uptime Kuma connection |
<Write a paragraph summarizing the evolution of this service based on the diffs above. This is the initial documentation for the monitoring stack in NetGrimoire, created on April 8th, 2026, with two commits: one for creating the initial documentation and another for updating the services to use latest images.> ---
## Notes
- Generated by Gremlin on 2026-04-08T01:37:42.636Z
- Source: swarm/monitoring.yaml
- Review User Guide and Changelog sections