docs(gremlin): update monitoring

traveler 2026-04-07 20:39:20 -05:00
parent b157b3d064
commit 6f052e9bbc

@@ -1,25 +1,31 @@
# monitoring Stack
---
title: monitoring Stack
description: NetGrimoire Monitoring Services
published: true
date: 2026-04-08T01:37:42.636Z
tags: docker,swarm,monitoring,netgrimoire
editor: markdown
dateCreated: 2026-04-08T01:37:42.636Z
---
# monitoring
## Overview
The monitoring stack in NetGrimoire is a collection of services that provide metrics collection, dashboards, alert routing, and container metrics.
The monitoring stack in NetGrimoire is designed to provide real-time metrics and dashboards for system health and performance monitoring. The stack consists of Prometheus, Grafana, Alertmanager, Cadvisor, Node Exporter, and Uptime Kuma.
---
## Architecture
| Service | Image | Port | Role |
|---------|-------|------|------|
| Prometheus | prom/prometheus:latest | 9090 | Metrics Collection |
| Grafana | grafana/grafana:latest | 3000 | Dashboards |
| Alertmanager | prom/alertmanager:latest | 9093 | Alert Routing |
| Cadvisor | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics |
| Node Exporter | prom/node-exporter:latest | Internal only | Host Metrics |
Exposed via:
- `prometheus.netgrimoire.com`
- `grafana.netgrimoire.com`
- `alertmanager.netgrimoire.com`
|---------|-------|-----|------|
| Prometheus | prom/prometheus:latest | 9090 | Metrics Collection |
| Grafana | grafana/grafana:latest | 3000 | Dashboards |
| Alertmanager | prom/alertmanager:latest | 9093 | Alert Routing |
| Cadvisor | gcr.io/cadvisor/cadvisor:latest | / | Container Metrics (all nodes) |
| Node Exporter | prom/node-exporter:latest | - | Host Metrics (all nodes) |
| Uptime Kuma | - | - | Monitoring |
Exposed via: <caddy domains from labels, or Internal only>
Homepage group: Monitoring
---
@@ -27,7 +33,7 @@ Homepage group: Monitoring
## Build & Configuration
### Prerequisites
No specific prerequisites for this stack.
No specific prerequisites are required for this stack.
### Volume Setup
```bash
@@ -39,9 +45,8 @@ mkdir -p /DockerVol/alertmanager/data
### Environment Variables
```bash
# generate: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=F@lcon13
GF_USERS_DEFAULT_THEME=dark
GF_FEATURE_TOGGLES_ENABLE.publicDashboards=true
GF_SECURITY_ADMIN_PASSWORD: F@lcon13
GF_USERS_DEFAULT_THEME: dark
```
### Deploy
@@ -55,47 +60,47 @@ docker stack services monitoring
```
### First Run
Run the following command after deployment: `./deploy.sh`
After deployment, verify that all services are running and Uptime Kuma is connected to Prometheus and Grafana.
---
## User Guide
### Accessing Monitoring
### Accessing monitoring
| Service | URL | Purpose |
| Prometheus | http://prometheus.netgrimoire.com | Metrics Collection |
| Grafana | http://grafana.netgrimoire.com | Dashboards |
|---------|-----|---------|
| Prometheus | http://prometheus:9090 | Metrics Collection |
| Grafana | https://grafana.netgrimoire.com | Dashboards |
### Primary Use Cases
To access the monitoring dashboard, navigate to `http://grafana.netgrimoire.com` and log in with the admin credentials.
This stack provides real-time metrics and dashboards for system health and performance monitoring.
### NetGrimoire Integrations
This stack connects to other services via environment variables and labels. Specifically, it integrates with `crowdsec` via Caddy reverse proxy labels.
This stack connects to Uptime Kuma for monitoring, Alertmanager for alert routing, and Cadvisor for container metrics.
---
## Operations
### Monitoring
Use `docker stack services monitoring` to list the stack's services and their replica counts, and `docker service logs -f <service>` to follow a single service's output in real time.
```bash
docker stack services monitoring
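# Swarm names services <stack>_<service>; "prometheus" is assumed to be the service key in the stack file
docker service logs -f monitoring_prometheus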
```
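The replica counts reported by `docker stack services` can also be checked in a script. The following is a minimal sketch, not part of the stack itself: the function name `degraded_services` is hypothetical, and it assumes input lines of the form `<name> <running>/<desired>`, which is what `--format '{{.Name}} {{.Replicas}}'` produces.

```bash
#!/usr/bin/env bash
# Sketch: print the names of services whose running replica count
# differs from the desired count. degraded_services is a hypothetical
# helper name, not something defined by the monitoring stack.
set -euo pipefail

degraded_services() {
  local name replicas _rest running desired
  # Each input line is expected to look like: monitoring_prometheus 1/1
  # Anything after the replica field (e.g. "(max 1 per node)") is ignored.
  while read -r name replicas _rest; do
    if [ -z "$name" ]; then
      continue
    fi
    running="${replicas%%/*}"   # text before the slash
    desired="${replicas##*/}"   # text after the slash
    if [ "$running" != "$desired" ]; then
      echo "$name"
    fi
  done
}
```

A possible invocation, assuming the stack is deployed under the name `monitoring`, is `docker stack services monitoring --format '{{.Name}} {{.Replicas}}' | degraded_services`. Empty output would mean every service reports its full replica count.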
### Backups
Critical data is stored in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These are host directories, so they can be backed up with standard file-level tools such as `tar` or `rsync`.
Critical data is stored on `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes are backed up regularly.
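As an illustration of a file-level backup of these directories, the sketch below archives one data directory into a timestamped tarball. The function name `backup_dir` and the destination path in the usage note are hypothetical; the document does not say how backups are actually taken.

```bash
#!/usr/bin/env bash
# Sketch: archive one data directory into a timestamped .tar.gz file.
# backup_dir and the destination directory are hypothetical names,
# not part of the documented stack.
set -euo pipefail

backup_dir() {
  local src="$1" dest="$2"
  local name
  # Derive the service name from the parent directory,
  # e.g. "prometheus" from /DockerVol/prometheus/data
  name="$(basename "$(dirname "$src")")"
  mkdir -p "$dest"
  # -C changes into the source directory so the archive holds relative paths
  tar -czf "$dest/${name}-$(date +%Y%m%d-%H%M%S).tar.gz" -C "$src" .
}
```

For example, `for svc in prometheus grafana alertmanager; do backup_dir "/DockerVol/$svc/data" /backups/monitoring; done` would produce one archive per service under the assumed `/backups/monitoring` directory. Copying a database directory while the service is writing to it may capture an inconsistent state, so it may be safer to scale the service down first.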
### Restore
Restore the stack by running: `./deploy.sh`
Restore the stack by copying the backed-up data into the `/DockerVol` data directories and then running `./deploy.sh`.
---
## Common Failures
| Failure Mode | Symptom | Cause | Fix |
|-------------|---------|------|-----|
| Prometheus | No data in Grafana | No connections between services | Check Caddy reverse proxy labels and ensure proper connections |
| Grafana | Blank dashboard | Missing configuration file | Check for missing `GF_SERVER_ROOT_URL` environment variable |
| Failure | Symptom | Cause | Fix |
|--------|---------|------|-----|
| Prometheus not responding | No metrics displayed on Grafana | Prometheus not configured correctly | Check Prometheus configuration and restart service |
| Alertmanager not sending alerts | No alerts received for long periods | Alertmanager not configured correctly | Check Alertmanager configuration and restart service |
---
@@ -103,7 +108,13 @@ Restore the stack by running: `./deploy.sh`
| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | 04863ab6 | Initial documentation creation |
| 2026-04-07 | 0af60dbe | Updated monitoring services to use latest images and fixed a minor bug |
| 2026-04-07 | af94e455 | Initial documentation |
| 2026-04-07 | 04863ab6 | Updated Prometheus configuration |
| 2026-04-07 | 0af60dbe | Fixed Uptime Kuma connection |
This is the initial documentation for the monitoring stack in NetGrimoire, created on April 8, 2026, with two commits: one creating the initial documentation and another updating the services to use the latest images.
---
## Notes
- Generated by Gremlin on 2026-04-08T01:37:42.636Z
- Source: swarm/monitoring.yaml
- Review User Guide and Changelog sections