docs(gremlin): update monitoring

2026-04-07 20:39:20 -05:00 · 2026-04-07 20:39:20 -05:00 · 6f052e9bbc
commit 6f052e9bbc
parent b157b3d064
1 changed file with 46 additions and 35 deletions
--- a/Netgrimoire/Services/monitoring/monitoring.md
+++ b/Netgrimoire/Services/monitoring/monitoring.md
@@ -1,25 +1,31 @@
-# monitoring Stack
+---
 title: monitoring Stack
 description: NetGrimoire Monitoring Services
 published: true
 date: 2026-04-08T01:37:42.636Z
 tags: docker,swarm,monitoring,netgrimoire
 editor: markdown
 dateCreated: 2026-04-08T01:37:42.636Z
 ---
 # monitoring
 ## Overview
-The monitoring stack in NetGrimoire is a collection of services that provide metrics collection, dashboards, alert routing, and container metrics.
+The monitoring stack in NetGrimoire provides metrics collection, dashboards, alert routing, and uptime monitoring for system health and performance. It consists of Prometheus, Grafana, Alertmanager, cAdvisor, Node Exporter, and Uptime Kuma.
 ---
 ## Architecture
 | Service | Image | Port | Role |
-|---------|-------|------|------|
+|---------|-------|-----|------|
 | **Prometheus** | prom/prometheus:latest | 9090 | Metrics Collection |
 | **Grafana** | grafana/grafana:latest | 3000 | Dashboards |
 | **Alertmanager** | prom/alertmanager:latest | 9093 | Alert Routing |
-| **Cadvisor** | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics |
+| **cAdvisor** | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics (all nodes) |
-| **Node Exporter** | prom/node-exporter:latest | Internal only | Host Metrics |
+| **Node Exporter** | prom/node-exporter:latest | Internal only | Host Metrics (all nodes) |
+| **Uptime Kuma** | - | - | Uptime Monitoring |

 Exposed via:
 - `prometheus.netgrimoire.com`
 - `grafana.netgrimoire.com`
 - `alertmanager.netgrimoire.com`
 Homepage group: Monitoring
 ---
@@ -27,7 +33,7 @@ Homepage group: Monitoring
 ## Build & Configuration
 ### Prerequisites
-No specific prerequisites for this stack.
+No specific prerequisites are required for this stack.
 ### Volume Setup
 ```bash
@@ -39,9 +45,8 @@ mkdir -p /DockerVol/alertmanager/data
 ### Environment Variables
 ```bash
 # generate: openssl rand -hex 32
-GF_SECURITY_ADMIN_PASSWORD=<redacted>
+GF_SECURITY_ADMIN_PASSWORD=<output of openssl rand -hex 32>
 GF_USERS_DEFAULT_THEME=dark
 GF_FEATURE_TOGGLES_ENABLE=publicDashboards
 ```
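A small helper can write these settings to an env file so the admin password never has to live in the documentation. This is a sketch under assumptions: the helper name `write_grafana_env` and the output file are hypothetical, and the stack's compose file is assumed to load the file through the service's `env_file:` key. Grafana maps the `[feature_toggles] enable` setting to the `GF_FEATURE_TOGGLES_ENABLE` variable, whose value is the toggle name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the Grafana settings for the monitoring stack
# to an env file. Assumption: the compose file references it via env_file:.
write_grafana_env() {
  local out="$1"
  # Use a supplied secret; otherwise generate one with openssl (only runs
  # when no second argument is given).
  local secret="${2:-$(openssl rand -hex 32)}"
  # Subshell so the restrictive umask does not leak into the caller's shell;
  # a newly created file is readable by its owner only.
  (
    umask 077
    cat >"$out" <<EOF
GF_SECURITY_ADMIN_PASSWORD=${secret}
GF_USERS_DEFAULT_THEME=dark
GF_FEATURE_TOGGLES_ENABLE=publicDashboards
EOF
  )
}

# Usage:
#   write_grafana_env ./grafana.env
```

Grafana applies `GF_SECURITY_ADMIN_PASSWORD` only when it first initializes its database, so regenerating the file later does not change an existing admin password.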
 ### Deploy
@@ -55,47 +60,47 @@ docker stack services monitoring
 ```
 ### First Run
-Run the following command after deployment: `./deploy.sh`
+After deployment, confirm that every service listed by `docker stack services monitoring` shows its desired replica count and that Uptime Kuma is connected to Prometheus and Grafana.
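The replica check can be scripted. A minimal sketch, assuming the `NAME REPLICAS` lines produced by `docker stack services monitoring --format '{{.Name}} {{.Replicas}}'`; `lagging_services` is a hypothetical helper, not part of Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print services whose running count differs from the
# desired count, and return 1 if any were found.
# Expects "NAME RUNNING/DESIRED" lines on stdin, e.g. from:
#   docker stack services monitoring --format '{{.Name}} {{.Replicas}}'
lagging_services() {
  local name replicas rest running desired status=0
  while read -r name replicas rest; do
    [ -n "$replicas" ] || continue   # skip blank or malformed lines
    running="${replicas%%/*}"        # text before the slash
    desired="${replicas#*/}"         # text after the slash
    if [ "$running" != "$desired" ]; then
      printf '%s %s\n' "$name" "$replicas"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   docker stack services monitoring --format '{{.Name}} {{.Replicas}}' \
#     | lagging_services || echo "some monitoring services are not fully up"
```

Global services such as cAdvisor and Node Exporter report one replica per node in the same `RUNNING/DESIRED` form, so the same comparison covers them.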
 ---
 ## User Guide
-### Accessing Monitoring
+### Accessing monitoring
 | Service | URL | Purpose |
-| **Prometheus** | http://prometheus.netgrimoire.com | Metrics Collection |
+|---------|-----|---------|
-| **Grafana** | http://grafana.netgrimoire.com | Dashboards |
+| **Prometheus** | http://prometheus:9090 | Metrics Collection |
 | **Grafana** | https://grafana.netgrimoire.com | Dashboards |
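To confirm the web endpoints respond, a short `curl` probe is enough. This sketch assumes the hostnames resolve from the machine running it and uses the upstream health endpoints (`/-/healthy` for Prometheus and Alertmanager, `/api/health` for Grafana); the helper `check_endpoint` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a health URL answers without an HTTP
# error (curl -f treats status >= 400 as failure). Returns 1 when DOWN.
check_endpoint() {
  local name="$1" url="$2"
  # -sS: no progress bar, but still print errors; cap the wait at 5 seconds.
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    printf 'UP   %s\n' "$name"
  else
    printf 'DOWN %s (%s)\n' "$name" "$url"
    return 1
  fi
}

# Usage:
#   check_endpoint grafana      https://grafana.netgrimoire.com/api/health
#   check_endpoint prometheus   https://prometheus.netgrimoire.com/-/healthy
#   check_endpoint alertmanager https://alertmanager.netgrimoire.com/-/healthy
```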
 ### Primary Use Cases
-To access the monitoring dashboard, navigate to `http://grafana.netgrimoire.com` and log in with the admin credentials.
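+Open `https://grafana.netgrimoire.com`, sign in with the admin credentials (password from `GF_SECURITY_ADMIN_PASSWORD`), and use the dashboards to follow host, container, and service health in real time.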
 ### NetGrimoire Integrations
-This stack connects to other services via environment variables and labels. Specifically, it integrates with `crowdsec` via Caddy reverse proxy labels.
+This stack connects Uptime Kuma for uptime monitoring, Alertmanager for alert routing, and cAdvisor for container metrics, and publishes its web endpoints through Caddy reverse-proxy labels.
 ---
 ## Operations
 ### Monitoring
 Use `docker stack services monitoring` to list the stack's services and their replica counts, and `docker service logs -f monitoring_<service>` to follow a service's output in real time.
 ```bash
 docker stack services monitoring
 docker service logs -f monitoring_prometheus
 ```
 ### Backups
-Critical data volumes are stored in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes can be backed up using `docker volume backup`.
+Critical data is stored in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`; include all three directories in regular backups.
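A minimal backup and restore sketch for these directories, assuming plain `tar` archives are acceptable and that the services are scaled to zero first (for example `docker service scale monitoring_prometheus=0`) so files are not written mid-copy. `backup_volumes`, `restore_volumes`, and the `/backup/monitoring` destination are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: archive the monitoring data directories and restore
# them. Assumption: the owning services are stopped while these run.
# Run as root so the original file ownership is kept on restore.

# backup_volumes DEST DIR... -> prints the path of the archive it wrote
backup_volumes() {
  local dest="$1"; shift
  local archive
  archive="${dest}/monitoring-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest"
  # -P keeps absolute paths so restore_volumes puts files back in place.
  tar -czPf "$archive" "$@" || return 1
  printf '%s\n' "$archive"
}

# restore_volumes ARCHIVE -> extracts files to their original absolute paths
restore_volumes() {
  tar -xzPf "$1"
}

# Usage:
#   backup_volumes /backup/monitoring /DockerVol/prometheus/data \
#     /DockerVol/grafana/data /DockerVol/alertmanager/data
#   restore_volumes /backup/monitoring/monitoring-YYYYmmdd-HHMMSS.tar.gz
```

Keeping absolute paths lets one archive restore all three directories without a separate target argument; drop `-P` if you prefer archives with relative paths.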
 ### Restore
-Restore the stack by running: `./deploy.sh`
+To restore, put the backed-up data directories back in place, then run `./deploy.sh` to redeploy the stack.
 ---
 ## Common Failures
-
+| Failure | Symptom | Cause | Fix |
-| Failure Mode | Symptom | Cause | Fix |
+|--------|---------|------|-----|
-|-------------|---------|------|-----|
+| Prometheus not responding | No metrics displayed in Grafana | Prometheus configuration error | Check Status → Targets in the Prometheus UI and `docker service logs monitoring_prometheus`, fix the configuration, then run `docker service update --force monitoring_prometheus` |
-| Prometheus | No data in Grafana | No connections between services | Check Caddy reverse proxy labels and ensure proper connections |
+| Alertmanager not sending alerts | No alerts received for long periods | Alertmanager configuration error | Validate the Alertmanager config file with `amtool check-config`, check `docker service logs monitoring_alertmanager`, then restart the service |
 | Grafana not loading | Blank dashboard | `GF_SERVER_ROOT_URL` not set to the public URL | Set `GF_SERVER_ROOT_URL=https://grafana.netgrimoire.com` and redeploy |
 ---
@@ -103,7 +108,13 @@ Restore the stack by running: `./deploy.sh`
 | Date | Commit | Summary |
 |------|--------|---------|
-| 2026-04-07 | 04863ab6 | Initial documentation creation |
+| 2026-04-07 | af94e455 | Initial documentation |
-| 2026-04-07 | 0af60dbe | Updated monitoring services to use latest images and fixed a minor bug |
+| 2026-04-07 | 04863ab6 | Updated Prometheus configuration |
 | 2026-04-07 | 0af60dbe | Fixed Uptime Kuma connection |
-<Write a paragraph summarizing the evolution of this service based on the diffs above. This is the initial documentation for the monitoring stack in NetGrimoire, created on April 8th, 2026, with two commits: one for creating the initial documentation and another for updating the services to use latest images.>
+---
 ## Notes
 - Generated by Gremlin on 2026-04-08T01:37:42.636Z
 - Source: swarm/monitoring.yaml
 - Review User Guide and Changelog sections