docs(gremlin): create monitoring

traveler 2026-04-07 17:13:48 -05:00
parent e6a42190d2
commit fc68d883d6

# monitoring
Overview
---------------
The monitoring stack handles metrics collection, dashboards, alert routing, and container and host metrics for NetGrimoire. It consists of five services: Prometheus (metrics collection), Grafana (dashboards), Alertmanager (alert routing), cAdvisor (container metrics), and Node Exporter (host metrics).

Architecture
-------------
| Service | Image | Exposed via | Role |
|---------|-------|-------------|------|
| Prometheus | `prom/prometheus:latest` | `prometheus.netgrimoire.com` | Metrics collection |
| Grafana | `grafana/grafana:latest` | `grafana.netgrimoire.com` | Dashboards |
| Alertmanager | `prom/alertmanager:latest` | `alertmanager.netgrimoire.com` | Alert routing |
| cAdvisor | `gcr.io/cadvisor/cadvisor:latest` | `cadvisor.netgrimoire.com` | Container metrics |
| Node Exporter | `prom/node-exporter:latest` | `node-exporter.netgrimoire.com` | Host metrics |

All five services are listed in the Homepage **Monitoring** group.

Build & Configuration
---------------------
### Prerequisites
- Docker Engine with Swarm mode enabled on docker4; run the deploy from a Swarm manager node, since `docker stack` commands are manager-only.
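
A short pre-flight check can confirm that the current node is an active Swarm manager before anything is deployed. This is a minimal sketch: `require_swarm_manager` is a hypothetical helper, and it reads only the `Swarm.LocalNodeState` and `Swarm.ControlAvailable` fields that `docker info --format` exposes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper: succeeds only on an active Swarm manager.
require_swarm_manager() {
  local state control
  # "active" once this node has initialized or joined a swarm
  state=$(docker info --format '{{.Swarm.LocalNodeState}}') || return 1
  # "true" only on manager nodes, the ones allowed to run `docker stack deploy`
  control=$(docker info --format '{{.Swarm.ControlAvailable}}') || return 1
  if [ "$state" != "active" ] || [ "$control" != "true" ]; then
    echo "not a Swarm manager (state=$state, manager=$control)" >&2
    return 1
  fi
}
```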
### Volume Setup
```bash
mkdir -p /DockerVol/prometheus/data
mkdir -p /DockerVol/grafana/data
```
### Environment Variables
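Define these in `.env` inside `services/swarm/stack/monitoring`; the Deploy step sources that file.

```bash
# generate: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=replace-with-generated-value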
```
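
Because the Deploy step sources `.env`, the secret can be generated in place instead of typed by hand. A minimal sketch, assuming `openssl` is installed on the deploy host; `ensure_grafana_password` is a hypothetical helper, not something in the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a random Grafana admin password to an .env file
# unless one is already present, so re-running it never rotates the password.
ensure_grafana_password() {
  local env_file="$1" password
  if [ -f "$env_file" ] && grep -q '^GF_SECURITY_ADMIN_PASSWORD=' "$env_file"; then
    return 0
  fi
  # 32 random bytes, hex-encoded to 64 characters (same command as the hint above)
  password=$(openssl rand -hex 32) || return 1
  # umask 077 inside the subshell makes a newly created file owner-readable only
  ( umask 077; printf 'GF_SECURITY_ADMIN_PASSWORD=%s\n' "$password" >> "$env_file" )
}
```

For example, `ensure_grafana_password services/swarm/stack/monitoring/.env`. Keeping an existing value is deliberate: Grafana applies the admin password only when it first creates the admin user, so changing it later in `.env` has no effect on an initialized instance.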
### Deploy
```bash
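cd services/swarm/stack/monitoring
# Export every variable in .env so the compose file can interpolate it
set -a && source .env && set +a
# Render the compose file with variables substituted, then deploy the rendered copy
docker stack config --compose-file monitoring-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml monitoring
# resolved.yml contains the substituted secrets; don't leave it on disk
rm resolved.yml
# Confirm each service reaches its desired replica count
docker stack services monitoring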
```
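
If a command fails partway through, the steps above leave `resolved.yml`, with its interpolated secrets, on disk. One option is to wrap the same commands in a subshell with an `EXIT` trap so the rendered file is always removed. `deploy_monitoring` is a hypothetical name, and its default directory assumes you start from the repository root on a Swarm manager.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Deploy steps; cd and exported vars stay in the subshell.
deploy_monitoring() {
  local stack_dir="${1:-services/swarm/stack/monitoring}"
  (
    cd "$stack_dir" || exit 1
    # Export everything in .env so `docker stack config` can interpolate it
    set -a
    source ./.env || exit 1
    set +a
    # resolved.yml holds substituted secrets: delete it however this subshell exits
    trap 'rm -f resolved.yml' EXIT
    docker stack config --compose-file monitoring-stack.yml > resolved.yml || exit 1
    docker stack deploy --compose-file resolved.yml monitoring || exit 1
    docker stack services monitoring
  )
}
```

The explicit `|| exit 1` checks are used instead of `set -e` because Bash ignores `set -e` inside a function called from an `if`, `&&`, or `||` context.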
### First Run
- After deployment, configure the network, the Caddy reverse-proxy routes, and Uptime Kuma monitors for these services.
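
Before that configuration, a small check can flag any service whose running replicas lag the desired count. This sketch relies on the `Name` and `Replicas` placeholders of `docker stack services --format`; `check_replicas` is hypothetical, and it treats an empty listing as a failure because that usually means a wrong stack name or a failed deploy.

```bash
#!/usr/bin/env bash
# Hypothetical check: report services whose running replicas differ from the desired count.
check_replicas() {
  local stack="${1:-monitoring}" name replicas seen=0 bad=0
  # Lines look like "monitoring_grafana 1/1"; an optional suffix such as
  # "(max 1 per node)" is swallowed by the throwaway `_` field.
  while read -r name replicas _; do
    seen=$((seen + 1))
    if [ "${replicas%%/*}" != "${replicas#*/}" ]; then
      echo "not converged: $name ($replicas)" >&2
      bad=1
    fi
  done < <(docker stack services "$stack" --format '{{.Name}} {{.Replicas}}')
  if [ "$seen" -eq 0 ]; then
    echo "no services found for stack $stack" >&2
    return 1
  fi
  return "$bad"
}
```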
---
## User Guide
### Accessing Monitoring
| Service | URL | Purpose |
|---------|-----|---------|
| Prometheus | https://prometheus.netgrimoire.com | Metrics collection |
| Grafana | https://grafana.netgrimoire.com | Dashboards |
| Alertmanager | https://alertmanager.netgrimoire.com | Alert routing |
| cAdvisor | https://cadvisor.netgrimoire.com | Container metrics |
| Node Exporter | https://node-exporter.netgrimoire.com | Host metrics |
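
A quick way to confirm that the proxied endpoints respond is to probe each service's built-in health path. This sketch assumes the hostnames above are served over HTTPS by Caddy with no authentication step in front; because `curl` is not told to follow redirects, a redirect to a login page would still count as success. The paths are each project's own (`/-/healthy` for Prometheus and Alertmanager, `/api/health` for Grafana, `/healthz` for cAdvisor); Node Exporter has no dedicated health path, so its `/metrics` page stands in. `smoke_test_monitoring` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test: probe each service through its public hostname.
smoke_test_monitoring() {
  local failed=0 url
  for url in \
    https://prometheus.netgrimoire.com/-/healthy \
    https://grafana.netgrimoire.com/api/health \
    https://alertmanager.netgrimoire.com/-/healthy \
    https://cadvisor.netgrimoire.com/healthz \
    https://node-exporter.netgrimoire.com/metrics; do
    # -f turns HTTP 4xx/5xx into a non-zero exit; --max-time bounds hung requests
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url" >&2
      failed=1
    fi
  done
  return "$failed"
}
```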
### Primary Use Cases
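- Monitoring host and container performance and health.
- Configuring alerts for critical issues: Prometheus evaluates the alert rules and Alertmanager routes the resulting notifications.
- Visualizing metrics in Grafana dashboards in near real time.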
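
To check that alerts actually route, one option is to post a synthetic alert straight to Alertmanager's v2 API (`POST /api/v2/alerts`, which accepts a JSON array of alerts identified by their `labels`). The alert name and `severity` label below are made-up examples, and whether a notification goes out depends on the routes in your Alertmanager configuration. `send_test_alert` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fire one synthetic alert at Alertmanager.
# With no endsAt, Alertmanager resolves it after its resolve_timeout (5m by default).
send_test_alert() {
  local am_url="${1:-https://alertmanager.netgrimoire.com}"
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data '[{"labels":{"alertname":"NetGrimoireTestAlert","severity":"info"},"annotations":{"summary":"Manual routing test"}}]' \
    "$am_url/api/v2/alerts"
}
```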
### NetGrimoire Integrations
- Connects to CrowdSec via the Caddy reverse proxy.
- Uptime Kuma monitors these services and reports failures.