docs(gremlin): update monitoring
parent fc68d883d6
commit b157b3d064
1 changed files with 62 additions and 45 deletions
|
|
@@ -1,58 +1,50 @@
|
||||||
# monitoring
|
# Monitoring Stack
|
||||||
|
|
||||||
Overview
|
## Overview
|
||||||
---------------
|
The monitoring stack in NetGrimoire is a collection of services that provide metrics collection, dashboards, alert routing, and container metrics.
|
||||||
|
|
||||||
The monitoring stack provides metrics collection, dashboards, alert routing, container metrics, and host metrics for NetGrimoire. It consists of Prometheus (metrics collection), Grafana (dashboards), Alertmanager (alert routing), cAdvisor (container metrics), and Node Exporter (host metrics).
|
---
|
||||||
|
|
||||||
Architecture
|
## Architecture
|
||||||
-------------
|
|
||||||
|
|
||||||
| Service | Image | Port | Role |
|
| Service | Image | Port | Role |
|
||||||
|---------|-------|-----|------|
|
|---------|-------|------|------|
|
||||||
- **Prometheus:** prom/prometheus:latest
|
| **Prometheus** | prom/prometheus:latest | 9090 | Metrics Collection |
|
||||||
- exposed via: `grafana.netgrimoire.com`
|
| **Grafana** | grafana/grafana:latest | 3000 | Dashboards |
|
||||||
- Homepage group: Monitoring
|
| **Alertmanager** | prom/alertmanager:latest | 9093 | Alert Routing |
|
||||||
|
| **cAdvisor** | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics |
|
||||||
|
| **Node Exporter** | prom/node-exporter:latest | Internal only | Host Metrics |
|
||||||
|
|
||||||
- **Grafana:** grafana/grafana:latest
|
Exposed via:
|
||||||
- exposed via: `grafana.netgrimoire.com`
|
- `prometheus.netgrimoire.com`
|
||||||
- Homepage group: Monitoring
|
- `grafana.netgrimoire.com`
|
||||||
|
- `alertmanager.netgrimoire.com`
|
||||||
|
|
||||||
- **Alertmanager:** prom/alertmanager:latest
|
Homepage group: Monitoring
|
||||||
- exposed via: `alertmanager.netgrimoire.com`
|
|
||||||
- Homepage group: Monitoring
|
|
||||||
|
|
||||||
- **Cadvisor:** gcr.io/cadvisor/cadvisor:latest
|
---
|
||||||
- exposed via: `cadvisor.netgrimoire.com`
|
|
||||||
- Homepage group: Monitoring
|
|
||||||
|
|
||||||
- **Node Exporter:** prom/node-exporter:latest
|
## Build & Configuration
|
||||||
- exposed via: `node-exporter.netgrimoire.com`
|
|
||||||
- Homepage group: Monitoring
|
|
||||||
|
|
||||||
Build & Configuration
|
|
||||||
---------------------
|
|
||||||
|
|
||||||
### Prerequisites
|
### Prerequisites
|
||||||
|
No specific prerequisites for this stack.
|
||||||
- Docker and Docker Swarm installed on docker4
|
|
||||||
|
|
||||||
### Volume Setup
|
### Volume Setup
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
mkdir -p /DockerVol/prometheus/data
|
mkdir -p /DockerVol/prometheus/data
|
||||||
mkdir -p /DockerVol/grafana/data
|
mkdir -p /DockerVol/grafana/data
|
||||||
|
mkdir -p /DockerVol/alertmanager/data
|
||||||
```
|
```
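The `mkdir` calls above can also be wrapped in a small helper so the base path is written only once. This is a minimal sketch: the function name `create_monitoring_dirs` and its optional base-path argument are illustrative and not part of the repository.

```bash
# Hypothetical helper: create the data directory for each stateful service.
# The base path defaults to /DockerVol, as in the commands above.
create_monitoring_dirs() {
  base="${1:-/DockerVol}"
  for svc in prometheus grafana alertmanager; do
    mkdir -p "$base/$svc/data" || return 1
  done
}
```

Calling `create_monitoring_dirs` with no argument targets `/DockerVol`. Passing another path, such as a scratch directory, lets the layout be checked without touching the real volumes.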
|
||||||
|
|
||||||
### Environment Variables
|
### Environment Variables
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# generate: openssl rand -hex 32
|
# generate: openssl rand -hex 32
|
||||||
GF_SECURITY_ADMIN_PASSWORD=F@lcon13
|
GF_SECURITY_ADMIN_PASSWORD=F@lcon13
|
||||||
|
GF_USERS_DEFAULT_THEME=dark
|
||||||
|
GF_FEATURE_TOGGLES_ENABLE=publicDashboards
|
||||||
```
|
```
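The `# generate` comment suggests the admin password is meant to come from `openssl rand -hex 32` rather than being typed in by hand. One way to do that is sketched below, assuming `openssl` is installed. The function name `write_grafana_env` and its output-path argument are hypothetical.

```bash
# Hypothetical helper: write a .env file with a freshly generated admin password.
# Assumes openssl is available; the target path defaults to ./.env.
write_grafana_env() {
  env_file="${1:-.env}"
  password="$(openssl rand -hex 32)" || return 1
  # The subshell limits the umask change, so a newly created file is owner-only.
  ( umask 077
    printf 'GF_SECURITY_ADMIN_PASSWORD=%s\n' "$password" > "$env_file"
    printf 'GF_USERS_DEFAULT_THEME=dark\n' >> "$env_file" )
}
```

Running `write_grafana_env .env` before the deploy step keeps the secret out of version control, provided `.env` itself is not committed.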
|
||||||
|
|
||||||
### Deploy
|
### Deploy
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd services/swarm/stack/monitoring
|
cd services/swarm/stack/monitoring
|
||||||
set -a && source .env && set +a
|
set -a && source .env && set +a
|
||||||
|
|
@@ -63,30 +55,55 @@ docker stack services monitoring
|
||||||
```
|
```
|
||||||
|
|
||||||
### First Run
|
### First Run
|
||||||
|
Run the following command after deployment: `./deploy.sh`
|
||||||
- Post-deploy steps specific to these services include configuring network, caddy, and uptime kuma.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## User Guide
|
## User Guide
|
||||||
|
|
||||||
### Accessing Monitoring
|
### Accessing Monitoring
|
||||||
|
|
||||||
| Service | URL | Purpose |
|
| Service | URL | Purpose |
|
||||||
|---------|-----|---------|
|
|---------|-----|---------|
| **Prometheus** | http://prometheus.netgrimoire.com | Metrics Collection |
|
||||||
- **Prometheus:** https://prometheus.netgrimoire.com
|
| **Grafana** | http://grafana.netgrimoire.com | Dashboards |
|
||||||
- **Grafana:** https://grafana.netgrimoire.com
|
|
||||||
- **Alertmanager:** https://alertmanager.netgrimoire.com
|
|
||||||
- **Cadvisor:** `cadvisor.netgrimoire.com` (Container metrics)
|
|
||||||
- **Node Exporter:** `node-exporter.netgrimoire.com` (Host metrics)
|
|
||||||
|
|
||||||
### Primary Use Cases
|
### Primary Use Cases
|
||||||
|
To access the monitoring dashboard, navigate to `http://grafana.netgrimoire.com` and log in with the admin credentials.
|
||||||
- Monitoring system performance and health.
|
|
||||||
- Configuring alerts for critical issues.
|
|
||||||
- Visualizing metrics in real-time.
|
|
||||||
|
|
||||||
### NetGrimoire Integrations
|
### NetGrimoire Integrations
|
||||||
|
This stack connects to other services via environment variables and labels. Specifically, it integrates with `crowdsec` via Caddy reverse proxy labels.
|
||||||
|
|
||||||
- Connects to Crowdsec via Caddy reverse proxy.
|
---
|
||||||
- Uptime Kuma monitors services and detects errors.
|
|
||||||
|
## Operations
|
||||||
|
|
||||||
|
### Monitoring
|
||||||
|
```bash
|
||||||
|
docker stack services monitoring
|
||||||
|
docker service logs -f monitoring_prometheus
|
||||||
|
```
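Besides the Swarm-level commands, each exposed service has an HTTP health endpoint that can be probed directly: `/-/healthy` on Prometheus and Alertmanager, `/api/health` on Grafana. The sketch below assumes the hostnames listed under Architecture and plain `http`. The function names are illustrative.

```bash
# Hypothetical helper: print the health-check URL for a service.
# Hostnames come from the "Exposed via" list; the http scheme is an assumption.
health_url() {
  case "$1" in
    prometheus)   echo "http://prometheus.netgrimoire.com/-/healthy" ;;
    alertmanager) echo "http://alertmanager.netgrimoire.com/-/healthy" ;;
    grafana)      echo "http://grafana.netgrimoire.com/api/health" ;;
    *)            return 1 ;;
  esac
}

# Probe each endpoint; curl -f turns HTTP error responses into a non-zero exit.
check_monitoring_health() {
  for svc in prometheus alertmanager grafana; do
    if curl -fsS --max-time 5 "$(health_url "$svc")" > /dev/null; then
      echo "$svc: ok"
    else
      echo "$svc: FAILED"
    fi
  done
}
```

`check_monitoring_health` prints one line per service. A `FAILED` line narrows the problem to that service or to the reverse proxy in front of it.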
|
||||||
|
|
||||||
|
### Backups
|
||||||
|
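Critical data lives in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These are host directories rather than named Docker volumes, so back them up with file-level tools such as `tar` or `rsync`; the Docker CLI has no `docker volume backup` command.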
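Because the data sits in ordinary host directories, a file-level archive is one straightforward way to back it up. The following is a minimal sketch using `tar`. The function name `backup_monitoring_data`, its arguments, and the archive naming scheme are assumptions and do not come from the repository.

```bash
# Hypothetical helper: archive the three data directories into one tarball.
# Arguments: base directory (default /DockerVol) and destination directory.
backup_monitoring_data() {
  base="${1:-/DockerVol}"
  dest="${2:-.}"
  archive="$dest/monitoring-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps paths inside the archive relative to the base directory.
  tar -czf "$archive" -C "$base" \
    prometheus/data grafana/data alertmanager/data || return 1
  # Print the archive path so callers can capture it.
  echo "$archive"
}
```

Archiving while the services are writing may capture an inconsistent state, so scaling the services down first is the more cautious option.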
|
||||||
|
|
||||||
|
### Restore
|
||||||
|
Restore the stack by running: `./deploy.sh`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Common Failures
|
||||||
|
|
||||||
|
| Failure Mode | Symptom | Cause | Fix |
|
||||||
|
|-------------|---------|------|-----|
|
||||||
|
| Prometheus | No data in Grafana | No connections between services | Check Caddy reverse proxy labels and ensure proper connections |
|
||||||
|
| Grafana | Blank dashboard | Missing configuration | Check that the `GF_SERVER_ROOT_URL` environment variable is set |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Changelog
|
||||||
|
|
||||||
|
| Date | Commit | Summary |
|
||||||
|
|------|--------|---------|
|
||||||
|
| 2026-04-07 | 04863ab6 | Initial documentation creation |
|
||||||
|
| 2026-04-07 | 0af60dbe | Updated monitoring services to use latest images and fixed a minor bug |
|
||||||
|
|
||||||
|
This is the initial documentation for the monitoring stack in NetGrimoire. The changelog records two commits dated 2026-04-07: one creating the initial documentation and one updating the services to use the latest images.
|
||||||