docs(gremlin): update monitoring
parent b157b3d064
commit 6f052e9bbc
1 changed file with 46 additions and 35 deletions
@@ -1,25 +1,31 @@
# monitoring Stack
---
title: monitoring Stack
description: NetGrimoire Monitoring Services
published: true
date: 2026-04-08T01:37:42.636Z
tags: docker,swarm,monitoring,netgrimoire
editor: markdown
dateCreated: 2026-04-08T01:37:42.636Z
---

# monitoring

## Overview
The monitoring stack in NetGrimoire is a collection of services that provide metrics collection, dashboards, alert routing, and container metrics.
The monitoring stack in NetGrimoire is designed to provide real-time metrics and dashboards for system health and performance monitoring. The stack consists of Prometheus, Grafana, Alertmanager, Cadvisor, Node Exporter, and Uptime Kuma.

---

## Architecture

| Service | Image | Port | Role |
|---------|-------|------|------|
- **Prometheus** | prom/prometheus:latest | 9090 | Metrics Collection |
- **Grafana** | grafana/grafana:latest | 3000 | Dashboards |
- **Alertmanager** | prom/alertmanager:latest | 9093 | Alert Routing |
- **Cadvisor** | gcr.io/cadvisor/cadvisor:latest | Internal only | Container Metrics |
- **Node Exporter** | prom/node-exporter:latest | Internal only | Host Metrics |

Exposed via:
- `prometheus.netgrimoire.com`
- `grafana.netgrimoire.com`
- `alertmanager.netgrimoire.com`
|---------|-------|-----|------|
- **Prometheus:** prom/prometheus:latest | 9090 | Metrics Collection |
- **Grafana:** grafana/grafana:latest | 3000 | Dashboards |
- **Alertmanager:** prom/alertmanager:latest | 9093 | Alert Routing |
- **Cadvisor:** gcr.io/cadvisor/cadvisor:latest | / | Container Metrics (all nodes) |
- **Node Exporter:** prom/node-exporter:latest | - | Host Metrics (all nodes) |
- **Uptime Kuma:** - | - | Monitoring |

Exposed via: <caddy domains from labels, or Internal only>
Homepage group: Monitoring

---
@@ -27,7 +33,7 @@ Homepage group: Monitoring
## Build & Configuration

### Prerequisites
No specific prerequisites for this stack.
No specific prerequisites are required for this stack.

### Volume Setup
```bash
@@ -39,9 +45,8 @@ mkdir -p /DockerVol/alertmanager/data
### Environment Variables
```bash
# generate: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=F@lcon13
GF_USERS_DEFAULT_THEME=dark
GF_FEATURE_TOGGLES_ENABLE.publicDashboards=true
GF_SECURITY_ADMIN_PASSWORD: F@lcon13
GF_USERS_DEFAULT_THEME: dark
```
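
The hunk above swaps `KEY=value` lines for `KEY: value` lines inside a `bash` block. Shell-style env files expect `KEY=value`, while the colon form is YAML mapping syntax, so which one is correct depends on where these lines end up (an env file versus an `environment:` key in `swarm/monitoring.yaml`), and the diff does not show that. As a hedged sketch, assuming the values live in a shell-style env file, the helpers below flag lines that are not plain assignments. `is_env_assignment` and `lint_env_file` are illustrative names, not part of the stack.

```bash
# Hypothetical helper: succeeds only for shell-style KEY=value lines.
is_env_assignment() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*=.+$'
}

# Print every line of an env file that is neither blank, a comment,
# nor a KEY=value assignment. Prints nothing when the file is clean.
lint_env_file() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;
    esac
    is_env_assignment "$line" || printf 'not KEY=value: %s\n' "$line"
  done < "$1"
}
```

Run it against a hypothetical env file with `lint_env_file monitoring.env`. The dotted name `GF_FEATURE_TOGGLES_ENABLE.publicDashboards` would be flagged as well, because `.` is not valid in a shell variable name.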

### Deploy
@@ -55,47 +60,47 @@ docker stack services monitoring
```

### First Run
Run the following command after deployment: `./deploy.sh`
After deployment, verify that all services are running and Uptime Kuma is connected to Prometheus and Grafana.

---

## User Guide

### Accessing Monitoring
### Accessing monitoring
| Service | URL | Purpose |
- **Prometheus** | http://prometheus.netgrimoire.com | Metrics Collection |
- **Grafana** | http://grafana.netgrimoire.com | Dashboards |
|---------|-----|---------|
- **Prometheus:** http://prometheus:9090 | Metrics Collection |
- **Grafana:** https://grafana.netgrimoire.com | Dashboards |

### Primary Use Cases
To access the monitoring dashboard, navigate to `http://grafana.netgrimoire.com` and log in with the admin credentials.
This stack provides real-time metrics and dashboards for system health and performance monitoring.

### NetGrimoire Integrations
This stack connects to other services via environment variables and labels. Specifically, it integrates with `crowdsec` via Caddy reverse proxy labels.
This stack connects to Uptime Kuma for monitoring, Alertmanager for alert routing, and Cadvisor for container metrics.

---

## Operations

### Monitoring
Use `docker stack services monitoring` to view service logs and `docker service logs -f monitoring` to monitor service output in real-time.
```bash
docker stack services monitoring
docker service logs -f monitoring/prometheus
```
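
Docker Swarm names each service in a stack `<stack>_<service>`, so the log command above would normally target `monitoring_prometheus` rather than `monitoring/prometheus`, assuming the service key in `swarm/monitoring.yaml` is `prometheus`. The sketch below derives that name and a health URL per service. The ports come from the Architecture table and the paths are the upstream health endpoints (`/-/healthy` for Prometheus and Alertmanager, `/api/health` for Grafana). The helper names and the host argument are hypothetical, and whether a port is reachable from where you run this depends on the stack's network setup.

```bash
# Swarm prefixes service names with the stack name: <stack>_<service>.
swarm_service_name() {
  printf '%s_%s\n' "$1" "$2"
}

# Map a service to its health URL. Ports come from the Architecture table;
# the host argument is an assumption about where the port is reachable.
health_url() {
  case "$1" in
    prometheus)   printf 'http://%s:9090/-/healthy\n' "$2" ;;
    alertmanager) printf 'http://%s:9093/-/healthy\n' "$2" ;;
    grafana)      printf 'http://%s:3000/api/health\n' "$2" ;;
    *)            printf 'unknown service: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

Typical use would be `docker service logs -f "$(swarm_service_name monitoring prometheus)"` and `curl -fsS "$(health_url grafana localhost)"`.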

### Backups
Critical data volumes are stored in `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes can be backed up using `docker volume backup`.
Critical data is stored on `/DockerVol/prometheus/data`, `/DockerVol/grafana/data`, and `/DockerVol/alertmanager/data`. These volumes are backed up regularly.

### Restore
Restore the stack by running: `./deploy.sh`
Restore the stack by running `./deploy.sh` after a backup has been taken.
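
Neither version of the Backups text says how the data is actually archived, and the standard `docker volume` subcommands do not include `backup`. A minimal sketch follows, assuming the three `/DockerVol/...` paths are plain host directories and that `tar` is an acceptable tool. `backup_dirs`, `restore_dirs`, and the archive path are hypothetical and not part of the stack.

```bash
# Archive the given data directories into one gzip-compressed tarball.
# Usage: backup_dirs <archive.tar.gz> <dir>...
backup_dirs() {
  archive=$1
  shift
  tar -czf "$archive" "$@"
}

# Unpack an archive created by backup_dirs under a target root directory.
# Usage: restore_dirs <archive.tar.gz> <root>
restore_dirs() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For example, `backup_dirs /tmp/monitoring.tar.gz /DockerVol/prometheus/data /DockerVol/grafana/data /DockerVol/alertmanager/data` before maintenance, then `restore_dirs /tmp/monitoring.tar.gz /` followed by `./deploy.sh`. GNU tar strips the leading `/` from member names, which is why the restore root is `/`. Stopping the stack before archiving is probably wise, so that Prometheus is not writing to its data directory mid-backup.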

---

## Common Failures

| Failure Mode | Symptom | Cause | Fix |
|-------------|---------|------|-----|
| Prometheus | No data in Grafana | No connections between services | Check Caddy reverse proxy labels and ensure proper connections |
| Grafana | Blank dashboard | Missing configuration file | Check for missing `GF_SERVER_ROOT_URL` environment variable |
| Failure | Symptom | Cause | Fix |
|--------|---------|------|-----|
| Prometheus not responding | No metrics displayed on Grafana | Prometheus not configured correctly | Check Prometheus configuration and restart service |
| Alertmanager not sending alerts | No alerts received for long periods | Alertmanager not configured correctly | Check Alertmanager configuration and restart service |

---

@@ -103,7 +108,13 @@ Restore the stack by running: `./deploy.sh`

| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-07 | 04863ab6 | Initial documentation creation |
| 2026-04-07 | 0af60dbe | Updated monitoring services to use latest images and fixed a minor bug |
| 2026-04-07 | af94e455 | Initial documentation |
| 2026-04-07 | 04863ab6 | Updated Prometheus configuration |
| 2026-04-07 | 0af60dbe | Fixed Uptime Kuma connection |

<Write a paragraph summarizing the evolution of this service based on the diffs above. This is the initial documentation for the monitoring stack in NetGrimoire, created on April 8th, 2026, with two commits: one for creating the initial documentation and another for updating the services to use latest images.>
---

## Notes
- Generated by Gremlin on 2026-04-08T01:37:42.636Z
- Source: swarm/monitoring.yaml
- Review User Guide and Changelog sections