New Grimoire
parent
77d589a13d
commit
cc574f8aed
157 changed files with 29420 additions and 0 deletions
143
Watch-Grimoire/Monitoring/Monitoring-Config.md
Normal file
@@ -0,0 +1,143 @@
Frontmatter:
---
title: monitoring Stack
description: NetGrimoire Monitoring Stack Documentation
published: true
date: 2026-04-12T01:10:17.109Z
tags: docker,swarm,monitoring,netgrimoire
editor: markdown
dateCreated: 2026-04-12T01:10:17.109Z
---

# monitoring

## Overview
This stack provides monitoring for NetGrimoire. It consists of five services: Prometheus collects and stores metrics, Grafana displays dashboards, Alertmanager routes alerts, Blackbox Exporter performs HTTP/TCP/ICMP probing, and cAdvisor provides host metrics.

---

## Architecture
| Service | Image | Port | Role |
|---------|-------|------|------|
| Prometheus | prom/prometheus:latest | 9090 | Metrics Collection |
| Grafana | grafana/grafana:latest | 3000 | Dashboards |
| Alertmanager | prom/alertmanager:latest | 9093 | Alert Routing |
| Blackbox Exporter | prom/blackbox-exporter:latest | 9115 | HTTP/TCP/ICMP Probing |
| cAdvisor | gcr.io/cadvisor/cadvisor:latest | Global | Multi-arch Host Metrics |

Exposed via: `caddy.netgrimoire.com`, Internal only

Homepage group: Monitoring

---

## Build & Configuration

### Prerequisites
Ensure Docker is installed and swarm mode is initialized on the manager node (`znas`).

### Volume Setup
```bash
# Create the host directories used by each service
mkdir -p /DockerVol/prometheus/data
mkdir -p /DockerVol/grafana/data
mkdir -p /DockerVol/alertmanager/data
mkdir -p /DockerVol/blackbox/config
# Set ownership of each directory to UID/GID 1964
chown -R 1964:1964 /DockerVol/prometheus/data
chown -R 1964:1964 /DockerVol/grafana/data
chown -R 1964:1964 /DockerVol/alertmanager/data
chown -R 1964:1964 /DockerVol/blackbox/config
```

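The same directory setup can be written as a loop, which keeps the directory list in one place. This is a minimal sketch rather than the project's own script: the function name `make_volume_dirs` and its optional base-directory and owner arguments are assumptions added for illustration, while the sub-paths and the `1964:1964` owner come from the commands above.

```bash
# Hypothetical helper: create the monitoring directories under a base path.
# Usage: make_volume_dirs [BASE_DIR] [OWNER]
make_volume_dirs() {
    local base="${1:-/DockerVol}"
    local owner="${2:-}"
    local dir
    for dir in prometheus/data grafana/data alertmanager/data blackbox/config; do
        mkdir -p "$base/$dir" || return 1
        # Change ownership only when an owner is given (normally requires root)
        if [ -n "$owner" ]; then
            chown -R "$owner" "$base/$dir" || return 1
        fi
    done
}

# Example (as root): make_volume_dirs /DockerVol 1964:1964
```

Passing a different base directory makes it possible to try the function against a scratch path before touching `/DockerVol`.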
### Environment Variables
```bash
# generate: openssl rand -hex 32
GF_SECURITY_ADMIN_PASSWORD=REPLACE_WITH_GENERATED_SECRET
GF_SECURITY_ADMIN_USER=admin
GF_USERS_DEFAULT_THEME=dark
GF_SERVER_ROOT_URL=https://grafana.netgrimoire.com
GF_FEATURE_TOGGLES_ENABLE=publicDashboards
```

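The `# generate:` comment suggests the admin password is meant to be a random value. A minimal sketch of that step follows; the function name `gen_secret` and the `/dev/urandom` fallback are assumptions for illustration, and only the `openssl rand -hex 32` call comes from the comment above.

```bash
# Hypothetical helper: print a 64-character hexadecimal secret.
gen_secret() {
    if command -v openssl >/dev/null 2>&1; then
        # Same command as in the comment above: 32 random bytes, hex-encoded
        openssl rand -hex 32
    else
        # Fallback (assumption): hex-encode 32 bytes read from /dev/urandom
        head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
        echo
    fi
}

# Example: append the variable to .env, creating the file owner-readable only
# ( umask 077; printf 'GF_SECURITY_ADMIN_PASSWORD=%s\n' "$(gen_secret)" >> .env )
```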
### Deploy
```bash
cd services/swarm/stack/monitoring
# Export every variable defined in .env to the current shell
set -a && source .env && set +a
# Render the stack file with the variables substituted
docker stack config --compose-file monitoring-stack.yml > resolved.yml
docker stack deploy --compose-file resolved.yml monitoring
# The resolved file contains the substituted values, so remove it
rm resolved.yml
# List the stack's services and their replica counts
docker stack services monitoring
```

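Because the deploy steps depend on `.env` having been exported into the shell, a quick check before rendering the stack file can catch a missing variable early. The sketch below is an illustration only: the function name `require_vars` is hypothetical, and it relies on `printenv`, which sees only exported variables (which is what `set -a` produces).

```bash
# Hypothetical pre-deploy check: report every named variable that is not
# exported or is empty. Returns 0 when all are set, 1 otherwise.
require_vars() {
    local name
    local missing=0
    for name in "$@"; do
        # printenv prints the value of an exported variable, or nothing
        if [ -z "$(printenv "$name")" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, after `set -a && source .env && set +a`:
# require_vars GF_SECURITY_ADMIN_USER GF_SECURITY_ADMIN_PASSWORD GF_SERVER_ROOT_URL || exit 1
```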
### First Run
After deployment each service starts inside its own container, so these processes do not need to be launched by hand on the host. The equivalent invocations are listed for reference:
```bash
# Reference invocations for Prometheus, Grafana, and Alertmanager
prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle
# Grafana reads its settings from the GF_* environment variables above;
# `--no-auth` and `--http-address` are not grafana-server options
grafana-server
alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/alertmanager
```

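A first-run check can be as simple as asking each service for its health endpoint. The sketch below assumes the ports from the Architecture table are reachable from wherever it runs, which may not hold for an internal-only stack; the function names and the default host `localhost` are hypothetical. The paths `/-/healthy` (Prometheus, Alertmanager) and `/api/health` (Grafana) are the services' standard health endpoints.

```bash
# Hypothetical helper: print the health-check URL for a service in this stack.
# Ports come from the Architecture table; the host argument is an assumption.
health_url() {
    local service="$1"
    local host="${2:-localhost}"
    case "$service" in
        prometheus)   echo "http://$host:9090/-/healthy" ;;
        grafana)      echo "http://$host:3000/api/health" ;;
        alertmanager) echo "http://$host:9093/-/healthy" ;;
        *)            echo "unknown service: $service" >&2; return 1 ;;
    esac
}

# Hypothetical helper: return 0 if the service answers its health endpoint.
check_service() {
    local url
    url="$(health_url "$@")" || return 1
    curl --fail --silent --show-error --max-time 5 "$url" >/dev/null
}

# Example:
# for s in prometheus grafana alertmanager; do
#     check_service "$s" && echo "$s ok" || echo "$s not reachable"
# done
```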
---

## User Guide

### Accessing monitoring
| Service | URL | Purpose |
|---------|-----|---------|
| Prometheus | http://prometheus.netgrimoire.com:9090 | Metrics Collection |
| Grafana | https://grafana.netgrimoire.com | Dashboards |
| Alertmanager | https://alertmanager.netgrimoire.com:9093 | Alert Routing |

### Primary Use Cases
Use Prometheus to collect metrics from NetGrimoire services, Grafana to visualize them, and Alertmanager to route the resulting alerts.

### NetGrimoire Integrations
Integrate this monitoring stack with other NetGrimoire components through environment variables such as `GF_SERVER_ROOT_URL`.

---

## Operations

### Monitoring
```bash
docker stack services monitoring
# Monitor Prometheus for errors and performance issues
```

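The replica column of `docker stack services` is the quickest signal that a service is not fully up. The following sketch filters that output for services whose running count differs from the desired count; the function name `degraded_services` is hypothetical, and it assumes input lines of the form `NAME RUNNING/DESIRED`, as produced by the `--format` string in the example comment.

```bash
# Hypothetical helper: read "NAME RUNNING/DESIRED" lines on stdin and print
# every service whose running count differs from its desired count.
# Returns 1 if any such service is found, 0 otherwise.
degraded_services() {
    local name replicas rest running desired
    local status=0
    while read -r name replicas rest; do
        [ -n "$name" ] || continue
        running="${replicas%%/*}"   # text before the slash
        desired="${replicas##*/}"   # text after the slash
        if [ "$running" != "$desired" ]; then
            echo "$name $replicas"
            status=1
        fi
    done
    return "$status"
}

# Example:
# docker stack services monitoring --format '{{.Name}} {{.Replicas}}' | degraded_services
```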
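### Backups
Critical: back up the Prometheus, Grafana, and Alertmanager data directories and the Blackbox Exporter configuration, all under `/DockerVol`. cAdvisor keeps its metrics in memory and has no directory to back up. Reconstructable: the services themselves can be redeployed from the stack file, as described under Restore.
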
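One way to capture those directories is a single timestamped tarball. This is a sketch under assumptions, not the project's backup procedure: the function name `backup_volumes`, the archive naming, and the example destination path are hypothetical; only the directory names under the base path come from Volume Setup.

```bash
# Hypothetical helper: archive the monitoring directories into one tarball.
# Usage: backup_volumes [BASE_DIR] [DEST_DIR]; prints the archive path.
backup_volumes() {
    local base="${1:-/DockerVol}"
    local dest="${2:-.}"
    local archive
    archive="$dest/monitoring-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C makes the paths inside the archive relative to BASE_DIR
    tar -czf "$archive" -C "$base" prometheus grafana alertmanager blackbox || return 1
    echo "$archive"
}

# Example (destination path is illustrative): backup_volumes /DockerVol /mnt/backups
```

Archiving a live Prometheus data directory can capture an inconsistent state, so stopping the service or taking a snapshot first is likely the safer choice.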
### Restore
```bash
cd services/swarm/stack/monitoring
./deploy.sh
```

---

## Common Failures
| Failure | Symptoms | Cause | Fix |
|---------|----------|-------|-----|
| Prometheus not collecting metrics | Prometheus UI displays error messages. | Insufficient disk space or permissions to read metrics files. | Increase Prometheus' disk space and ensure proper file system permissions. |
| Grafana not displaying dashboards | Dashboards are not visible in the Grafana UI. | No connections made between Grafana instances. | Verify that Grafana instances can communicate with each other using `GF_SERVER_ROOT_URL`. |

---

## Changelog

| Date | Commit | Summary |
|------|--------|---------|
| 2026-04-11 | ce875510 | Initial documentation for the monitoring stack in NetGrimoire. |
| 2026-04-11 | 3456a528 | Updated Prometheus configuration to use `--web.enable-lifecycle`. |
| 2026-04-09 | 8ca119ab | Added support for Cadvisor services. |
| 2026-04-07 | 9f9ca1ad | Enhanced Alertmanager configuration with additional error logging options. |
| 2026-04-07 | 71e3177f | Updated Grafana to version 10.0.1 for improved performance and stability. |


---

## Notes
- Generated by Gremlin on 2026-04-12T01:10:17.109Z
- Source: swarm/monitoring.yaml
- Review User Guide and Changelog sections