# Setting Up Server Monitoring with Grafana and Prometheus

Prometheus collects metrics and Grafana visualizes them. Prometheus uses a pull model: on a schedule, it scrapes each exporter's HTTP endpoint and stores the samples. When alert rules fire, Alertmanager routes notifications to Slack, PagerDuty, or email.
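What a pull looks like on the wire: every exporter serves its current values as plain text at `/metrics`, and Prometheus re-reads that page on each scrape. An illustrative excerpt of node_exporter output (the values are made up):

```text
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 183729.42
node_cpu_seconds_total{cpu="0",mode="user"} 2291.07
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
```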
## Component Stack

```text
[Servers]  → [node_exporter]      ─┐
[PHP-FPM]  → [php-fpm_exporter]   ─┤
[Nginx]    → [nginx-vts-exporter] ─┼←── [Prometheus] ──→ [Alertmanager] ──→ [Slack/PagerDuty]
[Redis]    → [redis_exporter]     ─┤        │
[Postgres] → [postgres_exporter]  ─┘        ▼
                                        [Grafana]
```
## Docker Compose: Complete Stack

```yaml
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.50.1
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./monitoring/alerts:/etc/prometheus/alerts
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.retention.size=20GB'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"

  alertmanager:
    image: prom/alertmanager:v0.27.0
    volumes:
      - ./monitoring/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"

  grafana:
    image: grafana/grafana:10.3.0
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
      GF_SERVER_ROOT_URL: https://grafana.example.com
      GF_SMTP_ENABLED: "true"
      GF_SMTP_HOST: smtp.example.com:587
    volumes:
      - grafana_data:/var/lib/grafana
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./monitoring/grafana/datasources:/etc/grafana/provisioning/datasources
    ports:
      - "3000:3000"

  node-exporter:
    image: prom/node-exporter:v1.7.0
    command:
      - '--path.rootfs=/host'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    volumes:
      - /:/host:ro,rslave
    pid: host
    network_mode: host

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"

volumes:
  prometheus_data:
  grafana_data:
```
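Bringing the stack up, and hot-reloading Prometheus after editing rules or scrape configs (possible because `--web.enable-lifecycle` is passed above):

```shell
# start (or update) the monitoring stack
docker compose -f docker-compose.monitoring.yml up -d

# after changing prometheus.yml or alert rules, reload without a restart
curl -X POST http://localhost:9090/-/reload
```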
## prometheus.yml

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: production
    region: eu-west-1

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/alerts/*.yml

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - web01:9100
          - web02:9100
          - db01:9100
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

  - job_name: php-fpm
    static_configs:
      - targets: ['web01:9253', 'web02:9253']

  - job_name: nginx
    static_configs:
      - targets: ['web01:9913', 'web02:9913']

  - job_name: redis
    static_configs:
      - targets: ['redis:9121']

  - job_name: postgres
    static_configs:
      - targets: ['db01:9187']

  - job_name: myapp
    metrics_path: /metrics
    # Prometheus does not expand environment variables in this file,
    # so a ${METRICS_TOKEN} placeholder would be sent literally.
    # Read the token from a mounted file instead (example path):
    authorization:
      credentials_file: /etc/prometheus/metrics_token
    static_configs:
      - targets: ['web01:8080', 'web02:8080']
```
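After wiring up targets, check scrape health before trusting any dashboards. Prometheus records a synthetic `up` series per target; a couple of queries to run in the expression browser:

```promql
# 1 = last scrape succeeded, 0 = failed
up{job="node"}

# targets that were down for the entire last 5 minutes
max_over_time(up{job="node"}[5m]) == 0
```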
## Alert Rules

```yaml
# monitoring/alerts/servers.yml
groups:
  - name: server.alerts
    rules:
      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load on {{ $labels.instance }}"
          description: "CPU: {{ $value | printf \"%.1f\" }}%"

      - alert: LowMemory
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Critically low memory on {{ $labels.instance }}"
          description: "Available: {{ $value | printf \"%.1f\" }}%"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"} / node_filesystem_size_bytes) * 100 < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space {{ $labels.instance }}:{{ $labels.mountpoint }}"

      - alert: HighPhpFpmQueue
        expr: phpfpm_listen_queue > 10
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "PHP-FPM queue backing up: {{ $value }} requests waiting"

      - alert: PostgresDown
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL unreachable on {{ $labels.instance }}"

      - alert: SlowQueries
        # max_tx_duration is a gauge (seconds), so compare it directly —
        # wrapping it in rate() would measure how fast the gauge changes,
        # not how long the transaction has been running
        expr: pg_stat_activity_max_tx_duration{state="active"} > 30
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Slow PostgreSQL queries (>30 sec)"
```
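Alert expressions can be unit-tested offline with `promtool test rules`, feeding synthetic series and asserting which alerts fire. A minimal sketch for the `PostgresDown` rule above (the file name and layout are one possible convention):

```yaml
# monitoring/alerts/servers_test.yml — run: promtool test rules servers_test.yml
rule_files:
  - servers.yml
evaluation_interval: 15s
tests:
  - interval: 1m
    input_series:
      - series: 'pg_up{instance="db01:9187"}'
        values: '0 0 0'
    alert_rule_test:
      - eval_time: 2m
        alertname: PostgresDown
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: db01:9187
            exp_annotations:
              summary: "PostgreSQL unreachable on db01:9187"
```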
## Alertmanager: Notification Routing

```yaml
# monitoring/alertmanager.yml
global:
  # Alertmanager does not expand environment variables either —
  # point at files containing the secrets instead
  slack_api_url_file: /etc/alertmanager/slack_webhook_url

route:
  receiver: slack-notifications
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # 'matchers' replaces the deprecated 'match' syntax
    - matchers:
        - severity = critical
      receiver: pagerduty-critical
      continue: true
    - matchers:
        - severity = critical
      receiver: slack-critical

receivers:
  - name: slack-notifications
    slack_configs:
      - channel: '#monitoring'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'

  - name: slack-critical
    slack_configs:
      - channel: '#incidents'
        send_resolved: true

  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key_file: /etc/alertmanager/pagerduty_key
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
```
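amtool (shipped with Alertmanager) can dry-run the routing tree, printing which receivers a given label set would reach — useful before deploying routing changes:

```shell
# which receivers match a critical alert?
amtool config routes test --config.file=monitoring/alertmanager.yml severity=critical
```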
## Laravel: Custom Application Metrics

The `Prometheus\*` classes below come from the promphp/prometheus_client_php package:

```php
<?php

namespace App\Http\Controllers;

use App\Models\User;
use Illuminate\Http\Response;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Queue;
use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;

class MetricsController extends Controller
{
    public function __invoke(CollectorRegistry $registry): Response
    {
        // Application-level gauges, refreshed on every scrape
        $registry->getOrRegisterGauge('myapp', 'queue_size', 'Queue jobs count', ['queue'])
            ->set(Queue::size('emails'), ['emails']);

        $registry->getOrRegisterGauge('myapp', 'active_users', 'Active users in last 5 min')
            ->set(User::where('last_seen_at', '>', now()->subMinutes(5))->count());

        $registry->getOrRegisterGauge('myapp', 'failed_jobs', 'Failed jobs total')
            ->set(DB::table('failed_jobs')->count());

        // Render in the Prometheus text exposition format
        $renderer = new RenderTextFormat();

        return response($renderer->render($registry->getMetricFamilySamples()), 200)
            ->header('Content-Type', RenderTextFormat::MIME_TYPE);
    }
}
```
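The controller still needs a route, and the endpoint should not be publicly scrapeable. A hypothetical wiring (the `auth.metrics` middleware name is made up — any middleware that compares the `Authorization` header against a stored token will do):

```php
// routes/web.php
use App\Http\Controllers\MetricsController;
use Illuminate\Support\Facades\Route;

Route::get('/metrics', MetricsController::class)
    ->middleware('auth.metrics'); // hypothetical bearer-token check
```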
## Grafana: Importing Dashboards

Pre-built dashboards from Grafana.com:

- Node Exporter Full (ID: 1860) — server metrics
- PHP-FPM (ID: 4912) — queue, workers
- PostgreSQL (ID: 9628) — queries, indexes, transactions
- Redis (ID: 11835) — memory, commands, eviction

```yaml
# Automatic import via provisioning
# monitoring/grafana/dashboards/dashboards.yml
apiVersion: 1
providers:
  - name: default
    folder: ''
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards
```
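The compose file also mounts a `datasources` provisioning directory; a matching file pointing Grafana at Prometheus looks like this:

```yaml
# monitoring/grafana/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```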
## Implementation Timeline

| Task | Time |
|---|---|
| Prometheus + Node Exporter + Grafana | 1–2 days |
| Alertmanager + Slack/PagerDuty | +1 day |
| PHP-FPM, Nginx, Redis, PostgreSQL exporters | +1–2 days |
| Custom application metrics | +1–2 days |
| Complete production stack with dashboards | 4–6 days |







