# Backend Monitoring Setup for Mobile Apps (Grafana/Prometheus)

When users say "the app doesn't work" and Crashlytics is silent — the problem is on the backend. Prometheus + Grafana give real-time visibility into API servers, databases, and queues, and surface degradation before it becomes an incident.
## What We Collect and Why

For a mobile app backend, four groups of metrics are critical:
- **API metrics** — latency, error rate, throughput per endpoint. p95 and p99 latency matter most: averages hide tail delays, and tails are exactly what destroy mobile UX. A user whose p99 is 8 seconds will leave even if the average is 200 ms.
- **Database metrics** — active connections, query duration, lock waits, replication lag. pg_stat_statements exposed via postgres_exporter gives a slice of the slowest queries.
- **Infrastructure metrics** — CPU, RAM, disk I/O, network saturation on each node.
- **Queue metrics** — for apps with background processing: RabbitMQ/Kafka queue depth, consumer group lag, message processing time.
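The gap between average and tail latency can be seen with a quick stdlib sketch (the latency numbers are synthetic, chosen to illustrate the point):

```python
import statistics

# Synthetic latencies in ms: 95% of requests are fast, 5% hit a slow path.
latencies = [50] * 95 + [4000] * 5

mean = statistics.mean(latencies)
# statistics.quantiles with n=100 returns the 1st..99th percentiles;
# index 94 is the 95th percentile.
p95 = statistics.quantiles(latencies, n=100)[94]

print(f"mean = {mean} ms, p95 = {p95} ms")
```

The average looks healthy while p95 exposes the slow path, which is why dashboards should plot percentiles rather than means.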
## Instrumenting the API Server

Prometheus expects metrics in its own exposition format; ready-made client libraries exist for every major language:
```python
# Python (FastAPI)
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
Instrumentator().instrument(app).expose(app)
# The /metrics endpoint appears automatically
```
```go
// Go (Echo)
import (
	"strconv"

	"github.com/labstack/echo/v4"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func setupMetrics(e *echo.Echo) {
	httpRequestsTotal := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "http_requests_total"},
		[]string{"method", "path", "status"},
	)
	prometheus.MustRegister(httpRequestsTotal)

	e.Use(func(next echo.HandlerFunc) echo.HandlerFunc {
		return func(c echo.Context) error {
			err := next(c)
			httpRequestsTotal.WithLabelValues(
				c.Request().Method,
				c.Path(), // route template, not the raw URL
				strconv.Itoa(c.Response().Status),
			).Inc()
			return err
		}
	})

	e.GET("/metrics", echo.WrapHandler(promhttp.Handler()))
}
```
**Important:** don't create a metric with a high-cardinality path label — if the path contains user_id or other dynamic values, the number of label combinations explodes and Prometheus will choke. Normalize the path: `/users/12345/profile` → `/users/:id/profile`.
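A minimal normalizer sketch (the function name and regexes here are illustrative, not a library API; note that frameworks like Echo's `c.Path()` already return the route template):

```python
import re

# Collapse numeric IDs and UUIDs into a placeholder so the
# "path" label stays low-cardinality.
NUM_ID = re.compile(r"/\d+(?=/|$)")
UUID = re.compile(r"/[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}(?=/|$)")

def normalize_path(path: str) -> str:
    path = UUID.sub("/:id", path)
    return NUM_ID.sub("/:id", path)

print(normalize_path("/users/12345/profile"))  # → /users/:id/profile
```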
## Prometheus Configuration

A basic prometheus.yml for a mobile backend:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'api-server'
    static_configs:
      - targets: ['api:8080']
    metrics_path: /metrics

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
```
For production, use service discovery via Consul or Kubernetes instead of static_configs.
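A sketch of what Kubernetes service discovery could look like (the `prometheus.io/*` annotations follow a common convention, not a fixed standard; adapt to your cluster):

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Allow pods to override the metrics path via annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

New pods matching the annotation are picked up automatically — no config edits for every deployment.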
## Grafana Dashboards

No need to build dashboards from scratch — grafana.com/dashboards hosts ready-made ones: ID 1860 for Node Exporter, ID 9628 for PostgreSQL via postgres_exporter. Import by ID with one click.
For API monitoring, build a custom dashboard with key panels:
- `rate(http_requests_total[5m])` — RPS per endpoint
- `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` — p95 latency
- `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))` — error rate (the `sum()` is required: dividing the raw vectors matches series on all labels, including `status`, and gives a meaningless result)
## Alerting via Alertmanager

Use Grafana Alerting or Alertmanager to route threshold alerts to PagerDuty/Telegram/Slack. A minimal alert set for a mobile backend:
```yaml
# alerting/rules.yml
groups:
  - name: api
    rules:
      - alert: HighErrorRate
        # sum by (job) keeps the job label for the annotation below
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate > 5% on {{ $labels.job }}"

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        annotations:
          summary: "p95 latency > 1s"
```
`for: 2m` — don't raise an alert on brief spikes, only on sustained degradation.
## Scope of Work
- Docker Compose or Kubernetes manifests for Prometheus, Grafana, Alertmanager
- API server instrumentation (Python / Go / Node.js / Java)
- Connecting exporters: postgres_exporter, redis_exporter, node_exporter
- Custom Grafana dashboards tailored to app specifics
- Alert setup with routing to Telegram / Slack / PagerDuty
- Documentation on metrics and alert thresholds
## Timeframe
Basic setup with ready dashboards and alerts: 2–3 days. Full stack with custom metrics, code instrumentation, and production-ready configuration: 4–6 days. Cost calculated individually.