Custom infrastructure monitoring dashboards with Grafana

Setting up custom monitoring dashboards (Grafana)

Grafana dashboards are the visual language of infrastructure state. Default community dashboards are often overloaded with panels a given service does not need, and they rarely answer service-specific questions. Build custom dashboards from the questions your team asks, not from whatever metrics happen to be available.

Principles of an effective dashboard

Information hierarchy. The top row holds the most important signal: is the service working or not. Details go below. Nobody should have to hunt for the status.

Actionable metrics. Each panel should answer a question that affects a decision. "CPU 67%" is not actionable. "CPU 67%, target 60%, trending up, 3 instances scaling" is.
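
As a rough illustration of the difference, the sketch below turns a bare number into a statement with a target and a trend. The function name `describe_metric` and its parameters are hypothetical; in Grafana itself the same effect usually comes from thresholds and a trend graph on the panel.

```python
def describe_metric(name, value, target, previous, unit="%"):
    """Render a metric together with its target and trend."""
    if value > previous:
        trend = "trending up"
    elif value < previous:
        trend = "trending down"
    else:
        trend = "flat"
    status = "above target" if value > target else "within target"
    return f"{name} {value}{unit}, target {target}{unit} ({status}), {trend}"


# Hypothetical readings: current CPU 67%, target 60%, previous sample 61%.
print(describe_metric("CPU", 67, 60, previous=61))
```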

Time variables. $__interval and $__range (plus $__rate_interval for Prometheus rate() queries) let viewers change the time period while the graph keeps a sensible resolution.
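
To see why an interval variable preserves resolution, it helps to look at the arithmetic: the interval grows with the selected time range, so the number of points per graph stays roughly constant. The sketch below is an approximation only. Grafana derives $__interval from the time range and the panel's maximum number of data points and then rounds to convenient steps, which this code does not reproduce. The function name and the 15-second floor are assumptions.

```python
import math


def approx_interval_seconds(range_seconds, max_data_points, min_interval_seconds=15):
    """Approximate a per-point interval: time range divided by points to draw."""
    if max_data_points <= 0:
        raise ValueError("max_data_points must be positive")
    raw = math.ceil(range_seconds / max_data_points)
    # Never go below the scrape interval (assumed to be 15 s here).
    return max(raw, min_interval_seconds)


for label, seconds in [("1h", 3600), ("24h", 86400), ("7d", 604800)]:
    print(label, approx_interval_seconds(seconds, max_data_points=1000), "s")
```

Widening the view from one hour to seven days raises the interval from 15 s to about 10 minutes, so the panel still draws on the order of a thousand points.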

Dashboard structure for a web application

Row 1: Service Health (large stat panels)
  [Error Rate %] [P95 Latency ms] [Uptime %] [Active Users]

Row 2: Traffic & Performance
  [RPS - timeseries] [Response time P50/P95/P99 - timeseries] [HTTP status breakdown]

Row 3: Infrastructure
  [CPU % per host] [Memory % per host] [Disk I/O] [Network I/O]

Row 4: Database
  [DB Connections active/max] [Query latency P95] [Slow queries count]

Row 5: Cache
  [Redis hit rate %] [Redis memory usage] [Evictions per sec]
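
The layout above can be turned into a dashboard JSON skeleton with a few lines of code. The following is a minimal sketch, assuming Grafana's 24-column grid and the standard "row", "stat" and "timeseries" panel types. The helper name `build_panels` and the uniform panel height are illustrative choices, and the panels carry no queries yet.

```python
import json

GRID_WIDTH = 24  # Grafana lays panels out on a 24-column grid

# (row title, panel type, panel titles) taken from the structure above
LAYOUT = [
    ("Service Health", "stat",
     ["Error Rate %", "P95 Latency ms", "Uptime %", "Active Users"]),
    ("Traffic & Performance", "timeseries",
     ["RPS", "Response time P50/P95/P99", "HTTP status breakdown"]),
    ("Infrastructure", "timeseries",
     ["CPU % per host", "Memory % per host", "Disk I/O", "Network I/O"]),
    ("Database", "timeseries",
     ["DB Connections active/max", "Query latency P95", "Slow queries count"]),
    ("Cache", "timeseries",
     ["Redis hit rate %", "Redis memory usage", "Evictions per sec"]),
]


def build_panels(layout, panel_height=8):
    """Return a flat panel list: one row header, then equal-width panels."""
    panels, y, next_id = [], 0, 1
    for row_title, panel_type, titles in layout:
        panels.append({"id": next_id, "type": "row", "title": row_title,
                       "collapsed": False,
                       "gridPos": {"x": 0, "y": y, "w": GRID_WIDTH, "h": 1}})
        next_id += 1
        y += 1
        width = GRID_WIDTH // len(titles)
        for i, title in enumerate(titles):
            panels.append({"id": next_id, "type": panel_type, "title": title,
                           "gridPos": {"x": i * width, "y": y,
                                       "w": width, "h": panel_height}})
            next_id += 1
        y += panel_height
    return panels


dashboard = {"title": "Application Overview", "panels": build_panels(LAYOUT)}
print(json.dumps(dashboard["panels"][:2], indent=2))
```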

Prometheus queries for key panels

Error Rate:

sum(rate(http_requests_total{status=~"5..", job="app"}[5m]))
/
sum(rate(http_requests_total{job="app"}[5m]))
* 100

P95 Latency:

histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket{job="app"}[5m])) by (le)
)
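
The P95 value is an estimate interpolated from histogram buckets, not an exact percentile, so its accuracy depends on where the bucket boundaries sit. The sketch below shows the idea with linear interpolation inside the bucket that contains the target rank. It is a simplified illustration rather than Prometheus's actual implementation, and the bucket counts are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets [(upper_bound, count), ...]."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Falls in the +Inf bucket (or an empty one): report the
                # highest bound known so far.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical request durations: upper bound in seconds, cumulative count.
sample = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
print(round(histogram_quantile(0.9, sample), 3))
```

If most requests land in a single wide bucket, the estimate can be far from the true latency, which is a good reason to choose bucket boundaries close to the latency target.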

Active DB Connections:

pg_stat_activity_count{datname="mydb", state="active"}

Redis Hit Rate:

rate(redis_keyspace_hits_total[5m])
/
(rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m]))
* 100
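
Before wiring a query into a panel, it can be useful to run it against Prometheus directly. The sketch below uses the Prometheus HTTP API instant-query endpoint (/api/v1/query) with only the standard library. The base URL is a placeholder, the helper names are hypothetical, and the sample response is hand-written to show the expected shape.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def first_value(response):
    """Return the first sample of an instant-query response, or None."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    if not result:
        return None
    # Each sample is [unix_timestamp, "value as string"].
    return float(result[0]["value"][1])


def query_prometheus(base_url, promql, timeout=10):
    """Run the query over HTTP (needs a reachable Prometheus server)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return first_value(json.load(resp))


# Hand-written response in the shape Prometheus returns for a vector result.
sample_response = {
    "status": "success",
    "data": {"resultType": "vector",
             "result": [{"metric": {}, "value": [1700000000, "1.25"]}]},
}
print(first_value(sample_response))
```

An empty result (None here) usually means the label selectors match no series, which is worth checking before blaming the panel configuration.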

Dashboard variables

Variables make one dashboard reusable across environments and instances:

{
  "templating": {
    "list": [
      {
        "name": "environment",
        "type": "custom",
        "options": [
          {"text": "production", "value": "production"},
          {"text": "staging", "value": "staging"}
        ]
      },
      {
        "name": "instance",
        "type": "query",
        "query": "label_values(up{job='app', env='$environment'}, instance)"
      }
    ]
  }
}

Use them in queries: {job="app", env="$environment", instance="$instance"}. If a variable allows multiple values, match with =~ instead of =.
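
Conceptually, Grafana replaces each $name with the selected value before sending the query to the data source. The sketch below imitates that for single-value variables using Python's string.Template, which happens to share the $name syntax. It is an illustration only and does not cover Grafana's formatting options for multi-value variables. The instance address is made up.

```python
from string import Template


def interpolate(query, variables):
    """Substitute $name placeholders; unknown names are left untouched."""
    return Template(query).safe_substitute(variables)


selector = '{job="app", env="$environment", instance="$instance"}'
selected = {"environment": "production", "instance": "10.0.0.5:8080"}
print(interpolate(selector, selected))
```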

Deployment annotations

A vertical line on graphs at each deployment makes it easy to see whether a degradation correlates with a deploy:

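# CI/CD: send annotation after deploy
import time

import requests

def create_grafana_annotation(grafana_url: str, api_key: str, text: str, tags: list) -> None:
    """Create a Grafana annotation stamped with the current time."""
    response = requests.post(
        f"{grafana_url}/api/annotations",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "text": text,
            "tags": tags,
            "time": int(time.time() * 1000),  # epoch milliseconds
        },
        timeout=10,  # do not let a slow Grafana hang the pipeline
    )
    response.raise_for_status()  # surface auth or URL errors in the CI log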

# In CI/CD pipeline after successful deploy:
create_grafana_annotation(
    GRAFANA_URL, API_KEY,
    text=f"Deploy v{VERSION} to production",
    tags=["deploy", "production"]
)
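
A long rollout is sometimes easier to read as a shaded region than as a single line. The annotations API accepts a "timeEnd" field alongside "time" for that purpose. The sketch below only builds the JSON payload, so it can be checked without a Grafana instance. The function name, the version tag format and the timestamps are assumptions.

```python
def build_deploy_annotation(version, environment, started_at, finished_at=None):
    """Build an annotation payload; timestamps are epoch seconds."""
    payload = {
        "text": f"Deploy v{version} to {environment}",
        "tags": ["deploy", environment, f"version:{version}"],
        "time": int(started_at * 1000),  # Grafana expects milliseconds
    }
    if finished_at is not None:
        # With timeEnd set, the annotation is drawn as a region.
        payload["timeEnd"] = int(finished_at * 1000)
    return payload


# Hypothetical deploy that took 90 seconds.
print(build_deploy_annotation("1.4.2", "production", 1700000000, 1700000090))
```

The resulting dictionary can be sent as the JSON body of a POST to /api/annotations.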

Dashboard as Code (Grafonnet / Terraform)

Store dashboards in git, not only in the UI:

// Grafonnet: dashboard as code
local grafana = import 'grafonnet/grafana.libsonnet';
local dashboard = grafana.dashboard;
local graphPanel = grafana.graphPanel;

dashboard.new(
  'Application Overview',
  time_from='now-1h',
  refresh='30s',
)
.addPanel(
  graphPanel.new(
    'Error Rate',
    datasource='Prometheus',
  )
  .addTarget(
    grafana.prometheus.target(
      'sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100',
      legendFormat='Error Rate %'
    )
  ),
  gridPos={ x: 0, y: 0, w: 12, h: 8 }
)

Or use the Terraform Grafana provider with a resource "grafana_dashboard" "app" block.
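
Teams that export dashboards from the UI instead of generating them can still keep clean git diffs by normalizing the JSON before committing. The sketch below drops the top-level "id" and "version" fields, which are assumed here to be instance-specific, and sorts keys for stable output. It also builds a body for the dashboard HTTP API (POST /api/dashboards/db). The helper names and the sample dashboard are hypothetical.

```python
import json

VOLATILE_KEYS = ("id", "version")  # assumed to differ between Grafana instances


def normalize_dashboard(dashboard):
    """Return stable JSON text suitable for committing to git."""
    cleaned = {k: v for k, v in dashboard.items() if k not in VOLATILE_KEYS}
    return json.dumps(cleaned, indent=2, sort_keys=True) + "\n"


def build_import_payload(dashboard_json, message):
    """Build the request body for POST /api/dashboards/db."""
    dashboard = json.loads(dashboard_json)
    dashboard["id"] = None  # let Grafana match the dashboard by uid
    return {"dashboard": dashboard, "overwrite": True, "message": message}


exported = {"id": 42, "uid": "app-overview", "version": 17,
            "title": "Application Overview", "panels": []}
text = normalize_dashboard(exported)
print(text)
print(build_import_payload(text, "Sync from git")["dashboard"]["uid"])
```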

Sharing and access

  • Read-only public URL—for status board in office
  • Snapshot—share current state with someone without Grafana access
  • Embedded panels—embed in internal team portal
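
For embedded panels, Grafana's share dialog produces an iframe that points at a single-panel URL. The sketch below builds such a URL and iframe tag, assuming the /d-solo/<uid>/<slug> path form and that embedding is enabled on the server (allow_embedding in the [security] section). The host, UID, slug and panel ID are placeholders.

```python
from urllib.parse import urlencode


def embed_url(base_url, dashboard_uid, slug, panel_id,
              time_from="now-6h", time_to="now", org_id=1):
    """Build a single-panel URL of the kind used for iframe embedding."""
    params = urlencode({"orgId": org_id, "panelId": panel_id,
                        "from": time_from, "to": time_to})
    return f"{base_url.rstrip('/')}/d-solo/{dashboard_uid}/{slug}?{params}"


def iframe_tag(url, width=450, height=200):
    """Wrap the URL in an iframe element for an internal portal page."""
    return (f'<iframe src="{url}" width="{width}" height="{height}" '
            f'frameborder="0"></iframe>')


url = embed_url("https://grafana.example.com", "app-overview",
                "application-overview", panel_id=4)
print(iframe_tag(url))
```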

Creation timeline

  • Basic panels (error rate, latency, traffic) — 1-2 days
  • Full application dashboard (all layers) — 3-5 days
  • Dashboard as code + git workflow — 1-2 days
  • Deploy annotations — 1 day