Intermediate · Lesson 4 of 9

Dashboards

Design maintainable Grafana dashboards that support fast diagnosis and team collaboration.

Simple Explanation (ELI5)

A dashboard is a page of charts that tells the story of system health at a glance.

Technical Explanation

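Good dashboard design follows a hierarchy: a service overview at the top, request-level signals (latency, errors, traffic) and dependency panels below it, then component details such as saturation and per-pod resources. Use variables for environment, namespace, and service so one dashboard serves all of them. Keep panel units and thresholds explicit.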
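
The following sketch shows how these variables might look in the dashboard JSON model. Its "templating" section chains env, namespace, and service so that each drop-down narrows the next. It assumes a Prometheus data source with the hypothetical UID "prometheus-main" and a hypothetical metric http_requests_total that carries env, namespace, and service labels. Adapt both to your setup.

```json
{
  "templating": {
    "list": [
      {
        "name": "env",
        "label": "Environment",
        "type": "custom",
        "query": "prod,staging,dev",
        "current": { "text": "prod", "value": "prod" }
      },
      {
        "name": "namespace",
        "label": "Namespace",
        "type": "query",
        "datasource": { "type": "prometheus", "uid": "prometheus-main" },
        "query": "label_values(http_requests_total{env=\"$env\"}, namespace)",
        "refresh": 1
      },
      {
        "name": "service",
        "label": "Service",
        "type": "query",
        "datasource": { "type": "prometheus", "uid": "prometheus-main" },
        "query": "label_values(http_requests_total{env=\"$env\", namespace=\"$namespace\"}, service)",
        "refresh": 1
      }
    ]
  }
}
```

label_values() is a Grafana template function for Prometheus. Setting "refresh": 1 re-runs the variable queries when the dashboard loads. Panel queries then filter with {env="$env", namespace="$namespace", service="$service"}.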

Visual Section

Top row: SLI summary panels

Middle: latency, errors, traffic

Bottom: CPU, memory, pod details
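
The three-row layout above can be sketched as a panel skeleton in the dashboard JSON model. Grafana places panels on a 24-column grid using "gridPos", and each "row" panel acts as a section header. This is a layout-only sketch. The stat panel titles are hypothetical SLIs, and queries, data sources, and panel ids are omitted for brevity.

```json
{
  "panels": [
    { "type": "row", "title": "Service health (SLIs)", "collapsed": false, "panels": [], "gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 } },
    { "type": "stat", "title": "Availability", "gridPos": { "h": 4, "w": 8, "x": 0, "y": 1 } },
    { "type": "stat", "title": "Error budget remaining", "gridPos": { "h": 4, "w": 8, "x": 8, "y": 1 } },
    { "type": "stat", "title": "p95 latency", "gridPos": { "h": 4, "w": 8, "x": 16, "y": 1 } },
    { "type": "row", "title": "Latency, errors, traffic", "collapsed": false, "panels": [], "gridPos": { "h": 1, "w": 24, "x": 0, "y": 5 } },
    { "type": "timeseries", "title": "Latency", "gridPos": { "h": 8, "w": 8, "x": 0, "y": 6 } },
    { "type": "timeseries", "title": "Errors", "gridPos": { "h": 8, "w": 8, "x": 8, "y": 6 } },
    { "type": "timeseries", "title": "Traffic", "gridPos": { "h": 8, "w": 8, "x": 16, "y": 6 } },
    { "type": "row", "title": "CPU, memory, pods", "collapsed": false, "panels": [], "gridPos": { "h": 1, "w": 24, "x": 0, "y": 14 } },
    { "type": "timeseries", "title": "CPU", "gridPos": { "h": 8, "w": 8, "x": 0, "y": 15 } },
    { "type": "timeseries", "title": "Memory", "gridPos": { "h": 8, "w": 8, "x": 8, "y": 15 } },
    { "type": "table", "title": "Pod details", "gridPos": { "h": 8, "w": 8, "x": 16, "y": 15 } }
  ]
}
```

Each row's "y" equals the previous row's "y" plus the height of the panels above it. This keeps the reading order top-down: summary first, details last.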

Hands-on Commands

json

{
  "title": "Checkout Service",
  "tags": ["prod", "api"],
  "timezone": "browser",
  "refresh": "30s"
}

Debugging Scenarios

Panels show mixed units: set an explicit unit on every panel.
Dashboard is unreadable during on-call: reduce the panel count and prioritize SLIs.
Wrong environment shown on load: the environment variable's default value is misconfigured.
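
Here is a minimal sketch of a panel with an explicit unit and thresholds, set through Grafana's "fieldConfig". The unit "s" tells Grafana that the values are seconds, so every latency panel renders on the same scale. "thresholdsStyle" draws the thresholds as lines on the time series. The histogram metric name and the 0.3 s and 0.5 s thresholds are illustrative assumptions.

```json
{
  "type": "timeseries",
  "title": "p95 latency",
  "targets": [
    {
      "refId": "A",
      "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{env=\"$env\", service=\"$service\"}[5m])))",
      "legendFormat": "p95"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "s",
      "decimals": 2,
      "custom": { "thresholdsStyle": { "mode": "line" } },
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "orange", "value": 0.3 },
          { "color": "red", "value": 0.5 }
        ]
      }
    },
    "overrides": []
  }
}
```

For the wrong-environment case, check the "current" value saved for the environment variable in the dashboard's "templating" list. That saved value is the default applied on load.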

Real-world Use Case

A service dashboard with request rate, error rate, and p95 latency reduced incident triage time from 20 minutes to 5 minutes.
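
The three signals in this use case map onto three stat panels. The sketch below assumes Prometheus metrics named http_requests_total (with a status-code label named code) and http_request_duration_seconds_bucket. Both names are hypothetical conventions, not details of the original case. Units are set per panel: "reqps" for throughput, "percentunit" for the 0 to 1 error ratio, and "s" for latency.

```json
{
  "panels": [
    {
      "type": "stat",
      "title": "Request rate",
      "fieldConfig": { "defaults": { "unit": "reqps" }, "overrides": [] },
      "targets": [
        { "refId": "A", "expr": "sum(rate(http_requests_total{service=\"$service\"}[5m]))" }
      ]
    },
    {
      "type": "stat",
      "title": "Error rate",
      "fieldConfig": { "defaults": { "unit": "percentunit" }, "overrides": [] },
      "targets": [
        { "refId": "A", "expr": "sum(rate(http_requests_total{service=\"$service\", code=~\"5..\"}[5m])) / sum(rate(http_requests_total{service=\"$service\"}[5m]))" }
      ]
    },
    {
      "type": "stat",
      "title": "p95 latency",
      "fieldConfig": { "defaults": { "unit": "s" }, "overrides": [] },
      "targets": [
        { "refId": "A", "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{service=\"$service\"}[5m])))" }
      ]
    }
  ]
}
```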

Interview Questions

Beginner

What makes a good dashboard?

Clear layout, useful metrics, consistent units, and fast insight.

Why use variables?

To reuse one dashboard across environments/services.

What belongs in the top row?

Service health summary metrics.

How often should a dashboard refresh?

It depends on the use case; 15 to 60 seconds is common for operational dashboards.
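
The refresh settings can be sketched in the dashboard JSON model. "refresh" sets the default auto-refresh, and "timepicker.refresh_intervals" limits the choices users can pick. This prevents, for example, a 5 s refresh on a heavy dashboard. The intervals listed are an example, not a recommendation from this lesson.

```json
{
  "refresh": "30s",
  "timepicker": {
    "refresh_intervals": ["15s", "30s", "1m", "5m"]
  }
}
```

Grafana administrators can also enforce a server-side floor with min_refresh_interval in the [dashboards] section of grafana.ini.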

Why avoid too many panels?

They dilute signal and slow diagnosis.

Intermediate

How should dashboards be structured by audience?

Build tiers: an executive summary, a service-owner detail view, and on-call drill-down dashboards.
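
One way to connect the audience tiers is a dashboard link that lists every dashboard sharing a tag. The link carries the current variables and time range into the next view. The tag "checkout" is a hypothetical convention; each tier's dashboard would carry it.

```json
{
  "links": [
    {
      "title": "Checkout dashboards",
      "type": "dashboards",
      "tags": ["checkout"],
      "asDropdown": true,
      "includeVars": true,
      "keepTime": true
    }
  ]
}
```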

How can you reduce dashboard query load?

Use recording rules, fewer panels, and sane refresh intervals.
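
The following sketch shows a panel that reads a precomputed series instead of a raw rate. It assumes a Prometheus recording rule, defined in Prometheus rule files and not shown, that produces service:http_requests:rate5m. That name is hypothetical and follows the level:metric:operations naming convention. The panel-level "interval" sets a minimum query step, and "maxDataPoints" caps resolution. Both bound the cost of each query.

```json
{
  "type": "timeseries",
  "title": "Traffic by service",
  "interval": "1m",
  "maxDataPoints": 500,
  "targets": [
    {
      "refId": "A",
      "expr": "service:http_requests:rate5m{env=\"$env\"}",
      "legendFormat": "{{service}}"
    }
  ],
  "fieldConfig": { "defaults": { "unit": "reqps" }, "overrides": [] }
}
```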

What are dashboard annotations for?

They mark deploys and incidents on graphs so metric changes can be correlated with events.
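
Here is a sketch of a deploy annotation in the dashboard JSON model, using Grafana's built-in annotation store filtered by tags. It assumes that a CI pipeline posts an annotation tagged "deploy" and "checkout" (hypothetical tags) to Grafana's HTTP API at /api/annotations on each release. The exact JSON layout varies across Grafana versions, so compare it against an export from your own instance.

```json
{
  "annotations": {
    "list": [
      {
        "name": "Deploys",
        "enable": true,
        "iconColor": "blue",
        "datasource": { "type": "grafana", "uid": "-- Grafana --" },
        "target": {
          "type": "tags",
          "tags": ["deploy", "checkout"],
          "matchAny": false,
          "limit": 100
        }
      }
    ]
  }
}
```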

How do you ensure consistency across teams?

Use templates, folders, naming standards, and reviews.
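
Consistency is easier to enforce when dashboards live in version control and are pushed through review. The sketch below is a request body for Grafana's POST /api/dashboards/db endpoint, which saves a dashboard into a folder. The folder UID "team-payments", the dashboard uid, and the "Service / Overview" title pattern are hypothetical examples of a naming standard.

```json
{
  "dashboard": {
    "id": null,
    "uid": "checkout-overview",
    "title": "Checkout / Overview",
    "tags": ["prod", "api", "checkout"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": []
  },
  "folderUid": "team-payments",
  "message": "Reviewed change: add p95 latency panel",
  "overwrite": false
}
```

With "overwrite" set to false, the API rejects a save that would silently replace a dashboard changed by someone else.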

When should you split into multiple dashboards?

When one page becomes overloaded or crosses ownership boundaries.

Scenario-based

A dashboard takes 25 s to load during incidents. How do you fix it?

Cut expensive panels, shorten the default time range, and add recording rules.
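
The following sketch shows two load-time fixes in the dashboard JSON model: a shorter default time range and a collapsed row. Panels inside a collapsed row are stored in the row's own "panels" array and are not queried until someone expands the row. The restart metric assumes kube-state-metrics is installed, and the grid positions are illustrative.

```json
{
  "time": { "from": "now-1h", "to": "now" },
  "panels": [
    {
      "type": "row",
      "title": "Pod details (expand when needed)",
      "collapsed": true,
      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 20 },
      "panels": [
        {
          "type": "table",
          "title": "Pod restarts (last hour)",
          "gridPos": { "h": 8, "w": 24, "x": 0, "y": 21 },
          "targets": [
            {
              "refId": "A",
              "expr": "sum by (pod) (increase(kube_pod_container_status_restarts_total{namespace=\"$namespace\"}[1h]))",
              "instant": true,
              "format": "table"
            }
          ]
        }
      ]
    }
  ]
}
```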

A panel shows data from the wrong namespace. Why?

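A mismatch between the variable and the query's label filter, for example a query that omits the namespace matcher or references the wrong variable.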
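
Here is a sketch of a namespace-safe query. When namespace is a multi-value or "All" variable, Grafana interpolates it into a Prometheus query as a regex alternation, so the matcher must use =~ rather than =. A query that drops the namespace matcher entirely silently aggregates across every namespace. Grouping the legend by namespace makes a stray namespace visible. The metric name is hypothetical.

```json
{
  "targets": [
    {
      "refId": "A",
      "expr": "sum by (namespace) (rate(http_requests_total{env=\"$env\", namespace=~\"$namespace\"}[5m]))",
      "legendFormat": "{{namespace}}"
    }
  ]
}
```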

How do you keep dev and prod data separate?

Use separate data sources per environment and strictly scoped environment variables.
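
The following sketch shows data source separation with a data source variable. The "regex" field filters which Prometheus instances appear in the picker, so a prod dashboard can only point at prod data sources. Panels reference the selected instance through "${ds}". The "prom-prod" name prefix is a hypothetical naming convention.

```json
{
  "templating": {
    "list": [
      {
        "name": "ds",
        "label": "Data source",
        "type": "datasource",
        "query": "prometheus",
        "regex": "/^prom-prod/"
      }
    ]
  },
  "panels": [
    {
      "type": "timeseries",
      "title": "Traffic",
      "datasource": { "type": "prometheus", "uid": "${ds}" },
      "targets": [
        { "refId": "A", "expr": "sum(rate(http_requests_total[5m]))" }
      ]
    }
  ]
}
```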

Users ignore the dashboard. What should change?

Focus on high-signal metrics and remove clutter.

How do you prove dashboard quality improved?

Track MTTD and MTTR (mean time to detect and mean time to resolve) and collect on-call feedback.

Summary

Dashboards are operational products. Design them for fast decisions, not visual decoration.
