
Interview Preparation

Targeted Grafana interview preparation for DevOps, SRE, and Platform roles.

Simple Explanation (ELI5)

Interviewers want to know that you can build useful dashboards and use them during real incidents, not just that you can click through UI settings.

Core Revision Topics

Rapid-fire Questions

Beginner

What is Grafana used for?

Visualization and analysis of metrics, logs, and traces from multiple data sources.

How does Grafana connect to Prometheus?

By adding Prometheus as a data source (pointing Grafana at the Prometheus server URL) and querying metrics with PromQL.
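Under the hood, a Prometheus data source sends PromQL to the Prometheus HTTP API. The sketch below builds an instant-query URL for the /api/v1/query endpoint and parses a response in Prometheus' documented JSON shape. The server address http://localhost:9090 and the sample response are assumptions for illustration; no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumed Prometheus address; replace with your server's URL.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the Prometheus instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> dict:
    """Map each series' sorted label set to its float value from a vector result."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise ValueError(payload.get("error", "query failed"))
    results = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # [unix_timestamp, "value as string"]
        results[labels] = float(value)
    return results


# Made-up response body shaped like Prometheus' documented instant-query format.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "api"}, "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "job": "db"}, "value": [1700000000, "0"]},
        ],
    },
})

print(instant_query_url(PROMETHEUS_URL, "up"))
print(parse_vector(SAMPLE))
```

In an interview, being able to hit this endpoint directly (for example with curl) is a quick way to tell a Grafana problem apart from a Prometheus problem.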

What is a panel?

A single visualization unit inside a dashboard.

Why use variables in dashboards?

To filter dynamically by namespace, service, or environment, so one dashboard serves many targets instead of being duplicated.
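Variable substitution can be modeled as string interpolation: each $name in a panel query is replaced by the selected value, and a multi-value selection becomes a regex alternation that works with the =~ matcher. This is a simplified model, not Grafana's implementation; the function name interpolate and the variable names are hypothetical.

```python
import re


def interpolate(query: str, variables: dict) -> str:
    """Replace $name tokens with the selected value(s).

    A single value is inserted as-is; a list of values becomes a regex
    alternation such as (a|b), with regex metacharacters escaped.
    Unknown variables (e.g. built-ins like $__rate_interval) are left alone.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        value = variables[name]
        if isinstance(value, list):
            return "(" + "|".join(re.escape(v) for v in value) + ")"
        return value

    return re.sub(r"\$(\w+)", substitute, query)


query = 'sum by (pod) (rate(http_requests_total{namespace=~"$namespace", service="$service"}[5m]))'
print(interpolate(query, {"namespace": ["prod", "staging"], "service": "checkout"}))
```

The regex form is why multi-value variables are normally paired with =~ rather than = in PromQL label matchers.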

What are annotations?

Time markers for events like deployments to correlate with metric changes.
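A CI pipeline can record each deployment as an annotation through Grafana's HTTP annotations endpoint (POST /api/annotations). The sketch only builds the JSON body; the service name, version, and dashboard UID are hypothetical, and actually sending it would require an authenticated HTTP request.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def deployment_annotation(service: str, version: str, when: datetime,
                          dashboard_uid: Optional[str] = None) -> dict:
    """Build a request body for Grafana's POST /api/annotations endpoint."""
    body = {
        "time": int(when.timestamp() * 1000),  # Grafana expects epoch milliseconds
        "tags": ["deploy", service],
        "text": f"Deployed {service} {version}",
    }
    if dashboard_uid is not None:
        # Omitting dashboardUID creates an organization-wide annotation.
        body["dashboardUID"] = dashboard_uid
    return body


deployed_at = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(json.dumps(deployment_annotation("checkout", "v1.4.2", deployed_at, "abc123")))
```

Tagging annotations consistently (here with "deploy") lets dashboards filter which events they overlay on each panel.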

Intermediate

How do you design a reliable API dashboard?

Include RED metrics (Rate, Errors, Duration), infrastructure context, and clear drill-down paths.
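The RED numbers come from cumulative counters: rate is the change in requests over time, errors is the share of failed requests, and duration here is the mean latency (change in total latency divided by change in request count). This is a minimal sketch with hypothetical counter names; unlike PromQL's rate(), it ignores counter resets and extrapolation.

```python
from dataclasses import dataclass


@dataclass
class Scrape:
    """One scrape of hypothetical cumulative counters for a service."""
    timestamp: float      # seconds
    requests: float       # total requests served
    errors: float         # total 5xx responses
    duration_sum: float   # total seconds spent serving requests


def red(prev: Scrape, curr: Scrape) -> dict:
    """Compute Rate, Errors and Duration between two scrapes.

    Assumes no counter reset happened between the scrapes.
    """
    elapsed = curr.timestamp - prev.timestamp
    requests = curr.requests - prev.requests
    errors = curr.errors - prev.errors
    duration = curr.duration_sum - prev.duration_sum
    return {
        "rate_rps": requests / elapsed,
        "error_ratio": errors / requests if requests else 0.0,
        "mean_latency_s": duration / requests if requests else 0.0,
    }


print(red(Scrape(0, 1000, 10, 50.0), Scrape(60, 7000, 70, 350.0)))
```

Dashboards usually show duration as a percentile rather than a mean, because a mean hides slow outliers.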

What causes high-cardinality query issues?

Too many unique label-value combinations, usually caused by unbounded labels such as raw URL paths, pod UIDs, or user IDs.
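A rough way to reason about cardinality: the number of series one metric can produce is bounded by the product of the distinct values of each label. The label names and values below are hypothetical; the point is how a single unbounded label multiplies the total.

```python
from math import prod


def worst_case_series(label_values: dict) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(set(values)) for values in label_values.values())


labels = {
    "method": {"GET", "POST"},
    "status": {"200", "404", "500"},
    "route": {"/cart", "/checkout", "/login"},  # templated routes stay bounded
}
print(worst_case_series(labels))  # 18

# Adding a per-user label multiplies every existing combination.
labels_with_users = dict(labels, user_id={f"u{i}" for i in range(10_000)})
print(worst_case_series(labels_with_users))  # 180000
```

Actual series counts are usually lower than this bound, since not every combination occurs, but the multiplication is what makes unbounded labels dangerous.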

How do recording rules help Grafana?

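They precompute expensive queries on a schedule and store the results as new time series, so panels and alerts read cheap precomputed data instead of re-evaluating the full query on every refresh.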
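Prometheus recording rules live in YAML rule files as groups of record/expr pairs, and the Prometheus docs recommend naming recorded series level:metric:operations. The sketch builds that structure as Python data and checks the naming convention; the group name, interval, and metric names are hypothetical.

```python
import json
import re

# Recommended recording-rule naming convention: level:metric:operations
RULE_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+$")


def recording_group(name: str, interval: str, rules: dict) -> dict:
    """Build a rule group with the same structure as a Prometheus rule file."""
    for record in rules:
        if not RULE_NAME.match(record):
            raise ValueError(f"{record!r} does not follow level:metric:operations")
    return {
        "groups": [{
            "name": name,
            "interval": interval,
            "rules": [{"record": r, "expr": e} for r, e in rules.items()],
        }]
    }


group = recording_group("api-latency", "1m", {
    # Each expensive aggregation is evaluated once per interval, not per panel refresh.
    "job:http_requests:rate5m": "sum by (job) (rate(http_requests_total[5m]))",
    "job_le:http_request_duration_seconds_bucket:rate5m":
        "sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))",
})
print(json.dumps(group, indent=2))
```

A Grafana panel can then query job:http_requests:rate5m directly instead of repeating the aggregation.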

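What is the difference between Grafana (dashboard) alerts and Prometheus alerts?

Prometheus alerting rules are evaluated inside Prometheus itself and routed through Alertmanager; Grafana alerting can query multiple data sources and centralize routing through its notification policies.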

How do you validate alert quality?

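Measure precision (the share of alerts that matched a real incident), recall (the share of incidents that triggered an alert), the noise rate, and the impact on MTTA (mean time to acknowledge).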
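Treating each alert as a prediction of an incident makes precision and recall straightforward to compute from an incident log. The alert-to-incident mapping and incident IDs below are hypothetical; in this model the noise rate is simply 1 minus precision.

```python
def alert_quality(alert_matches: list, incidents: set) -> dict:
    """Score alerting against a hypothetical incident log.

    alert_matches: for each alert that fired, the incident ID it matched,
    or None if it fired without a real incident (noise).
    incidents: IDs of all real incidents in the same period.
    """
    true_alerts = [m for m in alert_matches if m is not None]
    detected = set(true_alerts) & incidents
    return {
        # Share of fired alerts that pointed at a real incident.
        "precision": len(true_alerts) / len(alert_matches) if alert_matches else 0.0,
        # Share of real incidents that at least one alert caught.
        "recall": len(detected) / len(incidents) if incidents else 0.0,
        "false_positives": len(alert_matches) - len(true_alerts),
    }


fired = ["INC-1", "INC-1", None, None, "INC-2"]
print(alert_quality(fired, {"INC-1", "INC-2", "INC-3"}))
```

Here INC-3 was never alerted on, which is a recall gap worth investigating even though most alerts were genuine.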

Scenario-based

You are asked to monitor a new microservice. Which dashboard panels do you add first?

Requests per second (RPS), 5xx error rate, p95 latency, CPU, memory, and restart count.
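A p95 latency panel is typically computed with PromQL's histogram_quantile() over cumulative le buckets. The sketch mirrors its linear interpolation inside the bucket that contains the target rank, under the assumption that observations are spread evenly within each bucket; the bucket bounds and counts are made up, and edge cases for empty or malformed histograms are not handled.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with (math.inf, total),
    like Prometheus' le buckets.
    """
    total = buckets[-1][1]
    rank = q * total
    for i, (upper, cumulative) in enumerate(buckets[:-1]):
        if cumulative >= rank:
            lower, below = buckets[i - 1] if i > 0 else (0.0, 0.0)
            # Linear interpolation within the bucket holding the rank.
            return lower + (upper - lower) * (rank - below) / (cumulative - below)
    # Rank falls in the +Inf bucket: report the highest finite upper bound.
    return buckets[-2][0]


latency_buckets = [(0.1, 600), (0.25, 900), (0.5, 980), (1.0, 1000), (math.inf, 1000)]
print(histogram_quantile(0.95, latency_buckets))  # 0.40625
```

Because the result is interpolated, its accuracy depends on having bucket boundaries near the latencies you care about.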

A dashboard is overloaded with 40 panels. What do you do?

Split it into a summary dashboard and deep-dive dashboards, organized by audience and use case.

No alerts fired during an outage. How do you investigate?

Check the rule conditions and pending period, the evaluation interval, notification policies, and silences.
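One frequent cause is a rule whose "for" duration is longer than the outage: the alert goes pending but resets before it can fire. The sketch is a simplified model of Prometheus' pending-to-firing logic (it ignores scrape gaps and notification delays), with a hypothetical 3-minute outage sampled once per minute.

```python
def fires(condition_at_eval: list, eval_interval_s: int, for_s: int) -> bool:
    """Return True if a rule would reach the firing state.

    The alert becomes pending at the first evaluation where the condition
    holds and fires once it has held at every evaluation for >= for_s seconds.
    """
    active_since = None
    for tick, holds in enumerate(condition_at_eval):
        now = tick * eval_interval_s
        if not holds:
            active_since = None  # condition cleared: the pending alert resets
            continue
        if active_since is None:
            active_since = now
        if now - active_since >= for_s:
            return True
    return False


outage = [False, True, True, True, False, False]  # condition true for 3 evaluations
print(fires(outage, eval_interval_s=60, for_s=300))  # False: resets before 5m
print(fires(outage, eval_interval_s=60, for_s=120))  # True
```

Replaying the rule's query over the outage window in Grafana's Explore view is a practical way to see which of these states the alert actually reached.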

How do you demonstrate the impact of your observability work?

Show reduced MTTR, faster triage, and fewer false-positive alerts.
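MTTR is usually computed as the mean of (resolved time minus start time) across incidents, which makes a before/after comparison easy to present. The incident timestamps below are hypothetical, and the helper names are illustrative.

```python
from datetime import datetime, timedelta


def incident(start: str, minutes: int) -> tuple:
    """Build a hypothetical (started, resolved) pair from an ISO start time."""
    started = datetime.fromisoformat(start)
    return started, started + timedelta(minutes=minutes)


def mttr(incidents: list) -> timedelta:
    """Mean time to resolution: average of (resolved - started)."""
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


before = [incident("2024-01-03T10:00", 90), incident("2024-01-10T02:00", 60),
          incident("2024-01-19T15:30", 120)]
after = [incident("2024-03-02T09:00", 30), incident("2024-03-11T22:15", 45),
         incident("2024-03-20T13:00", 15)]
print(mttr(before), mttr(after))  # 1:30:00 0:30:00
```

With only a handful of incidents a mean is easily skewed, so reporting the incident count alongside MTTR keeps the claim honest.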

What is the tradeoff between short and long query windows?

Short windows are responsive but noisy; long windows are stable but slower to detect spikes.
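Range windows such as rate(x[1m]) versus rate(x[15m]) behave roughly like trailing averages, so a plain moving average shows the tradeoff. This is a simplification of how rate() works; the per-minute error percentages and the threshold of 5 are made up.

```python
def trailing_mean(values: list, window: int) -> list:
    """Mean of the last `window` samples at each point (fewer at the start)."""
    out = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        out.append(sum(recent) / len(recent))
    return out


def first_breach(series: list, threshold: float):
    """Index of the first sample above threshold, or None."""
    return next((i for i, v in enumerate(series) if v > threshold), None)


# One brief blip at minute 2, then a sustained problem from minute 6.
error_pct = [1, 1, 6, 1, 1, 1, 8, 8, 8, 8, 8, 8]
for window in (1, 3, 5):
    print(window, first_breach(trailing_mean(error_pct, window), 5))
# 1 -> 2 (fires on the blip), 3 -> 7, 5 -> 8 (stable but later)
```

The shortest window reacts to the transient blip, while the longest one ignores it but detects the real problem a minute later than the middle option.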

Mock Practical Round

  1. Connect a Prometheus data source and verify its health.
  2. Create a dashboard with CPU, memory, and request panels.
  3. Add one latency alert and one error-rate alert.
  4. Simulate an issue and explain your investigation path within 5 minutes.
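For step 3, the two alerts could be written as Prometheus alerting rules (Grafana-managed alerts are an equally valid choice). The sketch builds the rule-file structure as Python data; the metric names, thresholds, "for" durations, and severity label are hypothetical and should be adapted to what the service actually exports.

```python
import json

# Hypothetical metric names; adjust to the service's real metrics.
LATENCY_P95 = (
    "histogram_quantile(0.95, "
    "sum by (le) (rate(http_request_duration_seconds_bucket[5m])))"
)
ERROR_RATIO = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    " / sum(rate(http_requests_total[5m]))"
)


def alert_rule(name: str, expr: str, threshold: float, for_: str, summary: str) -> dict:
    """One alerting rule in the structure used by Prometheus rule files."""
    return {
        "alert": name,
        "expr": f"{expr} > {threshold}",  # PromQL: division binds tighter than >
        "for": for_,
        "labels": {"severity": "page"},
        "annotations": {"summary": summary},
    }


rules = {
    "groups": [{
        "name": "api-slo",
        "rules": [
            alert_rule("HighLatencyP95", LATENCY_P95, 0.5, "10m", "p95 latency above 500ms"),
            alert_rule("HighErrorRate", ERROR_RATIO, 0.05, "5m", "5xx ratio above 5%"),
        ],
    }]
}
print(json.dumps(rules, indent=2))
```

Alerting on a ratio rather than a raw 5xx count keeps the error-rate alert meaningful when traffic volume changes.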

Summary

Strong interview performance comes from demonstrating practical observability thinking, not just Grafana UI familiarity.