Interview Preparation
Targeted Grafana interview preparation for DevOps, SRE, and Platform roles.
Simple Explanation (ELI5)
Interviewers want to know that you can build useful dashboards and use them during real incidents, not just click around UI settings.
Core Revision Topics
- Grafana architecture and datasource model
- Prometheus integration and PromQL basics
- Dashboard design for CPU, memory, requests
- Alerting and notification routing
- Troubleshooting no-data and noisy-alert cases
Rapid-fire Questions
Beginner
Grafana is used for visualization and analysis of metrics, logs, and traces from multiple data sources.
Prometheus is connected by adding it as a datasource with its server URL, then querying metrics via PromQL.
A panel is a single visualization unit inside a dashboard.
Variables filter a dashboard by namespace, service, or environment dynamically.
Annotations are time markers for events such as deployments, used to correlate those events with metric changes.
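The datasource answer above can be made concrete with a small sketch. It only builds the JSON body that Grafana's HTTP API accepts at POST /api/datasources; it does not send it. The datasource name and the URL http://prometheus:9090 are assumptions for illustration, not values from this guide.

```python
import json


def prometheus_datasource_payload(name: str, url: str, is_default: bool = True) -> dict:
    """Build a JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,  # assumed address of the Prometheus server
        "access": "proxy",  # the Grafana backend proxies queries to Prometheus
        "isDefault": is_default,
    }


# Hypothetical values; replace with the address of your own Prometheus server.
payload = prometheus_datasource_payload("Prometheus", "http://prometheus:9090")
print(json.dumps(payload, indent=2))
```

In an interview it is usually enough to explain that the same fields appear in the UI form and in provisioning files, and that "Save & test" in the UI is what verifies connectivity.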
Intermediate
A good service dashboard includes RED metrics (rate, errors, duration), infra context, and clear drilldown paths.
High cardinality comes from excessive unique label combinations (e.g., path, pod UID, user IDs).
Recording rules precompute expensive queries to speed up panels and alerts.
Prometheus alerting rules are source-native; Grafana alerting can centralize routing and multi-source alerting.
Alert quality is measured by precision/recall against real incidents, noise rate, and impact on MTTA (mean time to acknowledge).
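The alert-quality answer above reduces to simple arithmetic. The sketch below is one possible way to compute it, assuming each fired alert is labeled after the fact as either matching a real incident or not, and that missed incidents are counted separately. The counts and acknowledgement delays are made-up illustration data.

```python
from statistics import mean


def alert_quality(true_pos: int, false_pos: int, false_neg: int) -> dict:
    """Precision, recall, and noise rate for a set of reviewed alerts.

    true_pos:  alerts that matched a real incident
    false_pos: alerts that fired with no real incident
    false_neg: real incidents that produced no alert
    """
    fired = true_pos + false_pos
    incidents = true_pos + false_neg
    return {
        "precision": true_pos / fired if fired else 0.0,
        "recall": true_pos / incidents if incidents else 0.0,
        "noise_rate": false_pos / fired if fired else 0.0,
    }


# Hypothetical review of one month of alerts.
stats = alert_quality(true_pos=18, false_pos=6, false_neg=2)

# Hypothetical minutes between alert firing and acknowledgement.
ack_delays_min = [4, 6, 5, 9]
mtta_min = mean(ack_delays_min)

print(stats, mtta_min)
```

With these numbers precision is 18/24, recall is 18/20, and the noise rate is 6/24, which gives a concrete way to argue whether a tuning change helped.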
Scenario-based
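Key panels for a service dashboard: RPS, 5xx rate, p95 latency, CPU, memory, restart count.
An overloaded dashboard: split it into summary and deep-dive dashboards by audience and use case.
An alert that does not behave as expected: check rule conditions, evaluation interval, notification policies, and silences.
Demonstrating the value of dashboards and alerts: show reduced MTTR, faster triage, and fewer false-positive alerts.
Choosing a query window: short windows are responsive but noisy; long windows are stable but slower to detect spikes.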
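The window trade-off above can be illustrated with a toy simulation. This is not how Prometheus evaluates rate(); it is a plain trailing moving average over made-up samples, used only to show how window length changes what a threshold sees. The sample values and the threshold of 5.0 are assumptions.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; early points use however many samples exist."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Made-up error-rate samples: steady at 1.0 with a brief two-sample spike to 9.0.
error_rate = [1.0] * 10 + [9.0] * 2 + [1.0] * 10
threshold = 5.0  # hypothetical alert threshold

short_avg = moving_average(error_rate, window=2)
long_avg = moving_average(error_rate, window=10)

short_fires = any(v > threshold for v in short_avg)
long_fires = any(v > threshold for v in long_avg)
print(max(short_avg), max(long_avg), short_fires, long_fires)
```

The short window reaches the full spike value and crosses the threshold, while the long window dilutes the same spike and stays below it. Whether that is "noise suppressed" or "incident missed" depends on how long a real problem would last.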
Mock Practical Round
- Connect Prometheus datasource and verify health.
- Create dashboard with CPU, memory, request panels.
- Add one latency alert and one error-rate alert.
- Simulate issue and explain investigation path in 5 minutes.
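For the dashboard step of the practical round, it can help to know roughly what the dashboard JSON looks like. The sketch below builds a minimal dashboard model with three time series panels. The PromQL expressions are assumptions: the container metrics come from cAdvisor, and http_requests_total is a conventional name that your services may not expose.

```python
import json


def panel(panel_id: int, title: str, expr: str, x: int, y: int) -> dict:
    """One time series panel; Grafana's layout grid is 24 columns wide."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "targets": [{"refId": "A", "expr": expr}],
    }


# Metric names below are assumptions; adjust them to what your exporters provide.
dashboard = {
    "title": "Service overview (practice)",
    "panels": [
        panel(1, "CPU usage", "sum(rate(container_cpu_usage_seconds_total[5m]))", 0, 0),
        panel(2, "Memory working set", "sum(container_memory_working_set_bytes)", 12, 0),
        panel(3, "Requests per second", "sum(rate(http_requests_total[5m]))", 0, 8),
    ],
}

# Body shape accepted by Grafana's POST /api/dashboards/db endpoint (not sent here).
body = {"dashboard": dashboard, "overwrite": False}
print(json.dumps(body, indent=2))
```

Building the same dashboard by hand in the UI and then inspecting its JSON model is a quick way to check that the panel queries match what you expect.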
Summary
Strong interview performance comes from demonstrating practical observability thinking, not just Grafana UI familiarity.