Intermediate · Lesson 5 of 9

Panels and Queries

Choose the right panel type and write PromQL for CPU, memory, and request monitoring.

Simple Explanation (ELI5)

Panels are the widgets on a dashboard; queries decide what data each widget shows.

Technical Explanation

Use time series for trends, stat for current values, table for breakdowns, and heatmap for latency distributions. Query quality matters more than panel cosmetics.

Visual Section

CPU Trend: Time series

Current Memory: Stat panel

Top Endpoints: Table/bar gauge

Hands-on Commands

promql
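# CPU usage by pod, in cores (per-second rate of CPU seconds consumed)
sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# Memory working set by pod, in bytes
sum by (pod) (container_memory_working_set_bytes{container!=""})

# Requests per second, summed across all series
sum(rate(http_requests_total[5m]))

# 5xx errors per second (an absolute rate, not a percentage of traffic)
sum(rate(http_requests_total{status=~"5.."}[5m]))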
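
The 5xx query above returns an absolute number of failing responses per second. A minimal sketch of turning it into a ratio, assuming the same http_requests_total counter and status label used above:

```promql
# Fraction of requests that returned 5xx over the last 5 minutes (0 to 1).
# Returns no data when no 5xx series exist in the window.
sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))
```

Setting the panel unit to "Percent (0.0-1.0)" lets Grafana render the result as a percentage without multiplying by 100 in the query.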

Debugging Scenarios

Real-world Use Case

A service dashboard with CPU, memory, request rate, and 5xx panels quickly isolated a noisy canary release.

Interview Questions

Beginner

Which panel type shows trends over time?

The time series panel.

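Why use rate() on counters?

A counter only ever increases (and resets on restart), so its raw value says little. rate() converts the cumulative count into a per-second average over the range window and accounts for resets.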
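
As an illustration, the three queries below show the same counter in raw, rate, and increase form. This is a sketch that reuses the http_requests_total metric from the Hands-on Commands section; the window sizes are arbitrary choices.

```promql
# Raw counter: a cumulative total that only grows (and resets to 0 on restart)
http_requests_total

# Per-second average rate over the last 5 minutes, per series
rate(http_requests_total[5m])

# Approximate number of requests in the last hour, per series
increase(http_requests_total[1h])
```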

Which panel type shows a single current value?

The stat panel.

How do you show the top offenders?

Use a table or bar chart sorted by value, with topk() in the query to limit the rows.
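
A possible top-offenders query, sketched with the CPU metric from the Hands-on Commands section; the value 5 is an arbitrary choice:

```promql
# Five pods using the most CPU (in cores) over the last 5 minutes
topk(5, sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m])))
```

In a table or bar gauge this usually works best as an instant query, because topk() is evaluated at every step of a range query and can return more than five distinct series over the whole time range.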

What causes a panel to show no data?

A wrong query, the wrong data source, label matchers that match nothing, or a time window with no samples.
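
A few throwaway queries can narrow down which of these causes applies. The sketch below assumes the http_requests_total metric from earlier; the job value "my-service" is a placeholder, not a name from this lesson.

```promql
# Returns 1 if the metric has no series at all, and nothing if it exists
absent(http_requests_total)

# How many series exist, and which status values are actually present?
count by (status) (http_requests_total)

# Are the scrape targets up? ("my-service" is a placeholder job name)
up{job="my-service"}
```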

Intermediate

How do you avoid high-cardinality query cost?

Aggregate early with sum by or sum without, and avoid grouping by unbounded labels such as user IDs or full request paths.
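
A rough sketch of both ideas, assuming the http_requests_total series carry instance and pod labels (typical for Kubernetes scrapes, but not confirmed by this lesson):

```promql
# Number of series behind the metric; a quick cardinality check
count(http_requests_total)

# Drop the per-instance and per-pod dimensions before the panel sees them
sum without (instance, pod) (rate(http_requests_total[5m]))
```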

When should you use a heatmap?

For bucketed distributions such as request latency histograms.
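
A heatmap query might look like the following. The metric name http_request_duration_seconds_bucket is an assumption based on common naming conventions; substitute whatever histogram your service exposes.

```promql
# Per-bucket request rate; each le value becomes one row of the heatmap
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
```

With the Prometheus data source, setting the query Format option to Heatmap lets Grafana convert the cumulative buckets into per-bucket counts.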

How should you design request panels?

Show traffic (rate), errors, and latency (duration) together, following the RED method.
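
The Hands-on Commands section already covers the rate and error queries. The duration panel could be sketched as follows, again assuming a histogram named http_request_duration_seconds_bucket (a hypothetical name, not one given in this lesson):

```promql
# p95 request latency in seconds, estimated from histogram buckets
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```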

Why are unit settings critical?

They prevent misreadings such as bytes taken for megabytes or seconds for milliseconds.

How do you compare environments in one dashboard?

Add an environment dashboard variable and filter queries on the matching label.
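
A minimal sketch of such a query. Both the variable name $env and the label name env are assumptions; use whatever your dashboard and metrics actually define.

```promql
# $env is a hypothetical Grafana dashboard variable; env is an assumed label name.
# The regex matcher =~ is used so that multi-value selections also match.
sum by (env) (rate(http_requests_total{env=~"$env"}[5m]))
```

Grouping by the same label keeps one line per environment, so they can be compared on a single time series panel.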

Scenario-based

A panel shows CPU at 0% but the pods are hot. Why?

Most likely the query reads the wrong metric source, or its label matchers exclude the affected pods.
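
One way to check for a label mismatch is to list the label combinations the metric really carries, then rerun the panel query with fewer filters. This is a diagnostic sketch, not a query to keep on the dashboard:

```promql
# Which namespace/pod/container combinations does the metric actually carry?
count by (namespace, pod, container) (container_cpu_usage_seconds_total)

# The CPU query without the container filter, to see whether the filter
# is what hides the hot pods (may double-count pod-level series)
sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))
```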

The error panel is flat but users are seeing failures. What next?

Check how the status label is named and valued, and whether the failing requests are recorded in the metric at all.
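
Splitting the request rate by status is a quick way to see what the label actually contains. A sketch, assuming the http_requests_total metric used earlier:

```promql
# Request rate split by status, to see which values the label really takes
sum by (status) (rate(http_requests_total[5m]))
```

If the result comes back as a single series with no status label, the metric probably uses a different label name (code is a common alternative), and a matcher like status=~"5.." will match nothing.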

The memory panel spikes on every deploy. Is this bad?

Not necessarily. Correlate the spikes with restarts and look for sustained growth between deploys.
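
Two queries that could support that correlation. The restart metric assumes kube-state-metrics is installed and scraped, which this lesson does not state; the one-hour windows are arbitrary.

```promql
# Container restarts per pod over the last hour
# (assumes kube-state-metrics is installed and scraped)
sum by (pod) (increase(kube_pod_container_status_restarts_total[1h]))

# Memory trend per pod in bytes per second; a value that stays positive
# long after the deploy suggests sustained growth
sum by (pod) (deriv(container_memory_working_set_bytes{container!=""}[1h]))
```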

The dashboard became slow after adding a topk panel. What is the cause?

The query is expensive because it evaluates many label combinations before ranking them.
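
The contrast below sketches one mitigation: aggregate to the dimension you care about before ranking. The handler label is a hypothetical name for the endpoint dimension.

```promql
# Ranks every raw series, one per label combination
topk(10, rate(http_requests_total[5m]))

# Aggregates first, so far fewer series are ranked and returned
topk(10, sum by (handler) (rate(http_requests_total[5m])))
```

Both forms still read every underlying series, so for very large metrics a precomputed recording rule may be the more effective fix.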

How do you debug a panel that shows the wrong namespace?

Inspect the current variable value and the namespace label filter in the panel query.
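
For reference, a namespace-filtered query might look like this, assuming a dashboard variable named $namespace (a hypothetical name):

```promql
# $namespace is a hypothetical dashboard variable name
sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace=~"$namespace", container!=""}[5m])
)
```

The panel's query inspector shows the query after variable interpolation, which makes it easy to confirm which namespace value was actually sent to Prometheus.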

Summary

Useful Grafana dashboards depend on matching the panel type to the metric's behavior and writing efficient PromQL.