Hands-on · Lesson 11 of 11

Interview Preparation

Consolidate everything into interview-ready answers covering monitoring theory, Prometheus internals, PromQL, Alertmanager, and Kubernetes integration.

Simple Explanation (ELI5)

This lesson is your rehearsal room. You already learned the concepts. Now you practice answering clearly, with the right level of detail, and with production credibility.

Real-world Analogy

Learning Prometheus is training. Interview preparation is game day. You are proving you can use the tools under pressure and explain trade-offs to other engineers.

Technical Explanation

Strong Prometheus interview answers usually combine three layers: concept, implementation detail, and production trade-off. For example: “Prometheus uses pull-based scraping, which helps in Kubernetes because targets are dynamic; I would expose app metrics on an HTTP endpoint, let the Prometheus Operator discover it through a ServiceMonitor, keep label cardinality under control, and alert on user-facing error rate rather than raw CPU.”

Visual Representation

Concept: what the thing is

Implementation: how you configure or query it

Trade-off: why you choose one pattern over another

Commands / Syntax

promql
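# Interview-friendly examples
# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])
# CPU cores used per namespace (container!="" drops the aggregate series with an empty container label)
sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
# Container restarts over the last 15 minutes
increase(kube_pod_container_status_restarts_total[15m])

bash
# Useful operational endpoints
# Scrape targets and their current health
curl http://localhost:9090/api/v1/targets
# Loaded recording and alerting rules
curl http://localhost:9090/api/v1/rules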

Example (Real-world Use Case)

When asked how to monitor a Kubernetes API service, a strong answer would include node-level metrics, pod health, request rate, error rate, latency histograms, alerting through Alertmanager, and ServiceMonitor-based discovery. It would also mention avoiding high-cardinality labels like request IDs.

Hands-on Section

  1. Practice a 60-second explanation of monitoring vs observability.
  2. Practice a 90-second explanation of Prometheus architecture.
  3. Write one PromQL query each for CPU, memory, and error rate.
  4. Practice explaining a troubleshooting workflow for missing metrics.
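As a starting point for step 3, the CPU and memory queries might look like the sketch below. It assumes cAdvisor container metrics are being scraped (as in a typical Kubernetes setup) and that the namespace and pod labels are present; adjust the grouping labels to your environment. The error-rate query follows the same rate() pattern applied to a request counter.

```promql
# CPU: cores used per pod, averaged over the last 5 minutes
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# Memory: working-set bytes per pod (a gauge, so no rate() is applied)
sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})
```

In an interview, it helps to say why CPU needs rate() and memory does not: the CPU metric is a counter of seconds consumed, while the memory metric is a gauge of current usage.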

Interview Questions

Beginner

What is the difference between monitoring and observability?

Monitoring tracks known health signals and known failure conditions. Observability uses telemetry to investigate and explain unknown issues in complex systems.

Why is Prometheus popular?

It is open source, cloud-native, works well with Kubernetes, has strong service discovery, a powerful query language, and a large ecosystem.

What are the main metric types in Prometheus?

Counter, gauge, histogram, and summary. Counters, gauges, and histograms are the ones most commonly used and discussed.

What is an exporter?

An exporter exposes metrics in Prometheus format for a target system such as Linux, MySQL, Redis, or a black-box endpoint.

What is Alertmanager?

Alertmanager manages alert notifications by grouping, routing, deduplicating, and silencing alerts produced by Prometheus.

Intermediate

Why is pull-based scraping helpful in Kubernetes?

It works well with dynamic targets, lets Prometheus observe target health directly (a failed scrape sets the up metric to 0), and integrates naturally with Kubernetes discovery and operator resources.

What is high cardinality and why is it bad?

High cardinality means a metric has too many unique label-value combinations, and each combination is stored as a separate time series. It increases memory, disk, and query cost and can destabilize Prometheus.
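A few queries can make the cardinality discussion concrete. This is a rough sketch: http_requests_total is reused from the earlier examples, and the second query scans every series, so it can be slow on a large server.

```promql
# Number of series currently stored for one metric
count(http_requests_total)

# Ten metric names with the most series (expensive; run sparingly)
topk(10, count by (__name__) ({__name__=~".+"}))

# Total active series in the head block, reported by Prometheus about itself
prometheus_tsdb_head_series
```

Mentioning that you would check series counts before and after adding a label is a simple way to show production judgment.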

How would you query error rate in PromQL?

I would divide the 5xx request rate by total request rate over a time window and multiply by 100.
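The description above could be written as the following query. It is a sketch that assumes http_requests_total carries a status label holding the HTTP status code; some client libraries name this label code instead.

```promql
# Percentage of requests that returned a 5xx status over the last 5 minutes
100 *
  sum(rate(http_requests_total{status=~"5.."}[5m]))
/
  sum(rate(http_requests_total[5m]))
```

If no 5xx series exist yet, the numerator is empty and the query returns no result rather than 0, which is worth mentioning when discussing alert rules built on it.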

Why are histograms useful for latency?

They preserve bucketed latency distributions, so you can estimate percentiles across instances and measure threshold-based latency SLOs, which averages cannot express.
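Two queries illustrate both uses. The metric name http_request_duration_seconds and the 0.3-second bucket are assumptions for illustration; the second query only works if the histogram actually defines a bucket at that boundary.

```promql
# Estimated 95th percentile latency over the last 5 minutes, across all instances
histogram_quantile(
  0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# Fraction of requests served within 0.3 seconds (threshold-style SLO)
  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m]))
/
  sum(rate(http_request_duration_seconds_count[5m]))
```

The percentile is an estimate interpolated within a bucket, so its accuracy depends on how the bucket boundaries were chosen.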

What resources does the Prometheus Operator add to Kubernetes?

Common CRDs include ServiceMonitor, PodMonitor, Prometheus, Alertmanager, and PrometheusRule.

Scenario-based

How would you monitor a Kubernetes API service end to end?

I would collect request rate, errors, latency histograms, pod CPU and memory, restart counts, node saturation, and deployment health using app instrumentation plus kube-state-metrics and node exporter. I would discover the service using ServiceMonitor and alert on user-impact signals.

Metrics stopped after a deployment. What is your troubleshooting order?

I check target discovery, target reachability, the raw metrics endpoint, metric presence in Prometheus, then PromQL and dashboards. In Kubernetes I also inspect ServiceMonitor selectors and port naming.
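The Prometheus side of that workflow can be sketched with three queries. The job value my-api is hypothetical; substitute the job label your target actually receives.

```promql
# 1. Discovery and reachability: 1 means the last scrape succeeded, 0 means it failed.
#    No result at all suggests the target was never discovered.
up{job="my-api"}

# 2. Endpoint content: how many samples the last scrape returned
scrape_samples_scraped{job="my-api"}

# 3. Metric presence: returns 1 only when no matching series exists
absent(http_requests_total{job="my-api"})
```

If the first query returns nothing, the problem is usually upstream of PromQL, for example a ServiceMonitor selector or port name that no longer matches after the deployment.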

A team wants to add request_id as a label for all request metrics. What do you recommend?

I recommend against it because request IDs are unbounded and cause cardinality explosions. Use logs or traces for request-level debugging instead.

How do you design actionable alerts for memory issues?

I combine memory pressure with restart or OOM signals, add a "for" duration so the condition must persist before the alert fires, and ensure the alert maps to real workload risk rather than harmless cache behavior.
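One possible alert expression combining those signals is sketched below. It assumes cAdvisor and kube-state-metrics (v2 or later) are both scraped and that their namespace, pod, and container labels line up; the 0.9 threshold is illustrative. The persistence requirement is set in the alerting rule itself, not in the expression.

```promql
# Containers above 90% of their memory limit that also restarted in the last 15 minutes
(
    sum by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
  /
    sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
  > 0.9
)
and on (namespace, pod, container)
(
  increase(kube_pod_container_status_restarts_total[15m]) > 0
)
```

Using the working set relative to the limit, rather than raw usage, avoids paging on reclaimable page cache, and containers without a memory limit simply drop out of the result.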

Why might you choose Prometheus plus Grafana over a monolithic monitoring platform?

It offers flexible open tooling, strong Kubernetes support, customizable dashboards, and a large ecosystem, though it also requires more operational ownership.

Mock Interview Drill

| Prompt | What a Strong Answer Includes |
| --- | --- |
| Explain Prometheus architecture | Scrape targets, TSDB, rules, Alertmanager, discovery, exporters |
| How do you monitor Kubernetes? | Operator, ServiceMonitor, node exporter, kube-state-metrics, app metrics |
| How do you detect a CPU spike? | PromQL rate, pod/workload aggregation, correlation with latency and errors |
| How do you debug missing metrics? | Discovery, connectivity, endpoint, raw metric, query, dashboard |
| How do you avoid alert fatigue? | Actionable rules, for windows, grouping, severity routing, symptom-based alerting |

Summary

Interview strength comes from clarity and production judgment. If you can explain Prometheus architecture, query core operational signals, troubleshoot missing metrics, and describe Kubernetes integration with trade-offs, you are in good shape for real DevOps and SRE interviews.
