Integration (Kubernetes, Apps)
Integrate Prometheus with Kubernetes and application instrumentation. For teams running workloads on Kubernetes, this is usually the primary production integration path.
Simple Explanation (ELI5)
Prometheus becomes powerful when it can automatically discover Kubernetes workloads and scrape app metrics without you manually updating IP addresses every day.
Real-world Analogy
Kubernetes is like a busy airport where gates, crews, and planes change constantly. Prometheus needs a live flight board, not a printed list from yesterday. That flight board is service discovery and the Prometheus Operator.
Technical Explanation
In Kubernetes, Prometheus is commonly deployed with the Prometheus Operator. The operator introduces custom resources like ServiceMonitor, PodMonitor, and PrometheusRule. ServiceMonitor and PodMonitor define what to scrape and how, while PrometheusRule defines alerting and recording rules. Core integrations usually include node-exporter, kube-state-metrics, cAdvisor-derived container metrics, kubelet metrics, and application metrics exposed by pods or services.
| Integration | What It Gives You | Why It Matters |
|---|---|---|
| node-exporter | Node CPU, memory, filesystem, load | Host-level health and saturation |
| kube-state-metrics | Object state for pods, deployments, jobs | Kubernetes desired vs actual state |
| cAdvisor / kubelet | Container CPU, memory, filesystem | Pod and container runtime behavior |
| ServiceMonitor | Scrape services matching labels | Operator-native service scraping |
| PodMonitor | Scrape pods directly | Useful for sidecars or headless workloads |
Visual Representation
[Diagram: Pods / Services / Nodes, ServiceMonitor / PodMonitor, Grafana / Alerts]
Commands / Syntax
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: checkout-api
  namespace: monitoring
  labels:
    # Assumption: with default kube-prometheus-stack values, Prometheus only
    # selects ServiceMonitors labeled with the Helm release name used below.
    release: monitoring
spec:
  selector:
    matchLabels:
      app: checkout-api
  namespaceSelector:
    matchNames:
      - prod
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s

# Install kube-prometheus-stack with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

# Check core components
kubectl get pods -n monitoring
kubectl get servicemonitors -A
kubectl get podmonitors -A
kubectl port-forward svc/monitoring-kube-prometheus-prometheus -n monitoring 9090:9090
from prometheus_client import Counter, Histogram, start_http_server

# Counter with a "status" label; Histogram with the client's default buckets
requests_total = Counter("checkout_requests_total", "Total checkout requests", ["status"])
request_latency = Histogram("checkout_request_duration_seconds", "Checkout latency")

# Serve /metrics on port 8000
start_http_server(8000)

# In app code (process_checkout stands in for the application's own handler):
requests_total.labels(status="200").inc()
with request_latency.time():
    process_checkout()

Example (Real-world Use Case)
A retail platform runs kube-prometheus-stack in the monitoring namespace. ServiceMonitor resources discover API services by label. kube-state-metrics exposes deployment readiness, node-exporter exposes host pressure, and app instrumentation exposes metrics for business endpoints such as checkout and payment.
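To make concrete what Prometheus receives when it scrapes an instrumented app, here is a minimal standard-library sketch that renders a counter in the Prometheus text exposition format. It is not how prometheus_client is implemented. The helper name render_counter and the sample counts are hypothetical and exist only for illustration.

```python
# Stdlib-only illustration of the text a /metrics endpoint returns for a counter.
# In a real app, prometheus_client produces this output for you.

def render_counter(name, help_text, samples):
    """Render one counter family.

    samples maps a tuple of (label_name, label_value) pairs to a numeric count.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        # Omit the braces entirely when the sample has no labels
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {float(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical counts for the checkout counter used in this section
checkout_samples = {
    (("status", "200"),): 42,
    (("status", "500"),): 3,
}

exposition = render_counter(
    "checkout_requests_total", "Total checkout requests", checkout_samples
)
print(exposition)
```

Each distinct label combination becomes its own line, which is also one time series in Prometheus.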
Hands-on Section
- Install kube-prometheus-stack in a test cluster.
- Create a service with a port named metrics.
- Add a matching ServiceMonitor in the monitoring namespace.
- Confirm the target appears in Prometheus and query up{job=~".*checkout.*"}.
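The final check can also be scripted against the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below assumes the port-forward from the commands section is running on localhost:9090. The helper names and the canned sample response are hypothetical, and the sample lets the parsing logic run without a cluster.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port-forward is active
PROMQL = 'up{job=~".*checkout.*"}'


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch(url, timeout=5):
    """Fetch the raw JSON body (needs a reachable Prometheus)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def down_targets(response_body):
    """Return the label sets of series whose sample value is 0."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return [
        series["metric"]
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0.0
    ]


# Canned response in the shape of an instant-query vector result (illustrative data)
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "checkout-api", "instance": "10.0.0.5:8000"},
             "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "job": "checkout-api", "instance": "10.0.0.6:8000"},
             "value": [1700000000, "0"]},
        ],
    },
})

query_url = build_query_url(PROMETHEUS_URL, PROMQL)
down = down_targets(sample_response)
print(query_url)
print([metric["instance"] for metric in down])
```

Against a live cluster, replace sample_response with fetch(query_url). An empty result means no matching target was discovered at all, which points back to the ServiceMonitor selectors rather than to the app.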
Try It Yourself
- Explain when you would choose PodMonitor over ServiceMonitor.
- List the minimum metrics you want from a Kubernetes node and a pod.
- Instrument one demo app endpoint with a counter and histogram.
Debugging Scenarios
If Prometheus in Kubernetes is not collecting app metrics, the most common causes are mismatched labels, wrong namespace selectors, missing named ports, RBAC gaps, or network policies.
- If the ServiceMonitor exists but the target is absent, compare its selector labels to the Service labels exactly.
- If the target exists but is down, verify the path, scheme, port name, and pod connectivity.
- If kube-state-metrics is missing, check that the deployment is running and being scraped by the operator-managed Prometheus instance.
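The first checks in this list can be expressed as a small offline linter over the two manifests. This is a simplified sketch, not the operator's actual logic: it handles only matchLabels, namespaceSelector.matchNames or any, and named endpoint ports, and it assumes the manifests are already loaded as Python dicts. The function names and the sample Service are hypothetical.

```python
def selector_matches(match_labels, service_labels):
    """matchLabels semantics: every selector key/value must be present on the Service."""
    return all(service_labels.get(key) == value for key, value in match_labels.items())


def diagnose(monitor, service):
    """Return a list of human-readable reasons the Service would not be scraped."""
    problems = []
    spec = monitor["spec"]

    # Namespace check: 'any' matches all namespaces; with no matchNames,
    # only the ServiceMonitor's own namespace is considered.
    ns_selector = spec.get("namespaceSelector", {})
    if not ns_selector.get("any"):
        allowed = ns_selector.get("matchNames") or [monitor["metadata"]["namespace"]]
        if service["metadata"]["namespace"] not in allowed:
            problems.append("Service namespace is not selected by namespaceSelector")

    # Label check
    match_labels = spec.get("selector", {}).get("matchLabels", {})
    if not selector_matches(match_labels, service["metadata"].get("labels", {})):
        problems.append("selector labels do not match Service labels")

    # Port name check
    port_names = {port.get("name") for port in service["spec"].get("ports", [])}
    for endpoint in spec.get("endpoints", []):
        wanted = endpoint.get("port")
        if wanted not in port_names:
            problems.append(f"endpoint port '{wanted}' is not a named Service port")
    return problems


# The ServiceMonitor from this section, as a dict
monitor = {
    "metadata": {"name": "checkout-api", "namespace": "monitoring"},
    "spec": {
        "selector": {"matchLabels": {"app": "checkout-api"}},
        "namespaceSelector": {"matchNames": ["prod"]},
        "endpoints": [{"port": "metrics", "path": "/metrics", "interval": "15s"}],
    },
}

# Hypothetical Service whose port is named "http" instead of "metrics"
broken_service = {
    "metadata": {"name": "checkout-api", "namespace": "prod", "labels": {"app": "checkout-api"}},
    "spec": {"ports": [{"name": "http", "port": 8000}]},
}
good_service = {
    "metadata": {"name": "checkout-api", "namespace": "prod", "labels": {"app": "checkout-api"}},
    "spec": {"ports": [{"name": "metrics", "port": 8000}]},
}

print(diagnose(monitor, broken_service))
print(diagnose(monitor, good_service))
```

RBAC gaps and network policies cannot be detected this way; those still need to be checked in the cluster.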
Interview Questions
Beginner
The Prometheus Operator manages Prometheus-related components in Kubernetes using custom resources like ServiceMonitor and PrometheusRule.
A ServiceMonitor tells the Prometheus Operator which services to scrape, selected by label, and how to scrape them.
kube-state-metrics exposes Kubernetes object state such as deployment readiness, pod status, and job completion information.
Because Kubernetes environments are dynamic, and Prometheus needs native discovery and metadata to monitor them reliably.
By instrumenting code with a Prometheus client library and exposing a /metrics endpoint.
Intermediate
When you need to scrape pods directly, especially if there is no stable service or you want pod-level selection behavior.
Because ServiceMonitor endpoints often reference the service port by name, and mismatches prevent Prometheus from scraping the right endpoint.
Node exporter and kubelet/cAdvisor metrics for utilization, plus kube-state-metrics for scheduling and eviction-related context.
Kubernetes metrics explain platform state, but app instrumentation explains user-facing behavior like request rate, error rate, and business latency.
Relying only on infrastructure metrics and ignoring app-level metrics, which leaves teams blind to real user impact.
Scenario-based
I inspect service labels, ServiceMonitor selector labels, namespace selectors, and whether the Prometheus instance is configured to watch that namespace.
I use container CPU metrics, node saturation metrics, pod restart data, and app latency or queue metrics to see whether the spike is workload, node, or traffic driven.
Metric names, labels, API versions, or the kube-state-metrics deployment itself may have changed. I verify compatibility and scrape status first.
I reject unbounded tenant IDs unless the set is tightly controlled. Otherwise it creates dangerous cardinality and hurts Prometheus stability.
http instead of metrics. Why might scraping fail?
If the ServiceMonitor endpoint references the port by the name metrics but the Service names it http, Prometheus cannot resolve the port, so the endpoint is not scraped. The names must match.
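The cardinality concern raised in the tenant ID answer above can be put into rough numbers. The sketch below assumes the worst case, where every label combination actually occurs, so the series count for one metric is the product of the distinct values per label. The label names and counts are hypothetical.

```python
from math import prod


def estimated_series(label_cardinalities):
    """Worst-case series count for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


def estimated_histogram_series(label_cardinalities, finite_buckets):
    """Each label combination yields one series per finite bucket,
    plus the +Inf bucket, _sum, and _count."""
    return estimated_series(label_cardinalities) * (finite_buckets + 3)


# Hypothetical label sets for a request counter
bounded = {"status": 5, "endpoint": 20}
with_tenant = {"status": 5, "endpoint": 20, "tenant_id": 10_000}

series_bounded = estimated_series(bounded)
series_with_tenant = estimated_series(with_tenant)
histogram_with_tenant = estimated_histogram_series(with_tenant, finite_buckets=10)

print(series_bounded)
print(series_with_tenant)
print(histogram_with_tenant)
```

Real workloads rarely hit every combination, so treat these figures as an upper bound. Even so, the jump from a bounded label set to one that includes tenant IDs shows why such labels need a tightly controlled value set.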
Summary
Kubernetes integration is where Prometheus delivers most of its operational value. The combination of Operator resources, node and cluster exporters, and application instrumentation gives teams both platform-level and user-level visibility.