Advanced | Lesson 8 of 11

Integration (Kubernetes, Apps)

Integrate Prometheus with Kubernetes and application instrumentation. For teams that run their workloads on Kubernetes, this is usually the primary production integration path.

Simple Explanation (ELI5)

Prometheus becomes powerful when it can automatically discover Kubernetes workloads and scrape app metrics without you manually updating IP addresses every day.

Real-world Analogy

Kubernetes is like a busy airport where gates, crews, and planes change constantly. Prometheus needs a live flight board, not a printed list from yesterday. That flight board is service discovery and the Prometheus Operator.

Technical Explanation

In Kubernetes, Prometheus is commonly deployed with the Prometheus Operator. The operator introduces custom resources like ServiceMonitor, PodMonitor, and PrometheusRule. These resources define what to scrape and how. Core integrations usually include node-exporter, kube-state-metrics, cAdvisor-derived container metrics, kubelet metrics, and application metrics exposed by pods or services.

| Integration        | What It Gives You                         | Why It Matters                            |
|--------------------|-------------------------------------------|-------------------------------------------|
| node-exporter      | Node CPU, memory, filesystem, load        | Host-level health and saturation          |
| kube-state-metrics | Object state for pods, deployments, jobs  | Kubernetes desired vs actual state        |
| cAdvisor / kubelet | Container CPU, memory, filesystem         | Pod and container runtime behavior        |
| ServiceMonitor     | Scrape services matching labels           | Operator-native service scraping          |
| PodMonitor         | Scrape pods directly                      | Useful for sidecars or headless workloads |

Visual Representation

Kubernetes Objects (Pods / Services / Nodes)
  -> Operator CRDs (ServiceMonitor / PodMonitor)
  -> Prometheus Scrapes
  -> Grafana / Alerts

Commands / Syntax

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
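metadata:
  name: checkout-api
  namespace: monitoring
  labels:
    # Assumption: kube-prometheus-stack defaults, where Prometheus only selects
    # ServiceMonitors labeled with the Helm release name ("monitoring" here).
    release: monitoring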
spec:
  selector:
    matchLabels:
      app: checkout-api
  namespaceSelector:
    matchNames:
      - prod
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s

bash
# Install kube-prometheus-stack with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

# Check core components
kubectl get pods -n monitoring
kubectl get servicemonitors -A
kubectl get podmonitors -A
kubectl port-forward svc/monitoring-kube-prometheus-prometheus -n monitoring 9090:9090

python
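from prometheus_client import Counter, Histogram, start_http_server

# Counter partitioned by HTTP status; keep label values to a small, bounded set.
requests_total = Counter("checkout_requests_total", "Total checkout requests", ["status"])
# Histogram of request duration in seconds, using the library's default buckets.
request_latency = Histogram("checkout_request_duration_seconds", "Checkout latency")


def process_checkout():
    """Placeholder for the real checkout logic."""


# Serve /metrics on port 8000 from a background thread.
start_http_server(8000)

# In app code: time the work, then count the request by status.
with request_latency.time():
    process_checkout()
requests_total.labels(status="200").inc()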
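
To make the scrape side concrete, the sketch below hand-renders a counter in the Prometheus text exposition format, roughly what Prometheus would read from /metrics for the `checkout_requests_total` counter. It is an illustration only: `render_counter` is a hypothetical helper, not part of `prometheus_client`, the sample values are invented, and real client output contains additional series.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    Hypothetical helper for illustration; label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {float(value)}")
    return "\n".join(lines) + "\n"


# Invented sample values for two status codes.
samples = {
    (("status", "200"),): 42,
    (("status", "500"),): 3,
}
text = render_counter("checkout_requests_total", "Total checkout requests", samples)
print(text)
```

Each distinct label combination becomes its own line, which is why label choices directly control how many time series Prometheus stores.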

Example (Real-world Use Case)

A retail platform runs kube-prometheus-stack in the monitoring namespace. ServiceMonitor resources discover API services by label. kube-state-metrics exposes deployment readiness, node-exporter exposes host pressure, and app instrumentation exposes business metrics for endpoints such as checkout and payment.

Hands-on Section

  1. Install kube-prometheus-stack in a test cluster.
  2. Create a Service with a port named metrics.
  3. Add a matching ServiceMonitor in the monitoring namespace.
  4. Confirm the target appears in Prometheus and query up{job=~".*checkout.*"}.
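
Step 4 can also be checked from a script. The sketch below builds the instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and summarizes an `up` result. It assumes Prometheus is reachable at `http://localhost:9090` through the port-forward shown earlier; the sample payload is invented to mirror the API's response shape, and `build_query_url`, `summarize_up`, and `fetch_up` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local port-forward
QUERY = 'up{job=~".*checkout.*"}'


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def summarize_up(payload):
    """Map each instance label to True (up == 1) or False."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        series["metric"].get("instance", "unknown"): series["value"][1] == "1"
        for series in payload["data"]["result"]
    }


def fetch_up(base_url=PROMETHEUS_URL, promql=QUERY):
    """Run the query against a live Prometheus (needs network access)."""
    with urlopen(build_query_url(base_url, promql), timeout=5) as resp:
        return summarize_up(json.load(resp))


# Invented payload in the shape the API returns for an instant vector.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "checkout-api", "instance": "10.0.0.5:8000"},
             "value": [1700000000.0, "1"]},
            {"metric": {"job": "checkout-api", "instance": "10.0.0.6:8000"},
             "value": [1700000000.0, "0"]},
        ],
    },
}
print(summarize_up(sample))
```

With the port-forward running, calling `fetch_up()` would perform the same check against the live server; an empty result usually means the target was never discovered rather than that it is down.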

Debugging Scenarios

Kubernetes Integration Failure

If Prometheus in Kubernetes is not collecting app metrics, the most common causes are mismatched labels, wrong namespace selectors, missing named ports, RBAC gaps, or network policies.
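
The first three causes can be checked mechanically. The sketch below compares a ServiceMonitor with a Service, both written as plain dictionaries shaped like their manifests, and reports which condition fails. It is a simplified model that assumes only `matchLabels`, `namespaceSelector.matchNames`, and named ports are in play; `diagnose` is a hypothetical helper, not an Operator API.

```python
def diagnose(service_monitor, service):
    """Return reasons the ServiceMonitor would not select the Service."""
    problems = []
    spec = service_monitor["spec"]

    # 1. Every matchLabels pair must be present on the Service.
    wanted = spec.get("selector", {}).get("matchLabels", {})
    have = service["metadata"].get("labels", {})
    for key, value in wanted.items():
        if have.get(key) != value:
            problems.append(f"label {key}={value} missing on Service")

    # 2. The Service namespace must be selected; with no matchNames,
    #    only the ServiceMonitor's own namespace is considered.
    names = spec.get("namespaceSelector", {}).get("matchNames") or [
        service_monitor["metadata"]["namespace"]
    ]
    service_ns = service["metadata"]["namespace"]
    if service_ns not in names:
        problems.append(f"namespace {service_ns} not selected")

    # 3. Each endpoint's port name must exist on the Service.
    port_names = {p.get("name") for p in service["spec"].get("ports", [])}
    for endpoint in spec.get("endpoints", []):
        port = endpoint.get("port")
        if port not in port_names:
            problems.append(f"no Service port named {port}")

    return problems


monitor = {
    "metadata": {"name": "checkout-api", "namespace": "monitoring"},
    "spec": {
        "selector": {"matchLabels": {"app": "checkout-api"}},
        "namespaceSelector": {"matchNames": ["prod"]},
        "endpoints": [{"port": "metrics", "path": "/metrics"}],
    },
}
service = {
    "metadata": {"name": "checkout-api", "namespace": "prod",
                 "labels": {"app": "checkout-api"}},
    "spec": {"ports": [{"name": "http", "port": 8000}]},
}
print(diagnose(monitor, service))
```

RBAC gaps and network policies are not modeled here; those show up differently, typically as discovery errors in the Prometheus logs or as targets that are listed but fail to scrape.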

Interview Questions

Beginner

What is the Prometheus Operator?

The Prometheus Operator manages Prometheus-related components in Kubernetes using custom resources like ServiceMonitor and PrometheusRule.

What is ServiceMonitor?

A ServiceMonitor is a custom resource that tells the Prometheus Operator which Services to scrape, selected by label, and on which port, path, and interval.

What is kube-state-metrics?

It exposes Kubernetes object state such as deployment readiness, pod status, and job completion information.

Why is Kubernetes integration important for Prometheus?

Because Kubernetes environments are dynamic, and Prometheus needs native discovery and metadata to monitor them reliably.

How do applications expose Prometheus metrics?

By instrumenting code with a Prometheus client library and exposing a /metrics endpoint.

Intermediate

When would you use PodMonitor instead of ServiceMonitor?

When you need to scrape pods directly, for example when no stable Service fronts them or when you want to select by pod labels rather than Service labels.

Why do named service ports matter in ServiceMonitor configs?

Because ServiceMonitor endpoints often reference the service port by name, and mismatches prevent Prometheus from scraping the right endpoint.

What metrics would you use for Kubernetes CPU and memory pressure?

Node exporter and kubelet/cAdvisor metrics for utilization, plus kube-state-metrics for scheduling and eviction-related context.

Why instrument apps if Kubernetes already exposes lots of metrics?

Kubernetes metrics explain platform state, but app instrumentation explains user-facing behavior like request rate, error rate, and business latency.
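
As a small illustration of the app-level signal, the sketch below derives request rate and error ratio from two scrapes of a status-labelled counter such as `checkout_requests_total`. The sample values and the 15-second gap are invented, and the arithmetic is a simplification of what PromQL's `rate()` does (no counter-reset handling or extrapolation).

```python
def per_second_rates(before, after, seconds):
    """Per-status increase per second between two counter snapshots."""
    return {
        status: (after[status] - before.get(status, 0)) / seconds
        for status in after
    }


def error_ratio(rates):
    """Fraction of requests whose status code starts with 5."""
    total = sum(rates.values())
    errors = sum(r for status, r in rates.items() if status.startswith("5"))
    return errors / total if total else 0.0


# Two hypothetical scrapes taken 15 seconds apart.
scrape_1 = {"200": 1000, "500": 10}
scrape_2 = {"200": 1285, "500": 25}

rates = per_second_rates(scrape_1, scrape_2, seconds=15)
print(rates)               # requests per second, by status
print(error_ratio(rates))  # share of requests that failed
```

None of this is visible from node or container metrics alone, which is the point of instrumenting the application.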

What is a common anti-pattern in Kubernetes monitoring?

Relying only on infrastructure metrics and ignoring app-level metrics, which leaves teams blind to real user impact.

Scenario-based

ServiceMonitor is applied, but the app target never appears. What do you inspect first?

I inspect service labels, ServiceMonitor selector labels, namespace selectors, and whether the Prometheus instance is configured to watch that namespace and to select that ServiceMonitor.

Your API has CPU spikes in Kubernetes. Which Prometheus sources help you diagnose it?

I use container CPU metrics, node saturation metrics, pod restart data, and app latency or queue metrics to see whether the spike is workload, node, or traffic driven.

A cluster upgrade breaks kube-state-metrics dashboards. What could have changed?

Metric names, labels, API versions, or the kube-state-metrics deployment itself may have changed. I verify compatibility and scrape status first.

A developer asks to label every request metric with tenant ID. In a multi-tenant Kubernetes platform, what do you say?

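I push back on a tenant ID label unless the set of tenants is small and tightly controlled. An unbounded label value creates a new time series per tenant for every metric it touches, which drives up cardinality and hurts Prometheus stability.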
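
The cardinality concern is simple arithmetic: each distinct combination of label values is a separate time series, and a histogram multiplies that by its bucket count plus `_sum` and `_count`. The sketch below uses invented numbers (5 status codes, 20 endpoints, 12 buckets, 2,000 tenants) to show the effect of adding a tenant label; `series_count` is a hypothetical helper.

```python
from math import prod


def series_count(label_cardinalities, histogram_buckets=0):
    """Worst-case series for one metric family.

    label_cardinalities: number of distinct values per label.
    histogram_buckets: bucket count including +Inf; 0 means a plain
    counter or gauge. A histogram adds _sum and _count per combination.
    """
    combos = prod(label_cardinalities.values())
    per_combo = histogram_buckets + 2 if histogram_buckets else 1
    return combos * per_combo


base = {"status": 5, "endpoint": 20}
with_tenant = {**base, "tenant_id": 2000}

print(series_count(base, histogram_buckets=12))         # 1400
print(series_count(with_tenant, histogram_buckets=12))  # 2800000
```

This is a worst case, since not every combination occurs in practice, but it shows why one high-cardinality label can dominate the total.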

Metrics are exposed on port 8000, but the Service port is named http instead of metrics. Why might scraping fail?

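If the ServiceMonitor endpoint references the port by the name metrics, no port on the Service matches, so no scrape target is generated for it. Either rename the Service port to metrics or point the endpoint at http; the names must match.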

Summary

Kubernetes integration is where Prometheus becomes operationally decisive. The combination of Operator resources, node and cluster exporters, and application instrumentation gives teams both platform-level and user-level visibility.