Data Collection (Scraping)
Learn scrape jobs, exporters, target discovery, scrape intervals, relabeling, and how Prometheus actually collects metrics in production.
Simple Explanation (ELI5)
Scraping means Prometheus regularly visits each target and reads its metrics page. If the target responds, Prometheus saves the numbers. If it does not, Prometheus marks the target as down.
Real-world Analogy
A janitor checks every room in a building on a fixed schedule. If a room is locked, the janitor notes that. Scraping works the same way: Prometheus follows a route, checks each endpoint, and records whether it could collect data.
Technical Explanation
Prometheus uses scrape_configs to define jobs. Each job can include static targets or dynamic service discovery. Exporters expose metrics for systems like Linux, Redis, MySQL, or black-box HTTP probing. Target relabeling (relabel_configs) rewrites target labels before the scrape, while metric relabeling (metric_relabel_configs) rewrites or drops scraped samples before storage.
| Concept | Purpose | Example |
|---|---|---|
| scrape_interval | How often to collect | 15s for apps, 60s for low-change systems |
| scrape_timeout | How long to wait | 10s timeout on slow targets |
| job_name | Groups targets logically | node-exporter, kubernetes-pods |
| relabel_configs | Rewrite target labels or keep/drop targets before the scrape | Keep only annotated pods |
| metric_relabel_configs | Filter metrics after scrape | Drop high-cardinality labels |
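The table rows for scrape_timeout and metric_relabel_configs can be sketched in one job. This is a minimal illustration, not a recommended production config: the job name, target host, metric name, and label name are all hypothetical.

```yaml
# Sketch only: job name, target, metric name, and label name are hypothetical.
global:
  scrape_interval: 60s   # used by any job that does not override it
  scrape_timeout: 10s    # must not be greater than the interval it applies to

scrape_configs:
  - job_name: "slow-exporter"
    scrape_interval: 30s
    scrape_timeout: 20s              # valid because 20s <= 30s
    static_configs:
      - targets: ["db1:9104"]
    metric_relabel_configs:
      # Drop one noisy metric family after the scrape, before storage.
      - source_labels: [__name__]
        regex: "example_request_duration_seconds_bucket"
        action: drop
      # Remove a high-cardinality label from every remaining series.
      - regex: "session_id"
        action: labeldrop
```

One caveat with labeldrop: if the removed label was the only thing distinguishing two series, they collide after relabeling, so it is worth checking the result before relying on it.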
Visual Representation
[Figure omitted: diagram labeled Intervals, Paths, Labels]
Commands / Syntax
scrape_configs:
  - job_name: "node-exporter"
    scrape_interval: 15s
    static_configs:
      - targets: ["node1:9100", "node2:9100"]
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

# Check discovered targets
curl http://localhost:9090/api/v1/targets
curl http://localhost:9090/api/v1/targets/metadata

# Inspect a metrics endpoint directly
curl http://node1:9100/metrics | head

# Kubernetes service and pod checks
kubectl get pods -A -o wide
kubectl get svc -A
kubectl describe pod my-app-123 -n prod
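The keep rule in the kubernetes-pods job only decides which pods are scraped. A common companion pattern, sketched here, lets pods also choose their metrics path and port through annotations. The prometheus.io/path and prometheus.io/port annotations are a convention rather than something Prometheus enforces, and this fragment assumes pods follow it.

```yaml
# Sketch only: extends the kubernetes-pods job, assuming pods carry
# prometheus.io/scrape, prometheus.io/path, and prometheus.io/port annotations.
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the annotated path instead of the default /metrics, if present.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Rewrite the scrape address to use the annotated port, if present.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: "$1:$2"
        target_label: __address__
```

The address regex assumes IPv4 addresses or hostnames. Pods without the path or port annotation fall through unchanged because the regexes do not match an empty value.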
Example (Real-world Use Case)
A cluster uses node exporter on each node, kube-state-metrics for object state, cAdvisor-derived container metrics, and custom application metrics from services annotated for Prometheus scraping. Scrape intervals are shorter for API latency metrics and longer for less dynamic batch systems.
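A setup like this might be expressed with a different scrape_interval per job. The fragment below is a sketch under assumptions: every hostname, port, and interval value is illustrative, and a real cluster would more likely use service discovery than static targets.

```yaml
# Sketch only: hostnames, ports, and intervals are illustrative assumptions.
scrape_configs:
  - job_name: "node-exporter"
    scrape_interval: 30s
    static_configs:
      - targets: ["node1:9100", "node2:9100"]

  - job_name: "kube-state-metrics"
    scrape_interval: 30s
    static_configs:
      - targets: ["kube-state-metrics.kube-system.svc:8080"]

  # Latency-sensitive API: a shorter interval gives faster detection.
  - job_name: "api-latency"
    scrape_interval: 10s
    static_configs:
      - targets: ["api.prod.svc:8080"]

  # Slow-changing batch system: a longer interval reduces ingestion cost.
  - job_name: "batch-workers"
    scrape_interval: 60s
    static_configs:
      - targets: ["batch.prod.svc:9102"]
```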
Hands-on Section
- Add a scrape job for a test app exposing /metrics.
- Verify the target appears under Status → Targets.
- Break the port intentionally and confirm the target turns DOWN.
- Restore the port and observe recovery.
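A starting point for the steps above could look like the following. The job name and the localhost:8080 address are assumptions about where the test app listens.

```yaml
# Sketch only: "test-app" and localhost:8080 are hypothetical.
scrape_configs:
  - job_name: "test-app"
    metrics_path: /metrics        # this is also the default
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]
```

To break the port, change the target to a port nothing listens on (for example localhost:8081) and reload Prometheus. The query up{job="test-app"} should then return 0, and return to 1 once the original port is restored.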
Try It Yourself
- Write a scrape config for one static Linux host running node exporter.
- Name two reasons to change scrape_interval.
- Explain why a pod annotation-based scrape approach can be useful in Kubernetes.
Debugging Scenarios
If metrics are missing, start by checking target discovery and endpoint accessibility. PromQL is usually not the first problem in a scrape failure.
- Target down: port, path, network policy, or service DNS issue.
- Target missing entirely: bad labels, bad annotations, ServiceMonitor mismatch, or namespace selector issue.
- Metrics incomplete: metric relabeling accidentally dropped series or exporter permissions are limited.
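The first two cases above can also be surfaced automatically with the up metric. The rule file below is a rough sketch: the group name, alert names, durations, and the node-exporter job label are assumptions, and it goes in a rules file referenced from rule_files, not in scrape_configs.

```yaml
# Sketch only: names, durations, and the job label are assumptions.
groups:
  - name: scrape-health
    rules:
      # Target is discovered but the scrape fails (port, path, network, DNS).
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} in job {{ $labels.job }} is down"

      # No target is discovered for the job at all, so there is no up series.
      - alert: TargetsMissing
        expr: absent(up{job="node-exporter"})
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "No targets found for job node-exporter"
```

The two rules are separate because a target that was never discovered has no up series, so up == 0 cannot catch it.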
Interview Questions
Beginner
Scraping is when Prometheus periodically requests metrics from a target endpoint over HTTP.
A scrape job is a logical configuration block that defines how Prometheus collects metrics from one set of targets.
An exporter is a process that exposes metrics in Prometheus format for a system that does not do so natively.
Target health can be checked in the Prometheus UI under Status → Targets or via the up metric.
Service discovery is needed because in dynamic environments like Kubernetes, targets constantly change and cannot be managed with static IP lists alone.
Intermediate
Relabeling changes target metadata before scraping. Metric relabeling changes or drops metrics after scrape but before storage.
The timeout cannot be greater than the interval. If the timeout is too close to the interval, slow targets can cause scrape instability.
Short intervals increase ingestion cost, network load, and storage. Use higher frequency only where fast detection matters.
In Kubernetes, targets are discovered through Kubernetes service discovery or Prometheus Operator resources like ServiceMonitor and PodMonitor.
The blackbox exporter is used to probe endpoints externally for reachability, latency, DNS resolution, or HTTP success instead of relying only on internal app metrics.
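External probing of this kind is commonly configured with the blackbox exporter. In the sketch below, the exporter address blackbox-exporter:9115 and the probed URL are assumptions, and http_2xx is assumed to be a module defined in the exporter's own configuration.

```yaml
# Sketch only: the exporter address and probed URL are hypothetical.
scrape_configs:
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]            # module defined in the exporter's config
    static_configs:
      - targets: ["https://example.com"]
    relabel_configs:
      # Pass the listed URL to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the blackbox exporter itself.
      - target_label: __address__
        replacement: "blackbox-exporter:9115"
```

The relabeling is what makes this work: Prometheus scrapes the exporter, and the exporter probes the URL on its behalf.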
Scenario-based
I inspect network path, service or pod label matching, scrape path, port name, ServiceMonitor selectors, and namespace scoping.
The new node may not have node exporter running, or its discovery labels may have changed. In Kubernetes, DaemonSet health and node scheduling are common causes.
Metric relabel configs may be dropping them, or the exporter may disable some collectors by default.
Duplicate scrapes of the same endpoint usually come from multiple overlapping scrape configs, or from annotation-based scraping and ServiceMonitor-based scraping both targeting it.
Prometheus may fail to scrape app pods depending on topology and network policies, so targets go down even if apps themselves are healthy.
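Several of the checks named in the scenario answers (label matching, port name, namespace scoping) map directly onto fields of a ServiceMonitor. The manifest below is an illustrative sketch for the Prometheus Operator: every name, label, and namespace is hypothetical, and the release label only matters if the Prometheus resource selects ServiceMonitors by it.

```yaml
# Sketch only: all names, labels, and namespaces are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prometheus        # must match the Prometheus serviceMonitorSelector, if one is set
spec:
  namespaceSelector:
    matchNames: ["prod"]       # namespace scoping: where to look for Services
  selector:
    matchLabels:
      app: my-app              # must match the labels on the Service, not the pod
  endpoints:
    - port: http-metrics       # must match the name of a port on the Service
      path: /metrics
      interval: 30s
```

When a target is missing, comparing each of these fields against the actual Service is usually quicker than reading the generated scrape config.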
Summary
Scraping is where Prometheus earns or loses trust. Good scrape configs, correct discovery, and sane relabeling produce reliable monitoring. Bad target selection or path configuration produces silence, which is worse than noise.