SLI, SLO, SLA and Error Budgets
ELI5 Explanation
An SLI is the score, an SLO is the target score, an SLA is the promise to customers, and the error budget is how much failure you can afford before slowing releases.
Technical Explanation
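An SLI measures service behavior, such as request success rate or p95 latency. An SLO sets a target for an SLI over a window, for example 99.9% successful requests over 30 days. An SLA is a contractual commitment to customers, usually set looser than the internal SLO so the team has a margin before penalties apply. The error budget equals 1 minus the SLO (0.1% of requests for a 99.9% SLO) and acts as a release control signal: while budget remains, releases proceed; once it is spent, the focus shifts to reliability work.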
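The error budget arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a standard library or tool: the function names (`error_budget`, `allowed_downtime_minutes`, `budget_remaining`) and the sample request counts are assumptions made for the example.

```python
def error_budget(slo: float) -> float:
    """Fraction of events (or time) allowed to fail under the SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return 1.0 - slo


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full outage the budget permits over the window."""
    return error_budget(slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    allowed_failures = error_budget(slo) * total_requests
    if allowed_failures == 0:
        # No traffic yet in the window, so nothing has been spent.
        return 1.0
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: 99.9% SLO, 1,000,000 requests, 250 failures.
    print(round(allowed_downtime_minutes(0.999), 1))          # 43.2
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # 0.75
```

With a 99.9% SLO over 30 days, the budget is about 43.2 minutes of full downtime, or 1,000 failed requests out of every million.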
Hands-on Commands
# Availability SLI
sum(rate(http_requests_total{status!~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
# p95 latency SLI
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
Debugging Scenario
A team sees 99.95% monthly uptime and assumes all is well, but p95 latency doubles during the business peak. An availability-only SLO hides this user pain because slow requests still count as successes. Add a latency SLI and burn-rate alerts to catch the problem earlier.
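A burn rate compares the observed bad-event ratio with the ratio the SLO allows; a burn rate of 1 spends the budget exactly by the end of the window. The sketch below assumes a two-window check (for example 1 hour and 5 minutes) and a paging threshold of 14.4, a commonly cited value that corresponds to spending 2% of a 30-day budget in one hour. The function names and the threshold default are assumptions for illustration. For a latency SLO, the "error ratio" would be the fraction of requests slower than the latency target.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    return error_ratio / (1.0 - slo)


def should_page(long_window_ratio: float, short_window_ratio: float,
                slo: float, threshold: float = 14.4) -> bool:
    """Page only when both windows burn above the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening, so the alert clears soon after recovery.
    The 14.4 default is an assumed value, not a fixed rule.
    """
    return (burn_rate(long_window_ratio, slo) >= threshold
            and burn_rate(short_window_ratio, slo) >= threshold)


if __name__ == "__main__":
    slo = 0.999
    # Hypothetical: 2% of requests bad over the last hour and the last 5 minutes.
    print(should_page(0.02, 0.02, slo))    # True
    # The hour looks bad, but the last 5 minutes have recovered.
    print(should_page(0.02, 0.0005, slo))  # False
```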
Beginner
- What is an SLI?
- Difference between SLO and SLA?
- What is an error budget?
- Why should SLO be stricter than SLA?
- Give two common SLIs.
Intermediate
- How do you choose valid user-centric SLIs?
- How do burn-rate alerts work?
- What are common SLO anti-patterns?
- How do you review SLOs each quarter?
- How can different tiers have different SLOs?
Scenario-based
- Budget exhausted mid-month. What release policy do you apply?
- Business needs faster features despite high burn. How do you negotiate?
- External dependency failures consume your budget. What next?
- Global users see poor latency but regional SLI looks healthy. Why?
- Service has low traffic. How do you avoid noisy SLI math?
Real-world Use Case
An e-commerce API defined separate checkout and catalog SLOs. Checkout used a stricter reliability target and release gates; catalog accepted a lower target to allow faster experimentation. The revenue-critical path stayed stable during sales events.
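A per-tier release gate of this kind might look like the following sketch. The SLO targets, the freeze thresholds, and the names `TierPolicy`, `POLICIES`, and `release_allowed` are all hypothetical; the use case does not state the actual numbers or mechanism used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    slo: float           # target success ratio for the tier (hypothetical)
    freeze_below: float  # freeze releases when remaining budget share is at or below this


# Hypothetical values: checkout is stricter and freezes early,
# catalog tolerates more failure and freezes only once the budget is gone.
POLICIES = {
    "checkout": TierPolicy(slo=0.9995, freeze_below=0.25),
    "catalog": TierPolicy(slo=0.995, freeze_below=0.0),
}


def release_allowed(service: str, total_requests: int, failed_requests: int) -> bool:
    """Return True if the service still has enough error budget to ship."""
    policy = POLICIES[service]
    allowed_failures = (1.0 - policy.slo) * total_requests
    if allowed_failures == 0:
        return True  # no traffic in the window, nothing spent
    remaining = 1.0 - failed_requests / allowed_failures
    return remaining > policy.freeze_below


if __name__ == "__main__":
    # Same failure count, different outcome per tier.
    print(release_allowed("checkout", 1_000_000, 400))  # False
    print(release_allowed("catalog", 1_000_000, 400))   # True
```

The same 400 failures per million requests freeze checkout releases, where about 20% of the budget is left, but leave catalog free to ship, where about 92% remains.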
Summary
SLI, SLO, SLA, and error budgets convert reliability into actionable engineering decisions. Next, you will design monitoring and alerting strategy to measure these goals continuously.