High Availability and Scaling
Design resilient, globally distributed systems that scale automatically to handle traffic spikes.
Simple Explanation (ELI5)
High availability means your app stays online even if parts fail. Scaling means handling more users and requests automatically. In GCP, you deploy apps across multiple zones (redundancy), use managed instance groups with autoscaling (add or remove instances based on demand), and put a global load balancer in front (route traffic worldwide). If one zone goes down, traffic shifts to another. If traffic spikes, new instances spin up automatically.
Why High Availability and Scaling Matter
- Uptime: 99.99% availability requires zero-downtime deployments and automatic failover.
- Cost Efficiency: Scale down during low-traffic periods; scale up during peaks. Pay only for what you use.
- User Experience: Fast response times (low latency) when serving globally.
- Business Resilience: Survive zone/region failures without customer impact.
Technical Explanation
1. Multi-Zone Deployment
Deploy identical infrastructure across 2+ zones in a region. If one zone fails, traffic redirects to the others. Health checks (periodic HTTP or TCP probes) detect unhealthy instances so they can be taken out of rotation and recreated automatically.

    # Create an instance template (blueprint)
    gcloud compute instance-templates create my-template \
      --machine-type=n1-standard-1 \
      --image-family=debian-11 \
      --image-project=debian-cloud \
      --boot-disk-size=20GB \
      --metadata=startup-script='#!/bin/bash
    apt-get update
    apt-get install -y nginx
    echo "Server: $(hostname -I)" | tee /var/www/html/index.html'

    # Create a managed instance group.
    # Note: --zone creates a zonal group (one zone only). To span multiple zones,
    # create a regional group with --region=us-central1 instead.
    gcloud compute instance-groups managed create my-ig \
      --base-instance-name=instance \
      --template=my-template \
      --size=3 \
      --zone=us-central1-a

    # Create a health check
    gcloud compute health-checks create http my-health-check \
      --global \
      --port=80 \
      --request-path=/

    # Attach the health check to the instance group for autohealing
    # (--initial-delay gives new instances time to boot before they are checked)
    gcloud compute instance-groups managed update my-ig \
      --health-check=my-health-check \
      --initial-delay=300 \
      --zone=us-central1-a

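Because `--zone` pins a group to a single zone, spreading instances across zones takes a regional group. The sketch below shows one way to create it, reusing `my-template` from this section. The group name `my-regional-ig`, the zone list, and the `run` dry-run helper are assumptions for illustration; by default the script only prints the command.

```shell
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0 is set.
# (run is a hypothetical helper, not part of gcloud.)
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a regional MIG whose instances are spread over three zones.
# my-regional-ig and the zone list are illustrative choices.
create_regional_mig() {
  run gcloud compute instance-groups managed create my-regional-ig \
    --base-instance-name=instance \
    --template=my-template \
    --size=3 \
    --region=us-central1 \
    --zones=us-central1-a,us-central1-b,us-central1-c
}

create_regional_mig
```

Run it with `DRY_RUN=0` to execute for real. Later commands that reference a regional group take `--region` (or `--instance-group-region` on `add-backend`) in place of the zonal flags.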
2. Auto-Scaling
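Automatically add or remove instances based on CPU utilization, load balancer serving capacity, Cloud Monitoring metrics (including memory, when an agent reports it), or schedules. Keeps costs low and performance high.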
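
    # Set up an autoscaling policy: min 2 instances, max 10, target 70% average CPU
    gcloud compute instance-groups managed set-autoscaling my-ig \
      --max-num-replicas=10 \
      --min-num-replicas=2 \
      --target-cpu-utilization=0.7 \
      --cool-down-period=300 \
      --zone=us-central1-a

    # The autoscaler adds instances when average CPU across the group exceeds 70%
    # and removes them once the group could stay at or below 70% with fewer instances.
    # --cool-down-period=300 tells it to ignore metrics from a new instance for its
    # first 300 seconds while the instance is still booting.
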
3. Load Balancing
Distribute traffic across instances. GCP offers HTTP(S) Load Balancer (Layer 7, content-aware), Network Load Balancer (Layer 4, ultra-high throughput), and Internal Load Balancer (private traffic within VPC).

    # Map the name "http" to port 80 on the group (the backend service refers to it by name)
    gcloud compute instance-groups managed set-named-ports my-ig \
      --named-ports=http:80 \
      --zone=us-central1-a

    # Create a backend service (target for the load balancer)
    gcloud compute backend-services create my-backend \
      --global \
      --protocol=HTTP \
      --health-checks=my-health-check \
      --port-name=http

    # Add the instance group to the backend service
    gcloud compute backend-services add-backend my-backend \
      --global \
      --instance-group=my-ig \
      --instance-group-zone=us-central1-a

    # Create a URL map (routes requests to the backend)
    gcloud compute url-maps create my-lb \
      --default-service=my-backend

    # Create an HTTP proxy
    gcloud compute target-http-proxies create my-proxy \
      --url-map=my-lb

    # Create a global forwarding rule (external IP, listens on port 80)
    gcloud compute forwarding-rules create my-forwarding-rule \
      --global \
      --target-http-proxy=my-proxy \
      --ports=80

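4. Global Traffic Routing
Route users to the geographically closest region with available capacity. The global external HTTP(S) load balancer does this behind a single anycast IP, combining global load balancing, multi-region deployments, and automatic failover. (Traffic Director, despite its name, is GCP's service-mesh control plane for service-to-service traffic, not the component that routes external users.)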

    # Deploy the app in multiple regions (one instance group per region)
    # Region 1: us-central1
    gcloud compute instance-groups managed create my-ig-us-central1 \
      --size=3 --zone=us-central1-a --template=my-template

    # Region 2: europe-west1
    gcloud compute instance-groups managed create my-ig-europe-west1 \
      --size=3 --zone=europe-west1-b --template=my-template

    # Create one global backend service shared by both regions
    # (each group also needs the named port http:80, as in the Load Balancing section)
    gcloud compute backend-services create my-global-backend \
      --global --protocol=HTTP --load-balancing-scheme=EXTERNAL \
      --health-checks=my-health-check --port-name=http \
      --enable-cdn

    # Add both regional groups as backends
    gcloud compute backend-services add-backend my-global-backend --global \
      --instance-group=my-ig-us-central1 --instance-group-zone=us-central1-a
    gcloud compute backend-services add-backend my-global-backend --global \
      --instance-group=my-ig-europe-west1 --instance-group-zone=europe-west1-b

5. High Availability Patterns
| Pattern | Description | Downtime | Complexity |
|---|---|---|---|
| Single Instance | One VM. Simple, cheap. | Any failure = downtime | Low |
| Multi-Zone (Same Region) | Multiple instances across zones in one region, LB across zones. | Zone failure = seconds (failover) | Medium |
| Multi-Region | Full deployment in multiple regions behind a global LB (or DNS failover). | Region failure = seconds (traffic shifts to the next closest region) | High |
| Active-Active | All regions serving traffic, global LB distributes based on proximity. | Minimal; single instance failure = milliseconds | Very High |
6. Health Checks & Monitoring
Health checks ensure only healthy instances receive traffic. Cloud Monitoring (formerly Stackdriver) alerts on anomalies.

    # Create a TCP health check (for example, for a MySQL port)
    gcloud compute health-checks create tcp my-tcp-check \
      --port=3306

    # Show backend health as seen by the load balancer
    gcloud compute backend-services get-health my-backend --global

    # Create an alerting policy (CPU > 80% for 5 minutes).
    # This is an alpha command; confirm flag names with --help before relying on it.
    gcloud alpha monitoring policies create \
      --notification-channels=CHANNEL_ID \
      --display-name="High CPU Alert" \
      --condition-display-name="CPU > 80%" \
      --condition-filter='resource.type = "gce_instance" AND metric.type = "compute.googleapis.com/instance/cpu/utilization"' \
      --if="> 0.8" \
      --duration=300s \
      --combiner=OR

GCP vs AWS vs Azure HA/Scaling
| Aspect | GCP | AWS | Azure |
|---|---|---|---|
| Regional Multi-Zone | Managed Instance Groups + HTTP(S) LB | Auto Scaling Groups + ALB/NLB | Virtual Machine Scale Sets + Load Balancer |
| Global Distribution | Global external HTTP(S) LB + Cloud CDN | CloudFront + Global Accelerator / Route 53 | Front Door + Traffic Manager |
| Auto-Scaling Metrics | CPU, LB capacity, custom (Cloud Monitoring) | CPU, memory, custom (CloudWatch) | CPU, memory, custom (Azure Monitor) |
| Ease of Setup | Simple CLI; managed service feels easier | More configuration options; steeper learning curve | Similar to Azure VMs; VMSS can be complex |
| Cost (multi-region HA) | Moderate; one global LB fronts all regions, Cloud CDN billed by usage | Higher with ALB + cross-region data transfer | Higher with Traffic Manager + geo-redundant storage |
Interview Questions
Beginner
Q: What is high availability?
A: A system that continues to operate even if parts fail. It is measured as an uptime percentage (99.9% allows about 8.8 hours of downtime per year; 99.99% allows about 53 minutes). It is achieved through redundancy and automatic failover.
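The downtime figures follow directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name `downtime_minutes_per_year` is made up for this example):

```shell
#!/usr/bin/env bash
# Convert an availability percentage into the downtime it allows per year.
# Assumes a 365-day year; awk handles the floating-point arithmetic.
downtime_minutes_per_year() {
  awk -v a="$1" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

downtime_minutes_per_year 99.9    # prints 525.6 (about 8.8 hours)
downtime_minutes_per_year 99.99   # prints 52.6
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why 99.99% usually rules out manual failover.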
Q: What is the difference between vertical and horizontal scaling?
A: Vertical scaling makes instances bigger (more CPU, RAM). Horizontal scaling adds more instances. Horizontal is better for HA: if one instance fails, the others keep handling traffic.
Q: What is a load balancer?
A: A service that distributes traffic across multiple backend instances and sends requests only to healthy ones. If an instance becomes unhealthy (fails health checks), the LB stops sending traffic to it.
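Q: How do health checks work?
A: Health checks probe instances periodically (e.g., every 5 seconds). After a configured number of consecutive failed probes, an instance is marked unhealthy and the load balancer stops sending it traffic; after enough consecutive successes it is marked healthy again. This ensures traffic goes only to working instances.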
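The marking logic can be modelled in a few lines. The sketch below is a simplified, hypothetical model of threshold-based health checking, not GCP's implementation: it replays a list of probe results (1 = success, 0 = failure) and flips state only after enough consecutive results of the same kind. The function name and the thresholds of 2 are assumptions.

```shell
#!/usr/bin/env bash
# health_state HEALTHY_THRESHOLD UNHEALTHY_THRESHOLD PROBE...
# Replays probe results in order and prints the final state.
health_state() {
  local healthy_threshold=$1 unhealthy_threshold=$2
  shift 2
  local state="HEALTHY" ok=0 fail=0 result
  for result in "$@"; do
    if [ "$result" -eq 1 ]; then
      ok=$((ok + 1)); fail=0          # a success resets the failure streak
      if [ "$ok" -ge "$healthy_threshold" ]; then state="HEALTHY"; fi
    else
      fail=$((fail + 1)); ok=0        # a failure resets the success streak
      if [ "$fail" -ge "$unhealthy_threshold" ]; then state="UNHEALTHY"; fi
    fi
  done
  echo "$state"
}

health_state 2 2 1 0 1 0 1   # HEALTHY: isolated failures never reach the threshold
health_state 2 2 1 0 0       # UNHEALTHY: two consecutive failures
health_state 2 2 0 0 1       # UNHEALTHY: one success is not enough to recover
health_state 2 2 0 0 1 1     # HEALTHY: two consecutive successes
```

Requiring consecutive results keeps a single dropped probe from pulling a working instance out of rotation.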
Intermediate
Q: What is a managed instance group?
A: A managed instance group (MIG) uses a template to automatically create identical instances. If an instance fails, the MIG recreates it. A MIG also enables auto-scaling and rolling updates. It is far less work than managing individual VMs by hand.
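Q: How does auto-scaling work in a MIG?
A: You define a scaling policy (min/max replicas, target CPU utilization). The autoscaler continuously monitors average utilization across the group. If it is above the target, it adds instances. If the load would fit on fewer instances at the target, it removes them, but only after the lower load has been sustained for a while so that brief dips do not cause flapping. This keeps performance high and costs low.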
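A rough way to reason about how many instances the policy asks for is the target-tracking formula: desired = ceil(current instances × average utilization / target), clamped to the min and max. The sketch below implements that simplified model; it is an approximation for intuition, not the autoscaler's actual algorithm, and the function name is invented.

```shell
#!/usr/bin/env bash
# desired_replicas CURRENT AVG_CPU_PCT TARGET_PCT MIN MAX
# Simplified target-tracking model using integer percentages.
desired_replicas() {
  local current=$1 avg_cpu_pct=$2 target_pct=$3 min=$4 max=$5
  # Integer ceiling of (current * avg_cpu_pct / target_pct)
  local desired=$(( (current * avg_cpu_pct + target_pct - 1) / target_pct ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 4 90 70 2 10    # 6: four instances at 90% need six to reach ~70%
desired_replicas 4 30 70 2 10    # 2: load fits on two instances
desired_replicas 10 95 70 2 10   # 10: formula says 14, capped at max
```

The third call shows why the max matters: once the cap is hit, utilization stays above target and latency can suffer.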
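Q: What is the difference between the HTTP(S) Load Balancer and the Network Load Balancer?
A: The HTTP(S) LB (Layer 7) understands HTTP requests and can route by URL path or hostname. The Network LB (Layer 4) forwards TCP/UDP packets with very low overhead and is used for workloads such as gaming and IoT. HTTP(S) is the most common choice; use the Network LB for non-HTTP protocols or extreme throughput.
Q: How do you deploy a new version without downtime?
A: Use rolling updates. Create a new instance template, then apply a rolling update policy (e.g., max 1 instance at a time). The MIG replaces instances one by one while the load balancer sends traffic only to healthy instances. Users never see downtime.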
Scenario-based
Q: Your traffic spikes every Sunday. How do you handle it without overpaying?
A: Use an auto-scaling MIG with a target CPU utilization (70%) and set max replicas to 20. On Sunday, CPU climbs and auto-scaling adds instances automatically. On Monday morning, traffic drops and instances are removed. You pay more on Sunday, but much less than running 20 instances all week.
Q: One zone goes down. What happens to your application?
A: If you deployed a regional MIG across zones, instances in the other zones continue serving. The load balancer routes traffic away from the broken zone. Failover is automatic (health checks detect the failures). Users may see a brief blip, but no outage.
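Q: How do you roll out a new version to a small share of users first?
A: Use a MIG canary update. Keep the current instance template as the primary version and add the new template as a canary version sized at about 5% of the instances. Because the load balancer spreads requests across healthy instances, roughly that share of traffic reaches the new version. Monitor error rates. If the new version is good, increase to 50%, then 100%. If bugs are found, roll the canary back so 0% of instances run the new version. The result is a zero-downtime canary deployment.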
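One way to express a 5% canary on a managed instance group is the `rolling-action start-update` command with a canary version. This is a sketch under assumptions: `my-template-v2` is a hypothetical name for the new template, `my-ig` and `my-template` come from earlier sections, and `run` is a made-up dry-run helper that only prints the command by default. The percentage applies to instances, so with a small group rounding can make the real traffic share differ from 5%.

```shell
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# start_canary NEW_TEMPLATE PERCENT
# Keeps my-template as the primary version and moves PERCENT of instances to NEW_TEMPLATE.
start_canary() {
  local new_template=$1 percent=$2
  run gcloud compute instance-groups managed rolling-action start-update my-ig \
    --version=template=my-template \
    --canary-version="template=${new_template},target-size=${percent}%" \
    --zone=us-central1-a
}

# rollback_canary: point every instance back at the old template.
rollback_canary() {
  run gcloud compute instance-groups managed rolling-action start-update my-ig \
    --version=template=my-template \
    --zone=us-central1-a
}

start_canary my-template-v2 5
```

Promoting the canary is the same call with a larger percentage, and finishing the rollout means making the new template the only `--version`.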
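Q: Users in Singapore see high latency from your US deployment. How do you fix it?
A: Deploy the same app in asia-southeast1 (Singapore) and add it as a backend of the global load balancer. The load balancer's single anycast IP then routes Singapore users to asia-southeast1 and US users to us-central1 automatically. Enable Cloud CDN to cache static assets. Latency for Singapore users can drop substantially, for example from around 150 ms to around 10 ms.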
Real-world Scenarios
Scenario 1: E-commerce Black Friday
Your online store normally runs 10 instances at full capacity. On Black Friday, traffic grows 10x; you need 100 instances to keep response times fast. Solution: use an auto-scaling MIG with max 100 replicas (and confirm your quota covers it). Starting 2 days before Black Friday, watch the auto-scaling metrics. As traffic spikes, the MIG automatically adds instances. After Black Friday, traffic drops to normal and the group scales back down. You pay 10x more for 2 days, which is much cheaper than provisioning 100 instances year-round.
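The saving is easy to quantify in instance-hours. A small sketch of the arithmetic, using the scenario's numbers (10 baseline instances, 100 at peak, 2 peak days) and assuming a 365-day year; prices are left out because they vary by machine type and region:

```shell
#!/usr/bin/env bash
# Compare instance-hours per year: always provisioned for peak vs. autoscaled.
HOURS_PER_DAY=24
DAYS_PER_YEAR=365
PEAK_DAYS=2

# instance_hours INSTANCES DAYS
instance_hours() {
  echo $(( $1 * $2 * HOURS_PER_DAY ))
}

always_on=$(instance_hours 100 "$DAYS_PER_YEAR")                  # 100 instances all year
baseline=$(instance_hours 10 $(( DAYS_PER_YEAR - PEAK_DAYS )))    # 10 instances on normal days
peak=$(instance_hours 100 "$PEAK_DAYS")                           # 100 instances on peak days
autoscaled=$(( baseline + peak ))

echo "always-on:  ${always_on} instance-hours"    # 876000
echo "autoscaled: ${autoscaled} instance-hours"   # 91920
```

Under these assumptions the autoscaled setup uses roughly a tenth of the instance-hours of a fleet sized permanently for the peak.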
Scenario 2: Global SaaS with 99.99% SLA
You promise customers 99.99% uptime. Deploy in us-central1 and europe-west1 (two regions). Each region has a multi-zone MIG. Global load balancer routes requests to the closest region. If europe-west1 fails, all users route to us-central1 (latency increases but availability maintained). Cloud Monitoring alerts if error rate spikes. Your SLA is achievable.
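To see why two regions help, consider the combined availability when either region alone can serve all traffic. The sketch below assumes each region is independently available 99.9% of the time. Both that figure and the independence assumption are illustrative only; shared dependencies such as the load balancer, DNS, or a bad deployment pushed to both regions make real numbers lower.

```shell
#!/usr/bin/env bash
# combined_availability PER_REGION_PCT REGIONS
# Availability of N redundant regions, assuming independent failures:
#   1 - (1 - a)^N
combined_availability() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - (1 - a / 100) ^ n) * 100 }'
}

combined_availability 99.9 1   # 99.9000: a single region
combined_availability 99.9 2   # 99.9999: either of two regions can serve
```

With a single 99.9% region the 99.99% target is out of reach; with two, the remaining risk shifts to whatever the regions have in common.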
Scenario 3: CI/CD Pipeline with Rolling Deployments
Developers push code and the CI/CD unit tests pass. You want to deploy to production without downtime. Use a MIG rolling update:
1. Create a new instance template with the new code (templates are immutable, so each release gets its own).
2. Start a rolling update (1 instance at a time).
3. The MIG replaces each old instance with a new one built from the new template.
4. The load balancer only sends traffic to healthy instances.
Users never notice. If the new code breaks, roll back by starting another rolling update that points at the previous template.
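The rollout steps can be sketched with `rolling-action start-update`. This assumes the zonal group `my-ig` from earlier and a hypothetical new template named `my-template-v2`; `run` is a made-up helper that only prints commands unless `DRY_RUN=0`. The sketch uses `--max-surge=1 --max-unavailable=0`, which creates the replacement before removing the old instance. That is a slightly more conservative choice than terminate-then-create and briefly runs one extra instance.

```shell
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# rolling_update TEMPLATE
# Replaces instances one at a time, then waits for the group to settle.
rolling_update() {
  local template=$1
  run gcloud compute instance-groups managed rolling-action start-update my-ig \
    --version="template=${template}" \
    --max-surge=1 \
    --max-unavailable=0 \
    --zone=us-central1-a
  run gcloud compute instance-groups managed wait-until my-ig \
    --stable \
    --zone=us-central1-a
}

rolling_update my-template-v2     # roll forward
# rolling_update my-template      # roll back: same command, previous template
```

Rolling back is the same operation pointed at the previous template, which is why keeping old templates around is worthwhile.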
Summary
High availability and scaling are essential for production systems:
- Deploy across multiple zones for redundancy. Health checks and load balancers route traffic automatically.
- Use managed instance groups + auto-scaling to handle traffic spikes and reduce costs.
- Multi-region deployments serve global users with low latency.
- Rolling updates enable zero-downtime deployments.
- Monitor closely; automate failover. 99.99% uptime requires design, not luck.