Advanced · Lesson 7 of 12

High Availability and Scaling

Design resilient, globally distributed systems that scale automatically to handle traffic spikes.

Simple Explanation (ELI5)

High availability means your app stays online even if parts fail. Scaling means handling more users or requests automatically. In GCP, you deploy apps across multiple zones (redundancy), use managed instance groups with autoscaling (add or remove instances based on demand), and use global load balancers (route traffic worldwide). If one zone goes down, traffic shifts to another. If traffic spikes, new instances spin up automatically.

Why High Availability and Scaling Matter

Technical Explanation

1. Multi-Zone Deployment

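Deploy identical infrastructure across two or more zones in a region. If one zone fails, the load balancer sends traffic to instances in the remaining zones. Health checks (periodic HTTP or TCP probes) detect unhealthy instances. The load balancer stops routing to them, and a managed instance group with autohealing recreates them.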

```bash
# Create an instance template (blueprint for every VM in the group)
gcloud compute instance-templates create my-template \
  --machine-type=n1-standard-1 \
  --image-family=debian-11 \
  --image-project=debian-cloud \
  --boot-disk-size=20GB \
  --metadata=startup-script='#!/bin/bash
apt-get update
apt-get install -y nginx
echo "Server: $(hostname -I)" | sudo tee /var/www/html/index.html'

# Create a regional managed instance group spread across three zones
gcloud compute instance-groups managed create my-ig \
  --base-instance-name=instance \
  --template=my-template \
  --size=3 \
  --region=us-central1 \
  --zones=us-central1-a,us-central1-b,us-central1-c

# Name the serving port so a load balancer backend service can refer to it
gcloud compute instance-groups set-named-ports my-ig \
  --named-ports=http:80 \
  --region=us-central1

# Create a health check
gcloud compute health-checks create http my-health-check \
  --global \
  --port=80 \
  --request-path=/

# Allow Google Cloud health-check probes to reach port 80 on the VMs
gcloud compute firewall-rules create allow-health-checks \
  --network=default \
  --allow=tcp:80 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16

# Attach the health check to the group for autohealing
# (the initial delay gives new VMs time to boot before they are checked)
gcloud compute instance-groups managed update my-ig \
  --health-check=my-health-check \
  --initial-delay=300 \
  --region=us-central1
```

2. Auto-Scaling

Automatically add or remove instances based on CPU utilization, load balancer serving capacity, or Cloud Monitoring metrics (memory, for example, is reported by the Ops Agent). This keeps costs low during quiet periods and performance steady during spikes.

```bash
# Set up the autoscaling policy on the regional group
gcloud compute instance-groups managed set-autoscaling my-ig \
  --max-num-replicas=10 \
  --min-num-replicas=2 \
  --target-cpu-utilization=0.7 \
  --region=us-central1

# Policy: keep between 2 and 10 instances, sized so average CPU stays near 70%.
# Metrics from new VMs are ignored during the cool-down period (60 s by
# default), and scale-in waits for a 10-minute stabilization window.
```

3. Load Balancing

Distribute traffic across instances. GCP offers the HTTP(S) Load Balancer (Layer 7, content-aware; now called the Application Load Balancer), the Network Load Balancer (Layer 4, very high throughput), and internal load balancers (private traffic within a VPC). The commands below put a global external HTTP load balancer in front of my-ig.

```bash
# Create a backend service (the load balancer's target)
gcloud compute backend-services create my-backend \
  --global \
  --protocol=HTTP \
  --health-checks=my-health-check \
  --port-name=http

# Add the regional instance group to the backend service
gcloud compute backend-services add-backend my-backend \
  --global \
  --instance-group=my-ig \
  --instance-group-region=us-central1

# Create a URL map (routes requests to the backend service)
gcloud compute url-maps create my-lb \
  --default-service=my-backend

# Create a target HTTP proxy (use target-https-proxies plus a certificate for HTTPS)
gcloud compute target-http-proxies create my-proxy \
  --url-map=my-lb

# Create a global forwarding rule (external IP, listens on port 80)
gcloud compute forwarding-rules create my-forwarding-rule \
  --global \
  --target-http-proxy=my-proxy \
  --ports=80
```

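4. Multi-Region Global Load Balancing

Route users to the geographically closest region with available capacity. A single global external HTTP(S) load balancer, reachable on one anycast IP address, fronts instance groups in several regions. It sends each user to the nearest healthy region and fails over automatically when a region is unhealthy or full. Traffic Director is a different product: it manages service-to-service traffic inside a service mesh and is not needed here.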

```bash
# Deploy the app in two regions, each as a regional (multi-zone) instance group
# Region 1: us-central1
gcloud compute instance-groups managed create my-ig-us-central1 \
  --size=3 --region=us-central1 --template=my-template
gcloud compute instance-groups set-named-ports my-ig-us-central1 \
  --named-ports=http:80 --region=us-central1

# Region 2: europe-west1
gcloud compute instance-groups managed create my-ig-europe-west1 \
  --size=3 --region=europe-west1 --template=my-template
gcloud compute instance-groups set-named-ports my-ig-europe-west1 \
  --named-ports=http:80 --region=europe-west1

# Create one global backend service with Cloud CDN enabled
gcloud compute backend-services create my-global-backend \
  --global --protocol=HTTP --load-balancing-scheme=EXTERNAL \
  --health-checks=my-health-check --port-name=http \
  --enable-cdn

# Add both regional groups to the same global backend service
gcloud compute backend-services add-backend my-global-backend --global \
  --instance-group=my-ig-us-central1 --instance-group-region=us-central1
gcloud compute backend-services add-backend my-global-backend --global \
  --instance-group=my-ig-europe-west1 --instance-group-region=europe-west1

# Point the URL map at the multi-region backend service
gcloud compute url-maps set-default-service my-lb \
  --default-service=my-global-backend
```
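Conceptually, the global load balancer prefers the region closest to the user. When that region has no healthy spare capacity, it overflows to the next-closest region (Google's documentation calls this "waterfall by region"). The bash sketch below models only that decision. The `pick_region` helper, the round-trip times and the spare capacities (for a hypothetical user in Europe) are illustrative assumptions, not values reported by the real load balancer.

```bash
#!/usr/bin/env bash
# Illustrative model of "closest region with spare capacity" routing.
# Each line: region  rtt_ms  spare_capacity. All values are hypothetical.
regions='us-central1 180 500
europe-west1 20 0
asia-southeast1 230 300'

# Print the lowest-RTT region that still has spare capacity; fail if none does.
pick_region() {
  sort -k2,2n <<< "$1" |
    awk '$3 > 0 { print $1; found = 1; exit } END { if (!found) exit 1 }'
}

# europe-west1 is closest but full, so the request overflows to us-central1
pick_region "$regions"
```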

5. High Availability Patterns

| Pattern | Description | Downtime | Complexity |
|---|---|---|---|
| Single instance | One VM. Simple and cheap. | Any failure means downtime | Low |
| Multi-zone (same region) | Instances spread across zones in one region, load balanced across zones. | Zone failure: seconds while health checks shift traffic | Medium |
| Multi-region | Full deployment in multiple regions behind a global load balancer (or DNS failover). | Region failure: seconds to minutes while traffic shifts to the next-closest region | High |
| Active-active | All regions serve traffic; the global load balancer distributes by proximity. | Minimal; failed instances or regions are routed around once health checks detect them | Very high |

6. Health Checks & Monitoring

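Health checks ensure only healthy instances receive traffic. Cloud Monitoring (formerly Stackdriver) alerts you when metrics cross thresholds you define.

```bash
# Create a TCP health check (e.g., for a MySQL backend on port 3306)
gcloud compute health-checks create tcp my-tcp-check \
  --global \
  --port=3306

# Show each backend's health as seen by the load balancer
gcloud compute backend-services get-health my-backend --global

# Create an alerting policy: instance CPU above 80% for 5 minutes
# (alpha command; flags may change, see `gcloud alpha monitoring policies create --help`)
gcloud alpha monitoring policies create \
  --notification-channels=CHANNEL_ID \
  --display-name="High CPU Alert" \
  --condition-display-name="CPU > 80%" \
  --condition-filter='resource.type="gce_instance" AND metric.type="compute.googleapis.com/instance/cpu/utilization"' \
  --if="> 0.8" \
  --duration=300s
```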
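A "CPU above 80% for 5 minutes" condition fires only when the metric stays above the threshold for the whole duration, so a single short spike does not page anyone. The sketch below is a simplified model of that rule and assumes one sample per minute. `should_alert` is a hypothetical helper. Cloud Monitoring's real evaluation, with alignment, aggregation and missing-data handling, is more involved.

```bash
#!/usr/bin/env bash
# Simplified model of a "metric above threshold for a duration" condition:
# alert only if every sample in the window is above the threshold.
# Assumes one sample per minute, so a 5-minute window is 5 samples.
# Usage: should_alert THRESHOLD WINDOW SAMPLE...
should_alert() {
  local threshold=$1 window=$2
  shift 2
  local samples=("$@")
  (( ${#samples[@]} >= window )) || return 1   # not enough data yet
  local start=$(( ${#samples[@]} - window ))
  printf '%s\n' "${samples[@]:start}" |
    awk -v t="$threshold" 'BEGIN { below = 0 } $1 <= t { below = 1 } END { exit below }'
}

# The last five samples are all above 0.8, so this prints ALERT
if should_alert 0.8 5 0.72 0.85 0.91 0.88 0.93 0.87; then echo ALERT; else echo OK; fi
# One sample (0.79) dips below the threshold, so this prints OK
if should_alert 0.8 5 0.85 0.91 0.79 0.93 0.87 0.90; then echo ALERT; else echo OK; fi
```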

GCP vs AWS vs Azure HA/Scaling

| Aspect | GCP | AWS | Azure |
|---|---|---|---|
| Regional multi-zone | Managed instance groups + HTTP(S) Load Balancing | Auto Scaling groups + ALB/NLB | Virtual Machine Scale Sets + Load Balancer |
| Global distribution | Global external HTTP(S) Load Balancer (one anycast IP) + Cloud CDN | Route 53 latency/failover routing or Global Accelerator + CloudFront | Front Door + Traffic Manager |
| Auto-scaling metrics | CPU, LB serving capacity, custom metrics (Cloud Monitoring) | CPU, request count, custom metrics (CloudWatch) | CPU, memory, custom metrics (Azure Monitor) |
| Ease of setup | Simple CLI; managed services feel easier | More configuration options; steeper learning curve | Similar to Azure VMs; VMSS can be complex |
| Cost (multi-region HA) | Moderate; Cloud CDN and inter-region traffic billed separately | Higher with ALB + cross-region data transfer | Higher with Traffic Manager + geo-redundant storage |

Interview Questions

Beginner

What is high availability?

A system that continues to operate even if parts fail. Measured as uptime percentage (99.9% = 8.7 hours downtime/year, 99.99% = 52 minutes downtime/year). Achieved via redundancy and automatic failover.
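The downtime figures follow directly from the percentage: allowed downtime = (1 - availability) × minutes per year. The small bash/awk helper below makes the arithmetic explicit. The name `downtime_per_year` is only for illustration, and the calculation assumes a 365-day year.

```bash
#!/usr/bin/env bash
# Allowed downtime per year for an availability target given in percent.
# Assumes a 365-day year (525,600 minutes).
downtime_per_year() {
  awk -v a="$1" 'BEGIN {
    m = (1 - a / 100) * 365 * 24 * 60
    printf "%.2f minutes (%.2f hours)\n", m, m / 60
  }'
}

downtime_per_year 99.9    # 525.60 minutes (8.76 hours)
downtime_per_year 99.99   # 52.56 minutes (0.88 hours)
```

A 99.99% target therefore leaves less than an hour per year for every incident, deployment and failover combined.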

What is the difference between vertical and horizontal scaling?

Vertical scaling: make instances bigger (more CPU, RAM). Horizontal scaling: add more instances. Horizontal is better for HA; if one instance fails, others handle traffic.

What is a load balancer?

A service that distributes traffic across multiple backend instances. Sends requests only to healthy instances. If an instance becomes unhealthy (fails health checks), the LB stops sending traffic to it.
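The health-filtering idea fits in a few lines of bash. This is an illustrative model only. It uses plain round-robin over hypothetical backends, whereas Google Cloud's load balancers distribute traffic by balancing mode (for example utilization or request rate) rather than strict rotation.

```bash
#!/usr/bin/env bash
# Illustrative model only: round-robin over the backends that are healthy.
# Backend names and health states are hypothetical.
backends=(instance-a instance-b instance-c)
health=(1 0 1)   # 1 = passing health checks, 0 = failing
next=0           # index where the next search starts

# Print the next healthy backend; fail if none is healthy.
# Call it directly (not inside $(...)) so `next` keeps its value between calls.
pick_backend() {
  local i idx
  for (( i = 0; i < ${#backends[@]}; i++ )); do
    idx=$(( (next + i) % ${#backends[@]} ))
    if (( health[idx] == 1 )); then
      next=$(( idx + 1 ))
      echo "${backends[idx]}"
      return 0
    fi
  done
  echo "no healthy backends" >&2
  return 1
}

# instance-b is failing its health checks, so it never receives a request
for request in 1 2 3 4; do pick_backend; done
```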

Why do we use health checks?

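Health checks probe instances periodically over HTTP, HTTPS, TCP or a similar protocol (every 5 seconds by default on Google Cloud). If an instance fails several consecutive probes (the unhealthy threshold, 2 by default), it is marked unhealthy. The load balancer then stops sending it traffic until it passes again. This ensures traffic goes only to working instances.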
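The "consecutive failures" rule matters because one dropped probe does not evict an instance. The sketch below models a backend's state under Google Cloud's documented default thresholds: 2 consecutive failures to become unhealthy and 2 consecutive successes to recover. `track_health` is a hypothetical helper, and it starts in the HEALTHY state for simplicity.

```bash
#!/usr/bin/env bash
# Illustrative health-check state machine using the default thresholds.
healthy_threshold=2     # consecutive successes needed to become HEALTHY
unhealthy_threshold=2   # consecutive failures needed to become UNHEALTHY

# Args: probe results (1 = success, 0 = failure). Prints the state after each probe.
track_health() {
  local state=HEALTHY ok_run=0 fail_run=0 r
  for r in "$@"; do
    if (( r )); then
      ok_run=$(( ok_run + 1 )); fail_run=0
      (( ok_run >= healthy_threshold )) && state=HEALTHY
    else
      fail_run=$(( fail_run + 1 )); ok_run=0
      (( fail_run >= unhealthy_threshold )) && state=UNHEALTHY
    fi
    echo "$state"
  done
}

# Isolated failures are tolerated; two in a row mark the instance unhealthy,
# and two successes in a row bring it back.
track_health 1 0 1 0 0 1 1 | paste -sd' ' -
```

With the default 5-second interval, a dead instance is therefore pulled from rotation after roughly two probe intervals, plus the probe timeout.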

Intermediate

What is a managed instance group and why is it better than manual instances?

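A managed instance group (MIG) uses an instance template to create identical instances. If a VM stops or is deleted, the MIG recreates it. With a health check attached (autohealing), it also recreates VMs whose application stops responding. MIGs also provide autoscaling and rolling updates, which is far less work than managing a handful of VMs by hand.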

How does auto-scaling work in GCP?

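You define an autoscaling policy (min/max replicas, target CPU utilization). The autoscaler continuously compares the group's average utilization with the target. Above the target, it adds instances. Below it, it removes instances, but only after a stabilization window (about 10 minutes) so short dips do not cause thrashing. This keeps performance steady and costs low.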
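A useful mental model for CPU-based autoscaling is proportional target tracking. For example, if 4 instances run at 91% CPU and the target is 70%, about ceil(4 × 91 / 70) = 6 instances bring the average back to the target. The sketch below encodes that model with a min/max clamp. It is a simplification, not Google's exact algorithm, which also applies the cool-down period and the scale-in stabilization window. `recommended_size` is a hypothetical name. Utilizations are whole percentages so the arithmetic stays in integers.

```bash
#!/usr/bin/env bash
# Simplified target-tracking model (not the exact GCP autoscaler):
#   recommended = ceil(size * utilization / target), clamped to [min, max]
# Usage: recommended_size SIZE UTIL_PCT TARGET_PCT MIN MAX
recommended_size() {
  local size=$1 util=$2 target=$3 min=$4 max=$5
  local rec=$(( (size * util + target - 1) / target ))   # integer ceiling
  (( rec < min )) && rec=$min
  (( rec > max )) && rec=$max
  echo "$rec"
}

recommended_size 4 91 70 2 10    # 6: scale out
recommended_size 6 20 70 2 10    # 2: scale in, held at the minimum
recommended_size 10 100 70 2 10  # 10: capped at the maximum
```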

What is the difference between HTTP(S) and Network Load Balancer?

HTTP(S) LB (Layer 7) understands HTTP requests; can route by URL path or hostname. Network LB (Layer 4) routes TCP/UDP packets ultra-fast; used for gaming, IoT. HTTP(S) is most common; Network LB for extreme throughput.

How do you ensure zero-downtime deployments with managed instance groups?

Use rolling updates. Create a new instance template (templates cannot be edited in place), then start a rolling update on the MIG with a policy such as max unavailable 0 and a small max surge. The MIG replaces instances a few at a time, and the load balancer sends traffic only to instances that pass health checks. Users see no downtime.
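The sketch below shows that flow with gcloud. It assumes a regional MIG named `my-ig` in `us-central1` spread over three zones, plus a new template with the hypothetical name `my-template-v2`. The `run` wrapper only prints each command unless `DRY_RUN=0` is set, so you can review the commands before touching a real project. The surge value of 3 (one extra VM per zone) is also an assumption. Regional groups constrain fixed surge values, so check the rolling-update documentation for your setup.

```bash
#!/usr/bin/env bash
# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Templates are immutable, so the new release gets a new template
#    (the startup script or image carrying the new build is omitted here).
run gcloud compute instance-templates create my-template-v2 \
  --machine-type=n1-standard-1 \
  --image-family=debian-11 --image-project=debian-cloud

# 2. Roll it out without ever dropping below the target size.
run gcloud compute instance-groups managed rolling-action start-update my-ig \
  --version=template=my-template-v2 \
  --max-surge=3 --max-unavailable=0 \
  --region=us-central1

# Rollback is the same command pointing back at the previous template:
# run gcloud compute instance-groups managed rolling-action start-update my-ig \
#   --version=template=my-template --region=us-central1
```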

Scenario-based

Your web app traffic triples every Sunday. How do you handle it?

Use an autoscaling MIG with a target CPU utilization of 70% and max replicas set to 20. On Sunday, CPU climbs and the autoscaler adds instances automatically. On Monday morning, traffic drops and instances are removed. You pay more on Sunday, but much less than running 20 instances all week.
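The cost argument is easy to check in instance-hours. The numbers below are hypothetical: 6 instances on a normal day and 18 on Sunday, versus keeping the maximum of 20 running all week. The calculation ignores the minutes the autoscaler needs to react.

```bash
#!/usr/bin/env bash
# Hypothetical weekly load: 6 instances Monday-Saturday, 18 on Sunday (3x),
# versus 20 instances running all week. Units: instance-hours.
base=6 sunday=18 fixed=20 hours=24

autoscaled=$(( base * hours * 6 + sunday * hours ))
always_on=$(( fixed * hours * 7 ))
saving=$(( (always_on - autoscaled) * 100 / always_on ))

echo "autoscaled: $autoscaled instance-hours/week"   # 1296
echo "always on:  $always_on instance-hours/week"    # 3360
echo "saving:     ${saving}%"                        # 61%
```

Under these assumptions, autoscaling uses about 61% fewer instance-hours. Real savings depend on your baseline and on how quickly traffic ramps up.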

One zone in your region has an outage. What happens?

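If you deployed a regional (multi-zone) MIG, instances in the other zones continue serving. Health checks mark the failed zone's instances unhealthy, and the load balancer routes traffic away from that zone automatically. Users may see a brief blip of errors or latency during failover, but no outage, provided the remaining zones have enough capacity to absorb the extra load.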
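"No outage" only holds if the surviving zones can carry the full load. A common rule is N+1 at the zone level: size the group so that losing the largest zone still leaves the capacity you need. The sketch assumes instances are interchangeable and spread evenly across zones, which is what a regional MIG tries to do. `zone_safe_size` and `survivors_after_zone_loss` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Smallest group size that still provides `needed` instances after losing
# the largest of `zones` zones. Assumes an even spread and identical instances.
zone_safe_size() {
  local needed=$1 zones=$2
  if (( zones < 2 )); then
    echo "need at least 2 zones to survive a zone outage" >&2
    return 1
  fi
  # ceil(needed * zones / (zones - 1)) in integer arithmetic
  echo $(( (needed * zones + zones - 2) / (zones - 1) ))
}

# Instances left after the largest zone, holding ceil(size / zones), is lost.
survivors_after_zone_loss() {
  local size=$1 zones=$2
  echo $(( size - (size + zones - 1) / zones ))
}

size=$(zone_safe_size 6 3)
echo "group size: $size"                                              # 9
echo "after losing one zone: $(survivors_after_zone_loss "$size" 3)"  # 6
```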

You want to deploy a new version to 5% of users for testing. How?

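Run a canary deployment. With a MIG, start a canary update so that about 5% of instances use the new instance template and 95% stay on the old one. If traffic is spread evenly, the canary receives roughly 5% of requests. For an exact request-level split, use weighted backend services in the load balancer's URL map, an advanced traffic-management feature of the global external Application Load Balancer. Monitor error rates. If the new version is good, increase to 50%, then 100%. If you find bugs, send the canary back to 0%. Either way, there is no downtime.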
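The sketch below shows the MIG canary with gcloud. It assumes a regional MIG `my-ig` in `us-central1` that currently runs `my-template`, plus a new template with the hypothetical name `my-template-v2`. The `run` wrapper prints the commands unless `DRY_RUN=0` is set. The 5% is rounded to a whole number of instances, so in a small group the real share can be far from 5%.

```bash
#!/usr/bin/env bash
# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then echo "+ $*"; else "$@"; fi
}

# Canary: about 5% of instances move to my-template-v2, the rest stay on my-template.
run gcloud compute instance-groups managed rolling-action start-update my-ig \
  --version=template=my-template \
  --canary-version=template=my-template-v2,target-size=5% \
  --region=us-central1

# Canary looks healthy: promote the new template to 100% of instances.
run gcloud compute instance-groups managed rolling-action start-update my-ig \
  --version=template=my-template-v2 --region=us-central1

# Bugs found instead: move every instance back to the old template.
# run gcloud compute instance-groups managed rolling-action start-update my-ig \
#   --version=template=my-template --region=us-central1
```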

Your app is deployed in us-central1. A customer in Singapore reports slow performance. What do you do?

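Deploy the same app in asia-southeast1 (Singapore) and add it as another backend of your global external HTTP(S) load balancer. The load balancer automatically sends Singapore users to asia-southeast1 and US users to us-central1, so no separate routing service such as Traffic Director is needed. Enable Cloud CDN to cache static assets at the edge. Latency for Singapore users drops substantially (for example, from about 150 ms to about 10 ms).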

Real-world Scenarios

Scenario 1: E-commerce Black Friday

Your online store normally runs 10 instances at around the autoscaling target (70% CPU). On Black Friday traffic grows 10x, so you need about 100 instances to keep response times fast. Solution: use an autoscaling MIG with max replicas set to 100. Two days before Black Friday, confirm that your project's quota allows 100 instances and watch the autoscaling metrics. As traffic spikes, the MIG adds instances automatically. After Black Friday, traffic returns to normal and the group scales back in. You pay roughly 10x more for two days, which is much cheaper than provisioning 100 instances year-round.

Scenario 2: Global SaaS with 99.99% SLA

You promise customers 99.99% uptime. Deploy in two regions, us-central1 and europe-west1, each with a multi-zone MIG. The global load balancer routes requests to the closest region. If europe-west1 fails, all users are routed to us-central1; latency increases, but availability is maintained. Cloud Monitoring alerts you if the error rate spikes. With this design the SLA is achievable, as long as shared dependencies such as the database are not single points of failure.
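The two-region design rests on redundancy arithmetic. Assume regions fail independently and failover is instant. Then the chance that every region is down at once is the product of their individual downtime fractions, so N regions with availability a give 1 - (1 - a)^N. The helper below, whose name is hypothetical, computes that value. Real systems fall short of this ideal: regions share dependencies (load balancer, DNS, deployment tooling, a single primary database) and failover takes time. Treat the result as an upper bound.

```bash
#!/usr/bin/env bash
# Availability of N redundant regions, each with availability a (in percent),
# assuming independent failures and instant failover: 1 - (1 - a)^N.
parallel_availability() {
  awk -v a="$1" -v n="$2" 'BEGIN { printf "%.4f%%\n", 100 * (1 - (1 - a / 100) ^ n) }'
}

parallel_availability 99.9 1   # 99.9000%
parallel_availability 99.9 2   # 99.9999%
```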

Scenario 3: CI/CD Pipeline with Rolling Deployments

Developers push code and the CI/CD unit tests pass. You want to deploy to production without downtime. Use a MIG rolling update:

1. Create a new instance template with the new code (templates cannot be edited in place).
2. Start a rolling update that replaces one instance at a time.
3. The MIG starts a new instance from the new template and removes an old one.
4. The load balancer only sends traffic to instances that pass health checks.

Users never notice. If the new code breaks, roll back by starting another rolling update that points at the previous template.

Summary

High availability and scaling are essential for production systems: